Dataset title: National Identity Dataset (Making Identity Count) for Estonia,
1990–2020

Dataset author(s): Kilp, Alar; Pavlova, Elena; Nurseitova, Aigerim; Mölder,
Martin; Vilson, Maili; Belova-Dalton, Oksana; Pääbo, Heiko; Tadevosyan, Azniv;
Kama, Epp; Vain, Kristiina; Aavik, Anna-Lisa; Holm, Mailiis Astrid; Oks, Liisa;
Peterson, Betti Marie; Stepanova, Diana; Abzalova, Albina; Dubov, Vladislav;
Klimenko, Klim.

Dataset contact person: Martin Mölder (University of Tartu), martin.molder@ut.ee

Dataset license: this dataset is distributed under CC-BY 4.0

Date of publication: 20.01.2026

Project information: PRG1052 "National identity and Estonian-Russian relations:
a longitudinal study of elite and mass discourses" (01.01.2021−31.12.2025);
Principal Investigator: Alar Kilp; University of Tartu, Faculty of Social
Sciences, Johan Skytte Institute of Political Studies; Financier: Estonian
Research Council. For more information, see:
https://sisu.ut.ee/makingidentitycount/

Dataset files
=============

These are the main data files. They contain the coded quotes together with all
associated information, and they served as the main input for the analysis and
for the national identity reports published as part of the project.

-   MIC_EST_Estonian_codes_1.0.csv

    Estonian-language data from Estonia.

-   MIC_EST_Russian_codes_1.0.csv

    Russian-language data from Estonia.

-   MIC_RUS_codes_1.0.csv

    Data from Russia 2020.

The documentation below covers all three datasets together, as they were
compiled according to the same logic and contain the same variables.
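
Because the three files share the same variables, they can be read and stacked
into a single table. The following is a minimal sketch using only the Python
standard library. It assumes that the files are comma-separated, UTF-8 encoded,
start with a header row, and sit in the current working directory; none of
this is stated in this documentation, so adjust as needed. The function name
`load_rows` and the added 'file' key are hypothetical helpers, not part of the
dataset.

```python
import csv
from pathlib import Path

# File names as listed above; their location on disk is an assumption.
DATA_FILES = [
    "MIC_EST_Estonian_codes_1.0.csv",
    "MIC_EST_Russian_codes_1.0.csv",
    "MIC_RUS_codes_1.0.csv",
]


def load_rows(paths, encoding="utf-8"):
    """Read several CSV files with identical columns into one list of dicts.

    Each row gets an extra 'file' key recording which file it came from.
    Raises ValueError if a file's header differs from the first file's header.
    """
    rows = []
    expected_header = None
    for path in paths:
        path = Path(path)
        # newline="" lets the csv module handle line breaks inside quoted
        # fields, which are likely in a column holding free-text quotes.
        with path.open(newline="", encoding=encoding) as handle:
            reader = csv.DictReader(handle)
            if expected_header is None:
                expected_header = reader.fieldnames
            elif reader.fieldnames != expected_header:
                raise ValueError(f"{path.name}: header differs from first file")
            for row in reader:
                row["file"] = path.name
                rows.append(row)
    return rows
```

With the files in place, `rows = load_rows(DATA_FILES)` would return one
dictionary per quote, keyed by the variable names described in the codebook
below. All values are read as strings.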

Dataset documentation
=====================

Dataset summary
---------------

This dataset was compiled within the research project PRG1052 "National
Identity and Estonian–Russian Relations: a Longitudinal Study of Elite and
Mass Discourses" (2021–2025). It is a longitudinal dataset on national identity
created using the standardized approach and methodology of Making Identity
Count. The dataset is divided into two language-specific subsets
(Estonian-language and Russian-language data) and spans five reference years
(1990, 1995, 2000, 2010, and 2020). It covers six distinct genres: history
textbooks, speeches by political leaders, opinion articles and letters to the
editor in major daily newspapers (two separate genres), the most-watched
films, and the best-selling works of fiction. The resulting National Identity
Reports
are publicly available at: https://hdl.handle.net/10062/108183. An overview of
the project's theoretical framework, content, methodology, and results can be
found at: https://sisu.ut.ee/makingidentitycount/.

In addition to the data for Estonia (in Estonian and Russian), this dataset
also contains data for Russia for 2020.

More information about the specific sources coded for each year is available
in the National Identity Reports mentioned above for the corresponding years.

The authors of the Estonian-language and Russian-language datasets for each
year are as follows:

1990

Kilp, Alar; Anna-Lisa Aavik, Epp Kama, Liisa Oks, Betti Marie Peterson,
Kristiina Vain (2025). "Estonia 1990 National Identity Dataset (Making
Identity Count)"

Nurseitova, Aigerim; Oksana Belova-Dalton, Vladislav Dubov, Elena Pavlova,
Azniv Tadevosyan (2024). "Estonia 1990 Russophone Identity Dataset (Making
Identity Count)"

1995

Kilp, Alar; Anna-Lisa Aavik, Mailiis Astrid Holm, Epp Kama, Kristiina Vain,
Maili Vilson (2025). "Estonia 1995 National Identity Dataset (Making Identity
Count)"

Nurseitova, Aigerim; Elena Pavlova, Diana Stepanova (2023). "Estonia 1995
Russophone Identity Dataset (Making Identity Count)"

2000

Kilp, Alar; Oksana Belova-Dalton, Anna-Lisa Aavik, Epp Kama, Liisa Oks, Betti
Marie Peterson, Kristiina Vain, Maili Vilson (2025). "Estonia 2000 National
Identity Dataset (Making Identity Count)"

Nurseitova, Aigerim; Albina Abzalova, Vladislav Dubov, Elena Pavlova, Regina
Petrova, Diana Stepanova, Azniv Tadevosyan (2023). "Estonia 2000 Russophone
Identity Dataset (Making Identity Count)"

2010

Kilp, Alar; Anna-Lisa Aavik, Epp Kama, Kristiina Vain, Maili Vilson (2025).
"Estonia 2010 National Identity Dataset (Making Identity Count)"

Nurseitova, Aigerim; Albina Abzalova, Vladislav Dubov, Elena Pavlova, Regina
Petrova, Diana Stepanova (2025). "Estonia 2010 Russophone Identity Dataset
(Making Identity Count)"

2020

Mölder, Martin; Anna-Lisa Aavik, Epp Kama, Alar Kilp, Viacheslav Morozov,
Heiko Pääbo, Kristiina Vain, Maili Vilson (2025). "Estonia 2020 National
Identity Dataset (Making Identity Count)"

Nurseitova, Aigerim; Vladislav Dubov, Klim Klimenko, Elena Pavlova, Azniv
Tadevosyan (2025). "Estonia 2020 Russophone Identity Dataset (Making Identity
Count)"

The authors of the Russia 2020 dataset are as follows:

Nurseitova, Aigerim; Pavlova, Elena; Tadevosyan, Azniv (2025). "Russia 2020
National Identity Dataset (Making Identity Count)"

Codebook
--------

The datasets contain the following variables:

'id'                Unique identifier for the quote (a function of coding
                    year, source type, source title and sequence number).

'code'              The initial code (label) that was assigned to the quote
                    by the coder.

'code_final'        The final aggregated code that was assigned during the
                    process of compiling the national identity reports.

'valence'           The final valence of the quote. Valences were adjusted in
                    the process of assigning final codes, and "asp"
                    (aspirational) and "av" (aversive) were re-coded as
                    positive and negative, respectively. The symbols used
                    are: "+" positive, "-" negative, "~" ambiguous,
                    "/" neutral.

'quote'             The quote that was coded from the source text.

'author'            The author of the text.

'text_title'        The title of the text.

'page_number'       The page number of the quote (if applicable).

'time'              The time of the quote (if applicable, i.e. in the case of
                    films).

'source_title'      The title of the source.

'source_year'       The year of the source.

'source_month_day'  The month and day of the source.

'source_number'     The number of the source.

'source_type'       The type of source ('newspaper: op-ed', 'newspaper:
                    letter', 'film', 'speech', 'textbook', 'novel').
                    Russian-language sources from Estonia additionally
                    include the type 'magazine'. The dataset for Russia 2020
                    does not distinguish between op-eds and letters and
                    contains only a general category 'newspaper'.

'original_language' The original language of the source.

'coding_year'       The coding year (1990, 1995, 2000, 2010, 2020). Can
                    differ from 'source_year', which indicates the year of
                    publication of the source.

'country'           The country that is coded (Estonia or Russia).
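
The value lists given in the codebook can be used to sanity-check rows after
loading and to tabulate valences per final code. The sketch below is
illustrative only: it assumes each row is a dictionary of strings keyed by the
variable names above (as produced by Python's `csv.DictReader`), and that the
values in the files are spelled exactly as in this codebook, which has not
been verified here. The names `check_row` and `valence_counts` are
hypothetical.

```python
from collections import Counter

# Allowed values as listed in the codebook; exact spelling in the files is
# an assumption.
VALENCES = {"+": "positive", "-": "negative", "~": "ambiguous", "/": "neutral"}
CODING_YEARS = {"1990", "1995", "2000", "2010", "2020"}  # strings, as read from CSV
SOURCE_TYPES = {
    "newspaper: op-ed",
    "newspaper: letter",
    "film",
    "speech",
    "textbook",
    "novel",
    "magazine",   # Russian-language sources from Estonia only
    "newspaper",  # Russia 2020 only
}


def check_row(row):
    """Return a list of codebook violations found in one row."""
    problems = []
    if row.get("valence") not in VALENCES:
        problems.append(f"unexpected valence: {row.get('valence')!r}")
    if row.get("coding_year") not in CODING_YEARS:
        problems.append(f"unexpected coding_year: {row.get('coding_year')!r}")
    if row.get("source_type") not in SOURCE_TYPES:
        problems.append(f"unexpected source_type: {row.get('source_type')!r}")
    return problems


def valence_counts(rows):
    """Count valence labels per final code: {code_final: Counter({label: n})}."""
    counts = {}
    for row in rows:
        # Symbols not listed in the codebook are grouped under "unknown".
        label = VALENCES.get(row["valence"], "unknown")
        counts.setdefault(row["code_final"], Counter())[label] += 1
    return counts
```

Rows for which `check_row` returns a non-empty list are worth inspecting by
hand before analysis; a mismatch may simply mean that the spelling assumed
here differs from the one used in the files.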

Version notes
=============

Version 1.0

This is the initial public release of the dataset at the end of the project.