Dataset title: National Identity Dataset (Making Identity Count) for Estonia, 1990–2020

Dataset author(s): Kilp, Alar; Pavlova, Elena; Nurseitova, Aigerim; Mölder, Martin; Vilson, Maili; Belova-Dalton, Oksana; Pääbo, Heiko; Tadevosyan, Azniv; Kama, Epp; Vain, Kristiina; Aavik, Anna-Lisa; Holm, Mailiis Astrid; Oks, Liisa; Peterson, Betti Marie; Stepanova, Diana; Abzalova, Albina; Dubov, Vladislav; Klimenko, Klim.

Dataset contact person: Martin Mölder (University of Tartu), martin.molder@ut.ee

Dataset license: this dataset is distributed under CC-BY 4.0

Date of publication: 20.01.2026

Project information: PRG1052 "National identity and Estonian-Russian relations: a longitudinal study of elite and mass discourses" (01.01.2021–31.12.2025); Principal Investigator: Alar Kilp; University of Tartu, Faculty of Social Sciences, Johan Skytte Institute of Political Studies; Financier: Estonian Research Council.

For more information, see: https://sisu.ut.ee/makingidentitycount/


Dataset files
=============

These are the main data files that contain the quotes with all the relevant information that was the main input for the analysis and the writing of the national identity reports that were published as part of the project.

- MIC_EST_Estonian_codes_1.0.csv: Estonian language data from Estonia.
- MIC_EST_Russian_codes_1.0.csv: Russian language data from Estonia.
- MIC_RUS_codes_1.0.csv: Data from Russia 2020.

The following dataset documentation considers these three datasets in common, as the logic of their compilation and the variables included are the same.


Dataset documentation
=====================

Dataset summary
---------------

This dataset was compiled within the research project PRG1052 "National Identity and Estonian–Russian Relations: a Longitudinal Study of Elite and Mass Discourses" (2021–2025) and is a longitudinal dataset on national identity created using the Making Identity Count methodology.
The dataset is divided into two language-specific subsets (Estonian-language and Russian-language data) and spans five reference years (1990, 1995, 2000, 2010, and 2020), covering six distinct genres: history textbooks, speeches by political leaders, opinion articles and letters to the editor in major daily newspapers, the most-watched films, and the best-selling works of fiction. The compilation of the national identity dataset followed the standardized approach and methodology of Making Identity Count.

The resulting National Identity Reports are publicly available at: https://hdl.handle.net/10062/108183. An overview of the project's theoretical framework, content, methodology, and results can be found at: https://sisu.ut.ee/makingidentitycount/.

In addition to the data for Estonia (in Estonian and Russian), this dataset also contains data for Russia 2020.

More information about the specific sources coded for each year is available in the National Identity Report for the corresponding year.

The authors of the datasets for each year (Estonian-language and Russian-language) are as follows:

1990
- Kilp, Alar; Anna-Lisa Aavik, Epp Kama, Liisa Oks, Betti Marie Peterson, Kristiina Vain (2025). "Estonia 1990 National Identity Dataset (Making Identity Count)"
- Nurseitova, Aigerim; Oksana Belova-Dalton, Vladislav Dubov, Elena Pavlova, Azniv Tadevosyan (2024). "Estonia 1990 Russophone Identity Dataset (Making Identity Count)"

1995
- Kilp, Alar; Anna-Lisa Aavik, Mailiis Astrid Holm, Epp Kama, Kristiina Vain, Maili Vilson (2025). "Estonia 1995 National Identity Dataset (Making Identity Count)"
- Nurseitova, Aigerim; Elena Pavlova, Diana Stepanova (2023). "Estonia 1995 Russophone Identity Dataset (Making Identity Count)"

2000
- Kilp, Alar; Oksana Belova-Dalton, Anna-Lisa Aavik, Epp Kama, Liisa Oks, Betti Marie Peterson, Kristiina Vain, Maili Vilson (2025). "Estonia 2000 National Identity Dataset (Making Identity Count)"
- Nurseitova, Aigerim; Albina Abzalova, Vladislav Dubov, Elena Pavlova, Regina Petrova, Diana Stepanova, Azniv Tadevosyan (2023). "Estonia 2000 Russophone Identity Dataset (Making Identity Count)"

2010
- Kilp, Alar; Anna-Lisa Aavik, Epp Kama, Kristiina Vain, Maili Vilson (2025). "Estonia 2010 National Identity Dataset (Making Identity Count)"
- Nurseitova, Aigerim; Albina Abzalova, Vladislav Dubov, Elena Pavlova, Regina Petrova, Diana Stepanova (2025). "Estonia 2010 Russophone Identity Dataset (Making Identity Count)"

2020
- Mölder, Martin; Anna-Lisa Aavik, Epp Kama, Alar Kilp, Viacheslav Morozov, Heiko Pääbo, Kristiina Vain, Maili Vilson (2025). "Estonia 2020 National Identity Dataset (Making Identity Count)"
- Nurseitova, Aigerim; Vladislav Dubov, Klim Klimenko, Elena Pavlova, Azniv Tadevosyan (2025). "Estonia 2020 Russophone Identity Dataset (Making Identity Count)"

The Russia 2020 dataset has the following list of authors:

- Nurseitova, Aigerim; Pavlova, Elena; Tadevosyan, Azniv (2025). "Russia 2020 National Identity Dataset (Making Identity Count)"

Codebook
--------

The datasets contain the following variables:

- 'id': Unique identifier for the quote (a function of coding year, source type, source title, and sequence number).
- 'code': The initial code (label) that was assigned to the quote by the coder.
- 'code_final': The final aggregated code that was assigned during the process of compiling the national identity reports.
- 'valence': The final valence of the quote. Valences were adjusted in the process of assigning final codes, and "asp" (aspirational) and "av" (aversive) were re-coded as positive and negative, respectively. The symbols used in this variable are: "+" positive, "-" negative, "~" ambiguous, "/" neutral.
- 'quote': The quote that was coded from the source text.
- 'author': The author of the text.
- 'text_title': The title of the text.
- 'page_number': The page number of the quote (if applicable).
- 'time': The time of the quote (if applicable, i.e. in the case of films).
- 'source_title': The title of the source.
- 'source_year': The year of the source.
- 'source_month_day': The month and day of the source.
- 'source_number': The number of the source.
- 'source_type': The type of source ('newspaper: op-ed', 'newspaper: letter', 'film', 'speech', 'textbook', 'novel'). The Russian-language sources from Estonia additionally include the type 'magazine'. The dataset for Russia 2020 does not distinguish between op-eds and letters and contains only the general category 'newspaper'.
- 'original_language': The original language of the source.
- 'coding_year': The coding year (1990, 1995, 2000, 2010, 2020). It can differ from 'source_year', which indicates the year of publication of the source.
- 'country': The country that is coded (Estonia or Russia).


Version notes
=============

Version 1.0
This is the initial public release of the dataset at the end of the project.
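
As an illustration of how the codebook variables can be combined, the following minimal Python sketch counts quotes per coding year and valence. It is not part of the project's own tooling. The helper names (`valence_counts`, `read_rows`), the comma delimiter, the UTF-8 encoding, and the sample rows are all assumptions made for illustration; only the column names and the valence symbols are taken from the codebook.

```python
import csv
import io
from collections import Counter

# Valence symbols as documented in the codebook.
VALENCE_LABELS = {"+": "positive", "-": "negative", "~": "ambiguous", "/": "neutral"}


def valence_counts(rows):
    """Count quotes per (coding_year, valence label) pair."""
    counts = Counter()
    for row in rows:
        # Symbols not listed in the codebook are grouped under "unknown".
        label = VALENCE_LABELS.get(row["valence"].strip(), "unknown")
        counts[(row["coding_year"].strip(), label)] += 1
    return counts


def read_rows(handle):
    """Read all rows from an open CSV handle (comma delimiter is an assumption)."""
    return list(csv.DictReader(handle))


# Made-up sample rows for illustration only; these are not real data.
SAMPLE = (
    "id,code_final,valence,coding_year,country\n"
    "a1,example_code,+,1990,Estonia\n"
    "a2,example_code,-,1990,Estonia\n"
    "a3,other_code,+,2020,Estonia\n"
    "a4,other_code,~,2020,Estonia\n"
    "a5,other_code,/,2020,Estonia\n"
)

sample_counts = valence_counts(read_rows(io.StringIO(SAMPLE)))
for (year, label), n in sorted(sample_counts.items()):
    print(year, label, n)

# With a real file (the UTF-8 encoding is an assumption):
# with open("MIC_EST_Estonian_codes_1.0.csv", newline="", encoding="utf-8") as f:
#     print(valence_counts(read_rows(f)))
```

Because the three data files share the same variables, the same function should in principle apply to each of them, provided the delimiter and encoding assumptions hold.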