BWRITE: Academic Writing in the Baltic States: Rhetorical Structures through culture(s) and languages. IMRaD or non IMRaD structures

Authors: Susman, Margaux*; Hint, Helen**; Groom, Nicholas***; Leijen, Djuddah**
* University of Bergen, Norway
** University of Tartu, Estonia
*** Birmingham University, UK

Corresponding author: Djuddah Leijen: djuddah.leijen@ut.ee
University of Tartu, College of Foreign Languages and Cultures
Lossi 3, 54001, Tartu, Estonia

General Introduction:
The data is used to investigate the structures of Estonian and Lithuanian Master's theses, focusing in particular on whether, and to what extent, these texts adhere to or depart from the standard IMRaD structure. By applying an automatic analysis, we ask whether elements of text structure can be detected automatically and, if so, with which types of methods. As we work with both Estonian and Lithuanian data, we also ask whether it is feasible to develop a method that works independently of the language used.

The dataset contains data collected from the public library repositories at the University of Tartu and Vilnius University. It is being made public as supplementary data for the publication submitted to PLOS One, in line with Open Data Access under the requirements of the EMP475 Bwrite Project. This project has been made possible by the EEA Grants and Norway Grants.

Purpose of the datasets:
Dataset 01 - Classification: The data in these folders was used in the classification tasks performed to uncover the organizational structures of MA theses in two Baltic languages.
Dataset 02 - Clustering: The data in these files was used in the clustering task performed to uncover the organizational structures of MA theses in two Baltic languages.

Description of the data in this data set:
Dataset 01 - Classification: X.csv files contain the features fed to the models, while y.csv files contain the ground truth.
Dataset 02 - Clustering: X.csv files contain the features fed to the models, y.csv files contain the ground truth, and doc_names.csv contains the names of the original documents.
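
Loading the files (illustrative sketch):
The snippet below is one possible way to read an X.csv / y.csv pair, and optionally doc_names.csv, from a single folder using only the Python standard library. It is not part of the original project code. The function names read_rows and load_task_folder are hypothetical. The sketch also assumes that the files are row-aligned (row i of each file describes the same item) and makes no assumption about header rows, since neither point is stated in this description.

```python
import csv
from pathlib import Path


def read_rows(path):
    """Read a CSV file into a list of rows; each row is a list of strings."""
    with open(path, newline="", encoding="utf-8") as handle:
        # Skip completely empty lines, keep everything else untouched.
        return [row for row in csv.reader(handle) if row]


def load_task_folder(folder, with_doc_names=False, check_alignment=True):
    """Load X.csv and y.csv (and optionally doc_names.csv) from one folder.

    Assumption (not confirmed by the dataset description): the files are
    row-aligned. Set check_alignment=False if that does not hold.
    Header rows, if present, are returned like any other row.
    """
    folder = Path(folder)
    data = {
        "X": read_rows(folder / "X.csv"),
        "y": read_rows(folder / "y.csv"),
    }
    if with_doc_names:
        data["doc_names"] = read_rows(folder / "doc_names.csv")

    if check_alignment:
        lengths = {name: len(rows) for name, rows in data.items()}
        if len(set(lengths.values())) > 1:
            raise ValueError(f"Row counts differ between files: {lengths}")
    return data
```

For a clustering folder, a call such as load_task_folder(path_to_folder, with_doc_names=True) would return a dictionary with the keys "X", "y" and "doc_names". Values are returned as strings, so numeric features still need to be converted before they are passed to a model.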