##############################################
#                                            #
#       _______   _____   _  __  _____       #
#      |__   __| |  ___| | |/ / |  ___|      #
#         | |    | |_    |   /  | |_         #
#         | |    |  _|   |  <   |  _|        #
#         | |    | |___  |   \  | |___       #
#         |_|    |_____| |_|\_\ |_____|      #
#                                            #
#       Eesti teismeliste keele korpus       #
#       Estonian Teen Language Corpus        #
#                                            #
##############################################

The TeKE corpus contains spoken language and instant messaging (IM) conversations collected from Estonian teenagers in the period 2020-2022. The files are described in more detail in the metadata files (teke_spoken_metadata.txt and teke_chat_metadata.txt).

This corpus version (v.1.0) was compiled on February 9, 2024.

The repository contains the following folder and zip archives:

1) teke_doc: Folder containing metadata files describing the participants (teke_participants.csv), recordings (teke_recordings.csv), and labellers (teke_labellers.txt), as well as the transcription and chat metadata (teke_spoken_metadata.txt and teke_chat_metadata.txt).
2) spoken_eaf.zip
3) spoken_tsv.zip
4) chat_tsv.zip
5) chat_html.zip
6) chat_pictures.zip

Spoken language transcriptions are available in eaf and tsv format (spoken_eaf, spoken_tsv); instant messaging (IM) conversations are available in tsv and html format (chat_tsv, chat_html). Pictures used in IM conversations are in the chat_pictures.zip archive; in the IM files, these are identified through unique codes. Sound recordings in wav or mp3 format are not included in the repository for reasons of privacy.

################ ACCESS ################

The corpus includes personal data (in both the metadata and the content of conversations). For this reason, the files are accessible only by permission, through a fixed-term license for research purposes only.

To access spoken language transcripts and chat data, please sign a non-disclosure agreement with the Institute of Estonian and General Linguistics of the University of Tartu.

People granted access to the data must agree not to redistribute the files and must always keep their copies safe.
They must not publish personal data.

Applicants who wish to access the files are invited to fill out an agreement form and send it to teke@ut.ee. If you have any questions or concerns, please contact Virve Vihman (virve.vihman@ut.ee).

Please also contact Virve Vihman (virve.vihman@ut.ee) if you are seeking access to the corpus for non-academic purposes.

################ CITING THIS WORK ################

Please cite this work as:

Vihman, Virve-Anneli, Maarja-Liisa Pilvik, Aive Mandel, Annika Kängsepp, Mari Aigro, Kadri Koreinik, Kristiina Praakli, Liina Lindström. 2023. Estonian Teen Language Corpus v.1.0. Institute of Estonian and General Linguistics, University of Tartu. (Add DOI of version used)

Please add the DOI to this citation!

################ CONTACT ################

For more information, visit https://www.teismelistekeel.ee/ or contact us directly:

Virve-Anneli Vihman
Associate Professor of Psycholinguistics (UT)
virve.vihman@ut.ee

Mari Aigro
Researcher in Morphosyntax (UT)
mari.aigro@ut.ee