#######################################################
#                                                     #
#                       EKSKFK                        #
#  TÜ eesti keele spontaanse kõne foneetiline korpus  #
#  (Phonetic Corpus of Estonian Spontaneous Speech)   #
#                                                     #
#######################################################

This package was compiled on Fri Oct 20 2023.
This package corresponds to v1.3 of the corpus.

The dataset includes personal data and is therefore not fully open. The data can be accessed after signing a non-disclosure agreement with the University of Tartu Institute of Estonian and General Linguistics. The corpus is meant to be used for linguistic research and for training NLP models. Because of some of the files it contains, this package is restricted to academic use only (please get in touch if you need the corpus for non-academic use). Applicants must provide their research plan. To apply for access to this corpus, please contact partel.lippus@ut.ee.

The repository contains the following folders:

EKSKFK_doc - metadata: speakers, recordings, labelling tiers
SKK0_TG
SKK1_TG
SKK2_TG
SKK3_TG
SKK0_WAV (split into two packages in the repository due to size)
    SKK0_WAV_part_1-2 - wav files of dialogues SKK001 -- SKK036
    SKK0_WAV_part_2-2 - wav files of dialogues SKK037 -- SKK072
SKK1_WAV
SKK2_WAV
SKK3_WAV
SKK0_keypoints (split into two packages in the repository due to size)
    SKK0_keypoints_part_1-2 - dialogues SKK050 -- SKK062
    SKK0_keypoints_part_2-2 - dialogues SKK063 -- SKK072
SKK3_keypoints
SKK3_resp_TG
SKK3_resp_WAV

WAV - the sound recordings
TG - TextGrid annotations (see the metadata folder for tier info)
keypoints - OpenPose data (JSON by frame, see the frame resolution in the metadata)
resp_WAV & resp_TG - respiratory data

Video recordings in mp4 format are not included in the repository.
If you need the video recordings, please contact partel.lippus@ut.ee.

A plain text version of the word-level annotation is also available in the repository: EKSKFK_v1-3_words-by-IPU.txt

If you are using R, check out library(rPraat) or library(textgRid) for reading TextGrid files. library(phonTools) may also come in handy.

See the recordings metadata file for more information about the tiers and whether they were created by a script or labelled by hand.

Filenames follow the pattern recordingID-speakerID_gender.

For more information, see the HTML documents in this repository or visit https://foneetikakorpus.ut.ee/

Please cite this work as:
Lippus, Pärtel, Kätlin Aare, Anton Malmi, Tuuli Tuisk & Pire Teras. 2023. Phonetic Corpus of Estonian Spontaneous Speech v1.3. Institute of Estonian and General Linguistics, University of Tartu. https://doi.org/10.23673/RE-438.

Please use the corpus only for the research purposes you stated in your application. Do not redistribute the files, and keep your copies secure. Do not publish personal data.

If you have questions: partel.lippus@ut.ee
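
The filename pattern recordingID-speakerID_gender and the split of the SKK0 folders into two packages can be encoded in a small Python helper, for example to find which download holds a given dialogue. This is a minimal sketch, not part of the corpus: the names parse_stem, skk0_package, WAV_PARTS and KEYPOINT_PARTS are hypothetical, the example filename "SKK040-XX_F.wav" is invented, and the assumption that IDs contain no extra "-" or "_" separators should be checked against the speakers and recordings metadata in EKSKFK_doc. The dialogue ranges are copied from the folder list in this README.

```python
import re
from pathlib import Path
from typing import NamedTuple, Optional, Sequence, Tuple

# Assumed stem layout: recordingID-speakerID_gender (pattern from this README).
# The character sets allowed in each ID are an assumption; check EKSKFK_doc.
STEM_RE = re.compile(r"^(?P<recording>[^-]+)-(?P<speaker>[^_]+)_(?P<gender>[^_]+)$")

# (first, last, package) ranges of SKK0 dialogues, copied from the folder list.
WAV_PARTS = [(1, 36, "SKK0_WAV_part_1-2"), (37, 72, "SKK0_WAV_part_2-2")]
KEYPOINT_PARTS = [(50, 62, "SKK0_keypoints_part_1-2"), (63, 72, "SKK0_keypoints_part_2-2")]


class FileInfo(NamedTuple):
    recording: str
    speaker: str
    gender: str


def parse_stem(filename: str) -> FileInfo:
    """Split a corpus filename (any extension) into recording ID, speaker ID and gender."""
    m = STEM_RE.match(Path(filename).stem)
    if m is None:
        raise ValueError(f"unexpected filename: {filename!r}")
    return FileInfo(m["recording"], m["speaker"], m["gender"])


def skk0_package(recording: str, parts: Sequence[Tuple[int, int, str]]) -> Optional[str]:
    """Return the split package that holds an SKK0 dialogue, or None if no range covers it."""
    m = re.fullmatch(r"SKK(\d{3})", recording)
    if m is None:
        return None
    number = int(m.group(1))
    for first, last, package in parts:
        if first <= number <= last:
            return package
    return None


if __name__ == "__main__":
    info = parse_stem("SKK040-XX_F.wav")  # hypothetical filename
    print(info.recording, skk0_package(info.recording, WAV_PARTS))
    # SKK040 SKK0_WAV_part_2-2
```

A dialogue outside the listed keypoint ranges (for example SKK040) returns None from skk0_package(..., KEYPOINT_PARTS), which matches the folder list: SKK0 keypoints are only provided for SKK050 -- SKK072.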
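
The keypoints folders hold OpenPose data as one JSON file per frame. Assuming these files follow OpenPose's standard output layout (a top-level "people" list in which each person has a flat "pose_keypoints_2d" array of x, y, confidence values), a frame can be turned into per-person keypoint triples as sketched below. The function names and the 0.1 confidence threshold are illustrative choices, not something the corpus prescribes; the key argument can be switched to another array such as "face_keypoints_2d" if the files contain it.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Tuple, Union

Keypoint = Tuple[float, ...]  # (x, y, confidence); x and y are pixel coordinates


def parse_pose_frame(frame: Dict[str, Any], key: str = "pose_keypoints_2d") -> List[List[Keypoint]]:
    """Return one list of (x, y, confidence) triples per detected person."""
    people = []
    for person in frame.get("people", []):
        flat = person.get(key, [])
        # OpenPose stores keypoints flat: [x0, y0, c0, x1, y1, c1, ...]; group them in threes.
        people.append([tuple(flat[i:i + 3]) for i in range(0, len(flat) - 2, 3)])
    return people


def read_pose_frame(path: Union[str, Path], key: str = "pose_keypoints_2d") -> List[List[Keypoint]]:
    """Load one per-frame JSON file and parse it with parse_pose_frame."""
    with open(path, encoding="utf-8") as f:
        return parse_pose_frame(json.load(f), key)


def visible(points: List[Keypoint], min_conf: float = 0.1) -> List[Keypoint]:
    """Keep keypoints with confidence >= min_conf; undetected ones typically have confidence 0."""
    return [p for p in points if p[2] >= min_conf]
```

How many keypoints each person has depends on the OpenPose body model that was used, and the pixel coordinates should be interpreted against the frame resolution given in the metadata.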