#######################################################
#   _____    _  __    ___    _  __    _____    _  __  #
#  |  ___|  | |/ /   / __|  | |/ /   |  ___|  | |/ /  #
#  | |___   |   /   | /_    |   /    | |___   |   /   #
#  |  ___|  |  <     \_ \   |  <     |  ___|  |  <    #
#  | |___   |   \    __| |  |   \    | |      |   \   #
#  |_____|  |_|\_\  |___/   |_|\_\   |_|      |_|\_\  #
#  TÜ eesti keele spontaanse kõne foneetiline korpus  #
#                                                     #
#######################################################

This package was compiled Fri Oct 20 2023

This package corresponds to v.1.3 of the corpus

The dataset includes personal data and is therefore not fully open. The data can be accessed after signing a non-disclosure agreement with the University of Tartu Institute of Estonian and General Linguistics. The corpus is meant to be used for linguistic research and for training NLP models. Because of restrictions on some files, this package is limited to academic use only (please get in touch if you need the corpus for non-academic use). Applicants must provide their research plan. To apply for access to this corpus, please contact partel.lippus@ut.ee.

The repository contains the following folders:
EKSKFK_doc - metadata: speakers, recordings, labelling tiers
SKK0_TG
SKK1_TG
SKK2_TG
SKK3_TG
SKK0_WAV (in the repository split into two packages due to size)
  SKK0_WAV_part_1-2 - wav files of dialogues SKK001 -- SKK036
  SKK0_WAV_part_2-2 - wav files of dialogues SKK037 -- SKK072
SKK1_WAV
SKK2_WAV
SKK3_WAV
SKK0_keypoints (in the repository split into two packages due to size)
  SKK0_keypoints_part_1-2 - dialogues SKK050 -- SKK062
  SKK0_keypoints_part_2-2 - dialogues SKK063 -- SKK072
SKK3_keypoints
SKK3_resp_TG
SKK3_resp_WAV

WAV - the sound recordings
TG - TextGrid annotations (see metadata folder for tier info)
keypoints - OpenPose data (JSON by frame, see frame resolution in metadata)
resp_WAV & resp_TG - respiratory data
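
As a rough illustration of how the per-frame keypoint files might be read, here is a minimal Python sketch. It assumes the files follow the standard OpenPose JSON layout, in which each frame has a "people" list and each person has a flat "pose_keypoints_2d" array of x, y, confidence values. This has not been checked against the corpus files, so please verify the layout against the metadata first. The function name and the sample frame are made up for the example.

```python
import json


def pose_keypoints(frame):
    """Return, for each detected person, a list of (x, y, confidence) triples.

    Assumes the standard OpenPose layout: frame["people"][i]["pose_keypoints_2d"]
    is a flat list [x0, y0, c0, x1, y1, c1, ...].
    """
    people = []
    for person in frame.get("people", []):
        flat = person.get("pose_keypoints_2d", [])
        # Step through the flat list three values at a time.
        triples = [tuple(flat[i:i + 3]) for i in range(0, len(flat) - 2, 3)]
        people.append(triples)
    return people


# Made-up frame with one person and two keypoints, for illustration only.
sample = json.loads(
    '{"people": [{"pose_keypoints_2d": [10.0, 20.0, 0.9, 30.0, 40.0, 0.5]}]}'
)
print(pose_keypoints(sample))
```

For a real file, load it with json.load() on an opened file instead of the inline sample string.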

Video recordings in mp4 format are not included in the repository. If you need them, please contact partel.lippus@ut.ee.

A plain-text version of the word-level annotation is also available in the repository: EKSKFK_v1-3_words-by-IPU.txt

If you are using R, check out library(rPraat) or library(textgRid) for reading TextGrid files. library(phonTools) may also come in handy.

See the recordings metadata file for more information about the tiers and whether they were created by a script or labelled by hand. Filenames follow the pattern recordingID-speakerID_gender.
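
The recordingID-speakerID_gender filename pattern can be split into its parts with a short script. The Python sketch below is only an illustration. It assumes that the IDs themselves contain no hyphen or underscore and that nothing follows the gender code except the file extension. The example filename is hypothetical, so check the pattern against the actual files and the metadata.

```python
import re
from pathlib import PurePath

# Assumed pattern: recordingID-speakerID_gender (file extension already removed).
FILENAME_RE = re.compile(
    r"^(?P<recording>[^-_]+)-(?P<speaker>[^-_]+)_(?P<gender>[^-_.]+)$"
)


def parse_corpus_filename(path):
    """Split a filename into recording ID, speaker ID and gender.

    Returns a dict with the three parts, or None if the name does not
    follow the assumed pattern.
    """
    stem = PurePath(path).stem  # drop folders and the extension
    match = FILENAME_RE.match(stem)
    return match.groupdict() if match else None


# Hypothetical filename, for illustration only.
print(parse_corpus_filename("SKK0_TG/SKK001-XY_F.TextGrid"))
```

Names that do not match return None, which makes it easy to list files that need a closer look.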

For more information, see the HTML documents in this repository or visit https://foneetikakorpus.ut.ee/

Please cite this work as Lippus, Pärtel, Kätlin Aare, Anton Malmi, Tuuli Tuisk & Pire Teras. 2023. Phonetic Corpus of Estonian Spontaneous Speech v1.3. Institute of Estonian and General Linguistics, University of Tartu. https://doi.org/10.23673/RE-438.

Please use the corpus only for the research purposes that you stated in your application. Do not redistribute the files and keep your copies safe. Do not publish personal data.

If you have questions: partel.lippus@ut.ee