DataDOI
  • UT Humaniora
  • Institute of Estonian and General Linguistics (Eesti ja üldkeeleteaduse instituut)
  • Estonian and General Linguistics Data (Eesti ja üldkeeleteaduse andmed)

Phonetic Corpus of Estonian Spontaneous Speech v1.3

Lippus, Pärtel; Aare, Kätlin; Malmi, Anton; Tuisk, Tuuli; Teras, Pire
Name                          Size      Description
README.txt                    3.027 KB  Short summary
ekskfk_info_eng.html          4.858 MB  Paper describing background, materials and methods
ekskfk_info.html              4.856 MB  Introduction to the corpus (in Estonian)
ekskfk_margendus.html         1.169 MB  Annotation principles (in Estonian)
SKK0_TG.zip                   94.71 MB  TextGrid files
SKK1_TG.zip                   21.41 MB  TextGrid files
SKK2_TG.zip                   19.95 MB  TextGrid files
SKK3_TG.zip                   17.12 MB  TextGrid files
SKK0_WAV_part_1-2.zip         7.310 GB  Studio dialogue WAV files (part 1 of 2)
SKK0_WAV_part_2-2.zip         6.460 GB  Studio dialogue WAV files (part 2 of 2)
SKK1_WAV.zip                  2.757 GB  Monologue WAV files
SKK2_WAV.zip                  3.222 GB  Fieldwork dialogue WAV files
SKK3_WAV.zip                  3.286 GB  Trialogue WAV files
SKK0_keypoints_part_1-2.zip   5.780 GB  OpenPose JSON files (part 1 of 2)
SKK0_keypoints_part_2-2.zip   7.745 GB  OpenPose JSON files (part 2 of 2)
SKK3_keypoints.zip            1.841 GB  OpenPose JSON files
SKK3_resp_TG.zip              1.404 MB  Respiratory data TextGrid files
SKK3_resp_WAV.zip             735.5 MB  Respiratory data WAV files
EKSKFK_v1-3_words-by-IPU.txt  12.48 MB  Text version of the corpus
EKSKFK_doc.zip                22.21 KB  Metadata
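The SKK0 and SKK3 keypoint archives are described only as OpenPose JSON files. Below is a minimal Python sketch for reading one frame. It assumes the files follow OpenPose's standard per-frame layout: a `people` array whose entries hold flat lists of x, y, confidence values such as `pose_keypoints_2d`. The function names and the 0.1 confidence threshold are hypothetical. The file naming, frame rate and camera setup inside the archives are not documented on this page.

```python
import json
from typing import List, Tuple

# (x, y, confidence) for one keypoint, following OpenPose's flat JSON layout.
Keypoint = Tuple[float, float, float]


def group_triples(flat: List[float]) -> List[Keypoint]:
    """Turn a flat [x1, y1, c1, x2, y2, c2, ...] list into (x, y, c) triples."""
    if len(flat) % 3 != 0:
        raise ValueError("keypoint array length is not a multiple of 3")
    return [(flat[i], flat[i + 1], flat[i + 2]) for i in range(0, len(flat), 3)]


def load_frame(text: str, field: str = "pose_keypoints_2d") -> List[List[Keypoint]]:
    """Parse one OpenPose frame (JSON text) into one keypoint list per detected person.

    `field` selects the keypoint set, e.g. "pose_keypoints_2d" or "face_keypoints_2d".
    A person without that field yields an empty list.
    """
    frame = json.loads(text)
    return [group_triples(person.get(field, [])) for person in frame.get("people", [])]


def load_frame_file(path: str, field: str = "pose_keypoints_2d") -> List[List[Keypoint]]:
    """Hypothetical helper: read a single per-frame JSON file from disk."""
    with open(path, encoding="utf-8") as fh:
        return load_frame(fh.read(), field)


def confident(points: List[Keypoint], threshold: float = 0.1) -> List[Keypoint]:
    """Keep keypoints whose confidence exceeds `threshold` (0.1 is an arbitrary default).

    OpenPose writes undetected keypoints as (0, 0, 0), so any positive threshold drops them.
    """
    return [p for p in points if p[2] > threshold]


if __name__ == "__main__":
    sample = '{"version": 1.3, "people": [{"pose_keypoints_2d": [100.0, 200.0, 0.9, 0.0, 0.0, 0.0]}]}'
    people = load_frame(sample)
    print(len(people), confident(people[0]))  # prints: 1 [(100.0, 200.0, 0.9)]
```

Grouping the flat arrays into triples up front keeps later processing, such as filtering by confidence or computing hand or head movement, independent of OpenPose's storage layout.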
Date
2023-10-20
URI
https://datadoi.ee/handle/33/577
http://dx.doi.org/10.23673/re-438
Abstract
The Phonetic Corpus of Estonian Spontaneous Speech consists of recordings annotated on several linguistic tiers, including words and segments, with their boundaries aligned to the speech signal. The corpus mainly contains dialogues. It can be used to study a range of phonetic and linguistic research questions and to train language technology applications (e.g. speech recognition, dialogue systems). In addition to the detailed phonetic segmentation, the corpus has word-level annotation in standard orthography, so it can be used with most NLP tools built for written language. The corpus includes:
- Studio-quality sound recordings, with a separate channel for each speaker;
- Spontaneous conversations between 2–3 speakers, approximately 30 minutes per recording;
- Manual transcription of words and phonemes;
- 207 speakers aged 20–85 years;
- A total of 135 hours of speech recordings;
- Word- and phoneme-level annotation of 106 hours / one million word-level intervals...
Keyword
speech corpus; phonetic annotation; time aligned annotation; multimodal speech; dialogues; voice quality; morphological analysis; Estonian language
Item type
info:eu-repo/semantics/dataset; Data Paper; Audiovisual; Sound
Collections
  • Eesti ja üldkeeleteaduse andmed

University of Tartu Library
Open Science
Contact Us
DSpace software
Mirage 2 Theme
 

 
