Phonetic Corpus of Estonian Spontaneous Speech v1.2

Lippus, Pärtel; Aare, Kätlin; Malmi, Anton; Tuisk, Tuuli; Teras, Pire

dc.contributor.author	Lippus, Pärtel
dc.contributor.author	Aare, Kätlin
dc.contributor.author	Malmi, Anton
dc.contributor.author	Tuisk, Tuuli
dc.contributor.author	Teras, Pire
dc.date.accessioned	2021-09-13T11:09:35Z
dc.date.available	2021-09-13T11:09:35Z
dc.date.issued	2021-09-08
dc.identifier.uri	https://datadoi.ee/handle/33/351
dc.identifier.uri	https://doi.org/10.23673/re-293
dc.description.abstract	The Phonetic Corpus of Estonian Spontaneous Speech consists of recordings annotated on several linguistic tiers, including words and segments and their boundaries in the speech signal. The corpus mainly contains dialogues. It can be used to study a range of phonetic and linguistic research questions and to train language technology applications (e.g. speech recognition, dialogue systems). In addition to the detailed phonetic segmentation, the corpus has word-level annotation that uses standard orthography, so it can be used with most NLP tools built for written language. The corpus includes: - Studio quality sound recordings, separate channels for each speaker - Spontaneous conversation between 2-3 speakers, approximately 30 minutes for each recording - Manual transcription of words and phonemes - 205 individual speakers in the age range of 20–85 years - A total of 134 hours of speech recordings - Word & phoneme level annotation of 106 hours / 914 thousand word level intervals	en
dc.format	WAV	en
dc.format	TextGrid	en
dc.format	TXT	en
dc.format	JSON	en
dc.language.iso	et	en
dc.publisher	Institute of Estonian and General Linguistics, University of Tartu	en
dc.relation	EKTB3	en
dc.rights	info:eu-repo/semantics/restrictedAccess	en
dc.subject	speech corpus	en
dc.subject	phonetic annotation	en
dc.subject	phoneme segments	en
dc.subject	multimodal speech	en
dc.subject	dialogues	en
dc.subject	voice quality	en
dc.subject	morphological analysis	en
dc.title	Phonetic Corpus of Estonian Spontaneous Speech v1.2	en
dc.type	info:eu-repo/semantics/dataset	en
dc.type	Data Paper	en
dc.type	Audiovisual	en
dc.type	Sound	en