Meadow Mari Prosody data
(2005)This dataset contains the segmental durations, F0 measurements and formant values F1-F3 from the vowels in 1-4 syllable words in Meadow Mari, a Finno-Ugric language. 8 native speakers read a list of 100 sentences, each ... -
Quantity-related variation of duration, pitch and vowel quality in spontaneous Estonian (data)
(2013)This dataset is collected from the University of Tartu Phonetic Corpus of Estonian Spontaneous Speech. The dataset consists of words with CVCV (consonant-vowel-consonant-vowel) and CVCCV structure and it has been collected ... -
Context-dependent articulation of consonant gemination in Estonian (data)
(2017)This dataset was collected with a Carstens AG-500 electromagnetic articulograph from 4 native Estonian speakers articulating the 27 combinations of disyllabic words for the purpose of studying gemination in the Estonian ... -
(Non-)Literalness ratings for Estonian particle verbs
(2018-06)(Non-)literalness dataset of 1481 sentences formed with 184 Estonian particle verbs. Sentences are evaluated by 3 native speakers of Estonian on a 6-point scale [0,5] indicating the degree of compositionality of a particle ... -
Inari Saami geminates
(2018-11-08)Data extracted from the Inari Saami prosody corpus, used in Türk et al. (2018). The Acoustic Correlates of Quantity in Inari Saami. Journal of Phonetics. Target words ... -
Pretrained word and multi-sense embeddings for Estonian
(2019)Word and multi-sense embeddings for Estonian trained on lemmatized etTenTen: Corpus of the Estonian Web. Word embeddings are trained with word2vec. Sense embeddings are trained with SenseGram. Sense inventory is induced ... -
Children born in Kodavere parish in the 19th century (Kodavere kihelkonnas 19. sajandil sündinud lapsed)
(2019)Data used in Anna Edela's bachelor's thesis, drawn from the 19th-century birth registers of the EELK Kodavere congregation, which are available in the Saaga database of the Estonian Historical Archives. They contain, for Kodavere parish in 1835, 1840, ... -
Frequency dictionary of the Phonetic Corpus (Foneetikakorpuse sagedussõnastik)
(2019-06-20)The frequency dictionary of the Phonetic Corpus of Estonian Spontaneous Speech was compiled from corpus version v.1.0.5 (20.06.2019, doi:10.15155/1-00-0000-0000-0000-001A3L), at which point 685,750 words had been annotated in the corpus (89 hours and ... -
Distribution of categorised feedback comments (e.g. class, sub-class, and features) by feedback exchange group and by group member
(2020)This entry contains data on the categorisation and classification of asynchronous written peer feedback comments within one doctorate writing group over a three-month period. The research data should be used in tandem with ... -
Data and R code for "Verbs of horizontal and vertical motion: a corpus study in Estonian"
(University of Tartu, 2021)Data and statistical code used in the paper "Verbs of horizontal and vertical motion: a corpus study in Estonian" (accepted by the Finnish Journal of Linguistics 2021) -
Data and R code for "Manner of motion in Estonian: A descriptive account of speed"
(University of Tartu, 2021)Data and statistical code used in the paper "Manner of motion in Estonian: A descriptive account of speed" (accepted by Studies in Language in 2021). Authors of the paper: Piia Taremaa and Anetta Kopecka. -
Data and R code for "Constructional variation in Estonian: demonstrative pronouns and adverbs as determiners in noun phrases"
(2021)Data and R code used in the paper "Constructional variation in Estonian: demonstrative pronouns and adverbs as determiners in noun phrases" (accepted by Lingua 2021) -
Dataset and supplementary materials for the article „Liikumisverbid horisontaalsel ja vertikaalsel teljel. Ühe sorteerimiskatse tulemused“ (Keel ja Kirjandus 3/2021; Piia Taremaa)
(2021)Dataset for the article „Liikumisverbid horisontaalsel ja vertikaalsel teljel. Ühe sorteerimiskatse tulemused“ (Keel ja Kirjandus 3/2021; Piia Taremaa). The dataset includes: 1) statistical code; 2) statistical analysis ... -
Phonetic Corpus of Estonian Spontaneous Speech v1.2
(Institute of Estonian and General Linguistics, University of Tartu, 2021-09-08)The Phonetic Corpus of Estonian Spontaneous Speech consists of recordings that have been annotated on different linguistic tiers including words and segments and their boundaries in the speech signal. The corpus mainly ... -
Annex 1 to the article "The role of language exposure in mediated receptive multilingualism"
(Lähivõrdlusi. Lähivertailuja, 2021-10)Annex 1 to the article "The role of language exposure in mediated receptive multilingualism" presents the sociolinguistic questionnaire used in the study. -
Data and R code for 'Speed as a dimension of manner in Estonian frog stories' (Taremaa et al.)
(University of Tartu, 2022)Data and statistical code used in the paper "Speed as a dimension of manner in Estonian frog stories" (accepted by the Journal of Nordic Linguistics in 2022) -
Data and R code for 'Speed and space' (Taremaa & Kopecka)
(University of Tartu, 2022)Data and statistical code used in the paper "Speed and space: semantic asymmetries in motion descriptions in Estonian" (published in Cognitive Linguistics; Ahead of Print, published online 8 December 2022) -
Corpus of Estonian Dialects (Eesti murrete korpus)
(University of Tartu, Institute of Estonian and General Linguistics, 2022-11-23)The Corpus of Estonian Dialects is an electronic database covering all Estonian dialects. The corpus consists of audio recordings, dialect texts in phonetic transcription, dialect texts in simplified transcription, morphologically ... -
Data for "A corpus study of grammatical case forms in written and spoken Estonian: Frequency, distribution and grammatical role"
(University of Tartu, Institute of Estonian and General Linguistics, 2023)This dataset makes available the sample of clauses used in the study "A corpus study of grammatical case forms in written and spoken Estonian: Frequency, distribution and grammatical role". It includes 751 clauses from the ...