Database of sentences with deeply embedded clauses
Kehayov, Petar; Todesk, Triin
Name | Size | Description |
---|---|---|
README.txt | 2.175Kb | Short summary |
Estonian_fiction.xlsx | 40.90Kb | Samples from Estonian fiction |
Estonian_journals.xlsx | 39.05Kb | Samples from Estonian media |
Komi_journals_fiction.xlsx | 76.77Kb | Samples from Komi media and fiction |
Moksha_journals_fiction.xlsx | 49.81Kb | Samples from Moksha media and fiction |
Variables_Estonian_Moksha_morph_tags.docx | 26.47Kb | Morph tags and abbreviations in Estonian and Moksha data |
Komi_morph_tags.docx | 16.06Kb | Morph tags and abbreviations in Komi data |
Sorting_eligible_sentences_with_DECs.docx | 20.82Kb | Criteria applied for data selection |
Abstract
The database was compiled for the Estonian Research Council project STP2 “Exploring Deep Clausal Embeddings in Finno-Ugric.” It contains samples of complex sentences with deeply embedded clauses (DECs) from Estonian, Moksha Mordvin, and Komi Zyryan literary languages (fiction and journalese). DECs are clauses embedded within clauses that are themselves embedded. The samples are organized in Excel files, with each row containing a complex sentence in which each DEC and its superordinate clause are annotated for seven variables, described in Variables.docx.
Andmebaas koostati Eesti Teadusagentuuri projekti STP2 „Sügavale uputatud kõrvallaused soomeugri keeltes“ jaoks. See sisaldab sügavale uputatud kõrvallausetega liitlausete valimeid eesti, mokša ja sürjakomi kirjakeeltest (proosa ja ajakirjandus). Sügavale uputatud lause on kõrvallause, mille pealause on ise kõrvallause. Valimid on Exceli failide kujul, kus igal real on liitlause, milles iga sügavale uputatud lause ja selle pealause on märgendatud seitsme muutuja järgi, mis on kirjeldatud failis Variables.docx.