*** Database of sentences with deeply embedded clauses ***

Authors: Petar Kehayov, Triin Todesk
Estonian Research Council grant STP2 “Exploring Deep Clausal Embeddings in Finno-Ugric”, Institute of Estonian and General Linguistics, University of Tartu

Corresponding author: Petar Kehayov
Contact Information: petar.kehayov@ut.ee
Professor of Finnic linguistics
University of Tartu
Jakobi 2, 51005 Tartu
Estonia

***General Introduction***

This dataset contains data from Estonian, Komi, and Moksha Mordvin that enable research into the properties of complex sentences with deeply embedded clauses (DECs, i.e. clauses occurring at an embedding depth of two or more). The dataset was produced as part of Petar Kehayov’s returning researcher’s grant “Exploring Deep Clausal Embeddings in Finno-Ugric” (2022–2024). It is being made public for use in future research on the grammatical and semantic properties of sentences with several consecutively embedded clauses.

***Description of the data in this dataset***

The dataset includes seven files. Four of them (Estonian_fiction, Estonian_journals, Komi_journals_fiction, and Moksha_journals_fiction) are Excel tables containing sentences with DECs. Each sentence occupies a separate row and is coded for the structural variables introduced in the file Variables_Estonian_Moksha_morph_tags. The abbreviations used in the tables stand for variable values explained in the files Variables_Estonian_Moksha_morph_tags and Komi_morph_tags. The file Sorting_eligible_sentences_with_DECs describes the restrictive criteria applied when selecting eligible sentences with DECs from the results of queries in the Estonian, Komi, and Moksha Mordvin corpora.
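As a convenience for users, the file inventory described above can be checked with a short script after download. This is only a minimal sketch: the seven file names come from this README, but the file extensions are not stated here, so the script matches any extension. The function names `inventory`, `missing_files`, and `load_tables` are hypothetical helpers, and `load_tables` assumes that the third-party package pandas (with an Excel reader such as openpyxl) is installed and that the tables are in a format `pandas.read_excel` can read.

```python
from pathlib import Path

# File names as listed in this README (extensions are not specified there).
TABLE_STEMS = [
    "Estonian_fiction",
    "Estonian_journals",
    "Komi_journals_fiction",
    "Moksha_journals_fiction",
]
DOC_STEMS = [
    "Variables_Estonian_Moksha_morph_tags",
    "Komi_morph_tags",
    "Sorting_eligible_sentences_with_DECs",
]


def inventory(data_dir):
    """Map each expected file name to the matching paths found in data_dir."""
    data_dir = Path(data_dir)
    return {
        stem: sorted(data_dir.glob(stem + ".*"))
        for stem in TABLE_STEMS + DOC_STEMS
    }


def missing_files(data_dir):
    """Return the expected file names that have no match in data_dir."""
    return [stem for stem, paths in inventory(data_dir).items() if not paths]


def load_tables(data_dir):
    """Load the four sentence tables, one DataFrame per table.

    Assumption: pandas and an Excel engine are installed, and each table
    keeps its sentences on the first sheet, one sentence per row.
    """
    import pandas as pd  # imported lazily so the inventory check needs no extras

    found = inventory(data_dir)
    return {
        stem: pd.read_excel(found[stem][0])
        for stem in TABLE_STEMS
        if found[stem]
    }
```

Calling `missing_files("path/to/download")` should return an empty list when all seven files are present. The column names of the loaded tables are not reproduced here; they are explained in Variables_Estonian_Moksha_morph_tags and Komi_morph_tags.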
***Methodological information***

The data were retrieved from the following electronic corpora: Estonian National Corpus, version 2019 (accessible in SketchEngine: https://www.sketchengine.eu/); Komi-Zyrian Corpus (https://komi-zyrian.web-corpora.net); Korp (Giellatekno corpus), Komi-Zyrian texts (https://gtweb.uit.no/u_korp); Corpus of Contemporary Literary Moksha (Moksha.web-corpora.net). Corpus hits were sorted manually according to the eligibility criteria for sentences with DECs (see the file Sorting_eligible_sentences_with_DECs).