Pretrained word and multi-sense embeddings for Estonian
Aedmaa, Eleri
Name | Size | Description |
---|---|---|
README.pdf | 46.31Kb | README |
Abstract
Word and multi-sense embeddings for Estonian, trained on the lemmatized etTenTen corpus (Corpus of the Estonian Web). The word embeddings were trained with word2vec and the sense embeddings with SenseGram; the sense inventory is induced from the word embeddings. Models were trained under a range of parameter settings, varying the architecture, number of dimensions, window size, minimum frequency threshold, and number of iterations.
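
To illustrate what varying these five parameters means in practice, the following minimal sketch enumerates combinations of settings. The parameter names and all values in `GRID` are hypothetical placeholders chosen for illustration, not the settings used for the released models, and the record does not state whether every combination was trained.

```python
from itertools import product

# Hypothetical values for illustration only; they are NOT taken from the
# released models. Only the five parameter kinds come from the abstract.
GRID = {
    "architecture": ["cbow", "skipgram"],  # the two word2vec architectures
    "dimensions": [100, 300],
    "window": [5, 10],
    "min_frequency": [5, 10],
    "iterations": [5, 10],
}


def parameter_settings(grid):
    """Yield one dict per combination of the values listed in `grid`."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


settings = list(parameter_settings(GRID))
print(len(settings))   # number of combinations in this illustrative grid
print(settings[0])     # first combination
```

Each yielded dictionary describes one candidate training run; in a real setup it would be passed on to the training tool of choice.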