Establishing a Role for Minority Source Language in Multilingual Facilitation

Authors

DOI:

https://doi.org/10.7557/12.6370

Keywords:

Erzya, Komi-Zyrian

Abstract

This document is dedicated to a young man who, despite the number of times he has traveled around the Sun, is always open to new thoughts on ways to include languages, especially the smaller ones, and the people who speak them in far-reaching and sustainable open-source development. Since Trond Trosterud in Tromsø has a terrific track record in transnational and circumpolar linguistics, we try to draw his attention further afield, to languages and phenomena he has only touched upon. The language phenomena addressed here come from Erzya and the Zyrian variety of Komi. Erzya raises issues that are presented but not discussed in his dissertation, whereas Komi brings in issues of adnominal and predicate number marking in conjunction with case homonymy, which have been resolved thanks to the flexibility of the infrastructure. These source languages, like others, have documented new dimensions and added shape to the ever-growing infrastructure.

References

Ahlqvist, August Engelbreckt. 1859. Läran om verbet i mordvinskans mokscha-dialekt. Helsingfors. Frenckell & Son. Akademisk afhandling, som med den vidtberömda Historisk-filologiska Fackultetens vid Kejserliga Alexanders-Universitetet i Finland samtycke till offentlig granskning framställes af August Engelbreckt Ahlqvist, Hist-Fil. Magister.

Alnajjar, Khalid, Mika Hämäläinen, Jack Rueter, and Niko Partanen. 2020. Ve’rdd. Narrowing the gap between paper dictionaries, low-resource NLP and community involvement. In Proceedings of the 28th International Conference on Computational Linguistics: System Demonstrations, pp. 1–6. https://doi.org/10.18653/v1/2020.coling-demos.1.

Antonsen, L., S. Huhmarniemi, and T. Trosterud. 2009a. Interactive pedagogical programs based on constraint grammar. In Proceedings of the 17th Nordic Conference of Computational Linguistics. NEALT Proceedings Series 4. 2009, pp. 10–17.

Antonsen, Lene and Trond Trosterud. 2011. Next to nothing – a cheap South Saami disambiguator. In Proceedings of the NODALIDA 2011 workshop Constraint Grammar Applications May 11, 2011 Riga, Latvia, edited by Eckhard Bick, Kristin Hagen, Kaili Müürisep, and Trond Trosterud, vol. 14 of NEALT Proceedings Series.

Antonsen, Lene, Ciprian-Virgil Gerstenberger, Sjur Nørstebø Moshagen, and Trond Trosterud. 2009b. Ei intelligent elektronisk ordbok for samisk. In LexicoNordica 16, vol. 16, pp. 271–283.

Antonsen, Lene, Saara Huhmarniemi, and Trond Trosterud. 2009c. Constraint grammar in dialogue systems. In NEALT Proceedings Series 2009, vol. 8, pp. 13–21.

Antonsen, Lene, Trond Trosterud, and Linda Wiechetek. 2010. Reusing grammatical resources for new languages. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10). European Language Resources Association (ELRA), Valletta, Malta.

Bartens, Raija. 1999. Mordvalaiskielten rakenne ja kehitys. Suomalais-Ugrilaisen Seuran Toimituksia 232. Suomalais-Ugrilainen Seura, Helsinki.

Cygankin, D.V. 1968. Očerki Mordovskix Dialektov tom V, chap. Opyt klassifikacii èrzânskix govorov mordovskogo prisur’â. Mordovskoe knižnoe izdatel’stvo, Saransk. Naučno-issledovatel’skij institut âzyka, literatury, istorii i èkonomiki pri sovete ministrov mordovskoj ASSR.

Evsev’ev, M.E. 1928-29. Mordovskaâ grammatika. Central’noe izdatel’stvo narodov SSSR, Moskva. Èrzân’ grammatika.

Facundes, Sidney da S. 2000. The language of the Apurinã people of Brazil (Arawak). SUNY Buffalo. PhD Dissertation.

Gabelentz, Herr Conon von der. 1839. Versuch einer mordwinischen grammatik. In Zeitschrift für die Kunde des Morgenlandes, II. 2–3, pp. 235–284, 383–419. Druck und Verlag der Dieterlichschen Buchhandlung, Göttingen. Online: https://github.com/rueter/Erzya-grammar-Gabelentz-1838-39.

Gerstenberger, Ciprian, Niko Partanen, Michael Rießler, and Joshua Wilbur. 2016. Utilizing language technology in the documentation of endangered Uralic languages. Northern European Journal of Language Technology 4: 29–47. https://doi.org/10.3384/nejlt.2000-1533.1643.

Grebneva, A.M. 2000. Èrzân’ kel’, Morfemika, valon’ teevema dy morfologiâ, chap. Padežen’ luvos’. Respublikanskoj tipografiâs’ «Krasnyj Oktâbr’», Saransk. Vuzon’ erzân’ dy finnèn’-ugran’ kužotnesè student[t]nènen’ tonavtnemapel’.

Grünthal, Riho. 2008. Transitivity in Erzya Mordvin, pp. 235–246. Uralisztikai tanulmányok. ELTE, International.

Harrigan, Atticus G., Katherine Schmirler, Antti Arppe, Lene Antonsen, Trond Trosterud, and Arok Wolvengrey. 2017. Learning from the computational modelling of Plains Cree verbs. Morphology 27 4: 565–598. https://doi.org/10.1007/s11525-017-9315-x.

Hämäläinen, Mika. 2021. Endangered languages are not low-resourced! In Multilingual Facilitation. Rootroo Ltd. https://doi.org/10.20944/preprints202104.0113.v1.

Hämäläinen, Mika and Linda Wiechetek. 2020. Morphological Disambiguation of South Sámi with FSTs and Neural Networks. In Proceedings of the 1st Joint SLTU and CCURL Workshop, pp. 36–40.

Keresztes, László. 1999. Development of Mordvin definite conjugation. Suomalais-Ugrilaisen Seuran toimituksia, 233. Suomalais-Ugrilainen Seura, Helsinki.

Khanna, Tanmai, Jonathan N. Washington, Francis M. Tyers, Sevilay Bayatlı, Daniel G. Swanson, Tommi A. Pirinen, Irene Tang, and Hèctor Alòs i Font. 2021. Recent advances in Apertium, a free/open-source rule-based machine translation platform for low-resource languages. Machine Translation 35 4: 475–502. https://doi.org/10.1007/s10590-021-09260-6.

Nadʹkin, D. T. 1968. Očerki mordovskix dialektov, vol. Tom V, chap. Morfologiâ nižnepʹânskogo dialekta èrzâ-mordovskogo âzyka. Mordovskoe knižnoe izdatelʹstvo, Saransk.

Ornatov, Pavel. 1838. Mordovskaja grammatika / sostavlennaja na narechij mordvy mokshi Pavlom Ornatovym. V Sinodalnoj tip., Moskva.

Rueter, Jack. 2010. Adnominal Person in the Morphological System of Erzya. In Suomalais-ugrilaisen seuran toimituksia, 261. Suomalais-Ugrilainen Seura, Finland.

Rueter, Jack. 2013. Quantification in Erzya, pp. 99–122. LINCOM Studies in Language Typology. Lincom GmbH, Germany.

Rueter, Jack. 2020. Korpus nacional’nyx mordovskix âzykov: principy razrabotki i perspektivy funkcionirovaniâ/ dejstviâ. In Финно-угорские народы в контексте формирования общероссийской гражданской идентичности и меняющейся окружающей среды, pp. 118–127. Izdatel’skij centr Istoriko-sociologičeskogo instituta, Russia. Conference date: 08-10-2020 through 09-10-2020.

Rueter, Jack, Marília Fernanda Pereira de Freitas, Sidney Da Silva Facundes, Mika Hämäläinen, and Niko Partanen. 2021a. Apurinã Universal Dependencies treebank. In Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas, pp. 28–33. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.americasnlp-1.4.

Rueter, Jack and Mika Hämäläinen. 2020a. FST morphology for the endangered Skolt Sami language. In Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL), pp. 250–257. European Language Resources association, Marseille, France. https://aclanthology.org/2020.sltu-1.35.

Rueter, Jack and Mika Hämäläinen. 2020b. Prerequisites For Shallow-Transfer Machine Translation Of Mordvin Languages: Language Documentation With A Purpose, pp. 18–29. Iževsk: Institut komp’iuternyx issledovanij, Russian Federation. https://doi.org/10.20944/preprints202104.0131.v1.

Rueter, Jack, Mika Hämäläinen, and Niko Partanen. 2020a. Open-source morphology for endangered Mordvinic languages. In Proceedings of Second Workshop for NLP Open Source Software (NLP-OSS), pp. 94–100. The Association for Computational Linguistics, United States. https://doi.org/10.18653/v1/2020.nlposs-1.13. Conference date: 19-11-2020.

Rueter, Jack, Niko Partanen, Mika Hämäläinen, and Trond Trosterud. 2021b. Overview of open-source morphology development for the Komi-Zyrian language: Past and future. In Proceedings of The Seventh International Workshop on Computational Linguistics of Uralic Languages, pp. 62–72.

Rueter, Jack, Niko Partanen, and Larisa Ponomareva. 2020b. On the questions in developing computational infrastructure for Komi-Permyak. In Proceedings of the Sixth International Workshop on Computational Linguistics of Uralic Languages, pp. 15–25. https://doi.org/10.18653/v1/2020.iwclul-1.3.

Rueter, Jack Michael. 2016. Towards a systematic characterization of dialect variation in the Erzya-speaking world: Isoglosses and their reflexes attested in and around the Dubyonki Raion, pp. 109–148. No. 10 in Uralica Helsingiensia. University of Helsinki, Finland.

Rueter, Jack Michael and Francis M. Tyers. 2018. Towards an open-source universal-dependency treebank for Erzya. In International Workshop for Computational Linguistics of Uralic Languages, IWCLUL. https://doi.org/10.18653/v1/W18-0210.

Serebrenikov, B. A., R.N. Buzakova, and M.V. Mosin. 1993. Èrzânsko-russkij slovar’. Russkij Âzyk, Digora, Moskva.

Sheyanova, Mariya and Francis M. Tyers. 2017. Annotation schemes in North Sámi dependency parsing. In Proceedings of the 3rd International Workshop for Computational Linguistics of Uralic Languages, pp. 66–75.

Simonenko, Alexandra. 2020. Existential possession in Meadow Mari. In Approaches to predicative possession: the view from Slavic and Finno-Ugric, edited by Dalmi, Gréte and Witkos, Jacek and Ceglowski, Piotr, pp. 162–181. Bloomsbury Academic. https://doi.org/10.5040/9781350062498.ch-008.

Snoek, Conor, Dorothy Thunder, Kaidi Lõo, Antti Arppe, Jordan Lachler, Sjur Moshagen, and Trond Trosterud. 2014. Modeling the noun morphology of Plains Cree. In Proceedings of the 2014 Workshop on the Use of Computational Methods in the Study of Endangered Languages, pp. 34–42. Association for Computational Linguistics. https://doi.org/10.3115/v1/W14-2205.

Trosterud, Reino Sindre, Trond Trosterud, Anna-Kaisa Räisänen, Leena Niiranen, Mervi Haavisto, and Kaisa Maliniemi. 2017. A morphological analyser for Kven. In Proceedings of the Third Workshop on Computational Linguistics for Uralic Languages, pp. 76–88. Association for Computational Linguistics, St. Petersburg, Russia. https://doi.org/10.18653/v1/W17-0608. Online: https://aclanthology.org/W17-0608.

Trosterud, Trond. 1994. Auxiliaries, negative verbs and word order in the Sami and Finnic languages. In Minor Uralic Languages: Structure and Development, edited by Ago Künnap, pp. 173–181. Tartu.

Trosterud, Trond. 2006. Homonymy in the Uralic Two-Argument Agreement Paradigms. Suomalais-Ugrilaisen Seuran Toimituksia 251. Suomalais-Ugrilainen Seura, Helsinki.

Trosterud, Trond. 2009. A constraint grammar for Faroese. In Proceedings of the NODALIDA 2009 workshop Constraint Grammar and robust parsing, edited by Eckhard Bick, Kristin Hagen, Kaili Müürisep, and Trond Trosterud, vol. 8 of NEALT Proceedings Series, pp. 1–7. Northern European Association for Language Technology (NEALT, http://omilia.uio.no/nealt). Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/14180.

Trosterud, Trond and Lene Antonsen. 2020. Hva er viktig for forståelse? Om maskinoversetting fra nordsamisk. In Bauta: Janne Bondi Johannessen in memoriam, edited by K. Hagen, A. Hjelde, K. Stjernholm, and Ø. A. Vangsnes, vol. 11(2) of Oslo Studies in Language, pp. 489–502. Oslo: University of Oslo. https://doi.org/10.5617/osla.8514.

Trosterud, Trond and Sjur Moshagen. 2021. Soft on errors? The correcting mechanism of a Skolt Sami speller. In Multilingual Facilitation, pp. 197–207. https://doi.org/10.31885/9789515150257.19.

Turunen, Rigina. 2010. Nonverbal predication in Erzya: Studies on morphosyntactic variation and part of speech distinctions. Ph.D. thesis, University of Helsinki.

Wiedemann, F.J. 1865. Grammatik der Ersa-Mordwinischer Sprache, vol. IX No5 of VII. Mémoires de l’Académie Impérial des Sciences de St.-Pétersbourg. Online: https://github.com/rueter/Erzya-grammar-Wiedemann-1865.

Анохин, Павел. 2021. Сьӧд вӧр шӧрӧд да нюрӧд. Коми му 2021-08-26.

Куратова, Н. Н. 2020. Марьюшка. Войвыв кодзув 1 No2.

Пунегова, Надежда. 2021. Илона Артеева: Уджыд мед вӧлі сьӧлӧм сертиыд. Коми му 2021-09-09.

Ракин, А. 2011. Герчкан. Би кинь 1 3.

Published

2022-08-30