Why the computer should know its Sami grammar
DOI: https://doi.org/10.7557/sda.8605
Abstract
Language technology is the foundation of the infrastructure that any language needs in order to function in a modern literary society. The Sami languages differ in two important ways from the languages for which most such technology is developed: the body of text available (whether monolingual Sami or bilingual Sami and majority-language text) is only a fraction of what is available for Western European state languages, and the Sami languages have far more complex morphological structures than most Western European state languages.
The article argues that the answer to this challenge is to build grammar-based language technology for the Sami languages, and presents ongoing work towards this goal. It shows how morphophonological processes and inflectional and derivational morphology can be modelled as finite-state transducers and combined with a syntactic component consisting of context-sensitive constraint grammar rules. Together these constitute a robust grammatical analyser capable of both analysing running text and generating any word form. The speech communities of the Sami languages are not large enough to sustain a language technology industry, but the grammar-based language model is of interest to theoretical linguists as well.
Practical applications derived from the basic grammatical analysers include spell-checkers, interactive computer-assisted language learning programs, and machine translation.
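The architecture outlined in the abstract, one relation used for both analysis and generation, followed by context-sensitive disambiguation, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the four-entry lexicon, the tag names, and the single rule are hypothetical and are not taken from the article's transducers or constraint grammar. A real finite-state transducer would derive the stem alternation (guolli : guoli) by rule instead of listing each form.

```python
# Toy illustration only: the lexicon, tag names, and rule below are
# assumptions, not the article's actual transducers or grammar rules.

# Each pair relates an analysis string (lemma + tags) to a surface form.
PAIRS = [
    ("guolli+N+Sg+Nom", "guolli"),
    ("guolli+N+Sg+Gen", "guoli"),
    ("guolli+N+Sg+Acc", "guoli"),
    ("guolli+N+Pl+Nom", "guolit"),
]


def analyse(surface):
    """Surface form -> all analyses (one direction of the relation)."""
    return [a for a, s in PAIRS if s == surface]


def generate(analysis):
    """Analysis -> surface forms (the same relation, read the other way)."""
    return [s for a, s in PAIRS if a == analysis]


def disambiguate(readings, next_is_postposition):
    """Hypothetical constraint-style rule: drop accusative readings when a
    postposition follows, unless that would remove every reading."""
    if next_is_postposition:
        kept = [r for r in readings if not r.endswith("+Acc")]
        return kept or readings
    return readings


if __name__ == "__main__":
    readings = analyse("guoli")          # ambiguous: Gen or Acc
    print(readings)
    print(disambiguate(readings, True))  # context removes the Acc reading
    print(generate("guolli+N+Pl+Nom"))
```

The point of the sketch is the division of labour: the morphological component returns every possible reading of a word form, and the rule component removes readings that the context rules out, never deleting the last remaining one.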