Low hanging fruit and the Boasian trilogy in digital lexicography of morphologically rich languages
Lessons from a survey of Indigenous language resources in Canada
DOI:
https://doi.org/10.7557/12.6441
Keywords:
Indigenous Languages, electronic dictionaries, lexicography, morphology, finite-state modeling, Plains Cree, Canada
Abstract
Online lexicographical resources for the morphologically rich Indigenous languages in Canada use a wide range of strategies for conveying their language’s morphological system, i.e. how words are inflected and derived, which this paper illustrates in a survey of seventeen bilingual online resources. The strategies these resources employ boil down to two basic approaches to the underlying structure of the resource: 1) a lexical database, or 2) a computational model. Most resources we surveyed are constructed around lexical databases. These assume the word(form) as the basic unit, an assumption that makes it difficult to incorporate the language’s sub-word, morphological structure in full detail. However, one resource uses a computational morphological model to bring the language’s morphology into the core of the lexicon – this proved to be a “low-hanging fruit” in the application of language technology, one that was accomplished within a reasonable time-frame, as has been advocated by Trond Trosterud. We discuss the value created and questions raised by this approach and argue that it successfully overcomes the traditional Boasian three-way partition of dictionary, grammar, and text, creating integrated language resources that meet the modern needs of low-resource endangered languages and their communities.
References
Arppe, Antti, Jordan Lachler, Trond Trosterud, Lene Antonsen, and Sjur N. Moshagen. 2016. Basic Language Resource Kits for Endangered Languages: A Case Study of Plains Cree. Proceedings of CCURL 2016 – Collaboration and Computing for Under-Resourced Languages. 1–8.
Arppe, Antti, Mari Voipio, and Malene Würtz. 2000. Creating Inflecting Electronic Dictionaries. Edited by Carl-Erik Lindberg and Steffen Nordahl Lund. Proceedings of the 17th Scandinavian Conference of Linguistics, Nyborg. 20–22.
Arppe, Antti, Christopher Cox, Mans Hulden, Jordan Lachler, Sjur N. Moshagen, Miikka Silfverberg and Trond Trosterud. 2017a. Computational Modeling of Verbs in Dene Languages: The Case of Tsuut’ina. In Alessandro Jaker (ed.), Working Papers in Athabaskan (Dene) Languages, 51–69. Alaska Native Language Center Working Papers 13. Alaska Native Language Center, Fairbanks.
Arppe, Antti, Marie-Odile Junker, and Delasie Torkornoo. 2017b. Converting a comprehensive lexical database into a computational model: The case of East Cree verb inflection. Proceedings of the 2nd Workshop on Computational Methods for Endangered Languages (ComputEL-2). 43–47. https://doi.org/10.18653/v1/W17-0108.
Beesley, Kenneth R., and Lauri Karttunen. 2003. Finite state morphology. Studies in Computational Linguistics. Center for the Study of Language and Information, Stanford, California.
Boas, Franz. 1917. Introductory. International Journal of American Linguistics 1(1): 1–8. https://doi.org/10.1086/463708.
Bontogon, Megan, Antti Arppe, Lene Antonsen, Dorothy Thunder, and Jordan Lachler. 2018. Intelligent Computer Assisted Language Learning (ICALL) for nêhiyawêwin: An In-Depth User-Experience Evaluation. Canadian Modern Language Review 74: 337–362. https://doi.org/10.3138/cmlr.4054.
Bowern, Claire. 2015. Linguistic fieldwork: A practical guide. Springer. https://doi.org/10.1057/9781137340801.
Bowers, Dustin, Antti Arppe, Jordan Lachler, Sjur N. Moshagen, and Trond Trosterud. 2017. A Morphological Parser for Odawa. Proceedings of the 2nd Workshop on Computational Methods for Endangered Languages (ComputEL-2). 1–9. https://doi.org/10.18653/v1/W17-0101.
Bybee, Joan. 1985. Morphology: A study of the relation between meaning and form. (Vol. 9, Typological studies in language). John Benjamins, Amsterdam. https://doi.org/10.1075/tsl.9.
Granger, Sylviane. 2012. Introduction: Electronic lexicography – from challenge to opportunity. In Electronic Lexicography, edited by Sylviane Granger and Magali Paquot, 1–12. Oxford University Press, Oxford. https://doi.org/10.1093/acprof:oso/9780199654864.003.0001.
Harrigan, Atticus G., Katherine Schmirler, Antti Arppe, Lene Antonsen, Trond Trosterud, and Arok Wolvengrey. 2017. Learning from the computational modelling of Plains Cree verbs. Morphology 27: 565–598. https://doi.org/10.1007/s11525-017-9315-x.
Johnson, Ryan, Lene Antonsen, and Trond Trosterud. 2013. Using Finite State Transducers for Making Efficient Reading Comprehension Dictionaries. Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013). 59–71.
Kazeminejad, Ghazaleh, Andrew Cowell, and Mans Hulden. 2017. Creating lexical resources for polysynthetic languages – the case of Arapaho. Proceedings of the 2nd Workshop on Computational Methods for Endangered Languages (ComputEL-2). 10–18. https://doi.org/10.18653/v1/W17-0102.
Lachler, Jordan, and Elizabeth Pankratz. 2017. Moving toward value-added digital repatriation in lexicography for Indigenous languages in Canada. Edited by Nicholas Ostler, Vera Ferreira and Chris Moseley. Proceedings of the 21st FEL Conference: Communities in Control: Learning tools and strategies for multilingual endangered language communities. 107–114.
Lachler, Jordan, Lene Antonsen, Trond Trosterud, Sjur N. Moshagen, and Antti Arppe. 2018. Modeling Northern Haida Morphology. Proceedings of the 11th Language Resources and Evaluation Conference (LREC 2018), Miyazaki, Japan, May 7–12, 2018, 2326–2330.
Lew, Robert. 2012. How Can We Make Electronic Dictionaries More Effective? In Electronic Lexicography, edited by Sylviane Granger and Magali Paquot, 343–361. Oxford University Press, Oxford. https://doi.org/10.1093/acprof:oso/9780199654864.003.0016.
Moeller, Sarah, Ghazaleh Kazeminejad, Andrew Cowell, and Mans Hulden. 2018. A neural morphological analyzer for Arapaho verbs learned from a finite state transducer. Proceedings of Workshop on Polysynthetic Languages. Association for Computational Linguistics, 12–20. https://aclanthology.org/W18-4802.
Prinsloo, D. J. 2012. Electronic lexicography for lesser-resourced languages: The South African context. In Electronic Lexicography, edited by Sylviane Granger and Magali Paquot, 119–143. Oxford University Press, Oxford. https://doi.org/10.1093/acprof:oso/9780199654864.003.0007.
Rice, Keren. 2011. Documentary linguistics and community relations. Language Documentation & Conservation 5: 187–207.
Statistics Canada. 2011. Aboriginal languages in Canada. 2011 Census of Population, Catalogue no. 98-314-X2011003. https://www12.statcan.gc.ca/census-recensement/2011/as-sa/98-314-x/98-314-x2011003_3-eng.pdf (accessed April 19, 2019).
Snoek, Conor, Dorothy Thunder, Kaidi Lõo, Antti Arppe, Jordan Lachler, Sjur Moshagen, and Trond Trosterud. 2014. Modeling the Noun Morphology of Plains Cree. Proceedings of ComputEL: Workshop on the use of computational methods in the study of endangered languages, 52nd Annual Meeting of the Association for Computational Linguistics, Baltimore, Maryland, 26 June 2014, 34–42. ACL Anthology. https://doi.org/10.3115/v1/W14-2205.
Spencer, Andrew. 2016. Two morphologies or one? Inflection versus word formation. In The Cambridge Handbook of Morphology, edited by Andrew Hippisley and Gregory Stump, 27–49. Cambridge University Press, Cambridge. https://doi.org/10.1017/9781139814720.002.
Stump, Gregory. 2016. Inflectional paradigms: Content and form at the syntax-morphology interface. Cambridge University Press, Cambridge. https://doi.org/10.1017/CBO9781316105290.
Trosterud, Trond. 2004. Porting morphological analysis and disambiguation to new languages. First Steps in Language Documentation for Minority Languages: Proceedings of the SALTMIL Workshop at LREC 2004. 90–92.
Trosterud, Trond. 2006. Grammatically based language technology for minority languages. Lesser-Known languages of South Asia: Status and policies, case studies and applications of information technology. Mouton de Gruyter, Berlin. 293–316. https://doi.org/10.1515/9783110197785.
Wolvengrey, Arok. 2001. nêhiyawêwin: itwêwina / Cree: Words, bilingual edition. University of Regina Press, Regina.
Woodbury, Anthony C. 2011. Language documentation. In The Cambridge Handbook of Endangered Languages, edited by Peter K. Austin and Julia Sallabank, 159–186. Cambridge University Press, Cambridge. https://doi.org/10.1017/CBO9780511975981.009.
Language resource references
Eggleston, Keri. Online Tlingit Verb Dictionary. http://ankn.uaf.edu/~tlingitverbs/ (accessed April 19, 2019).
Ellis, Doug. Spoken Cree: Moose and Swampy Cree Dictionary. http://www.spokencree.org/Glossary (accessed April 19, 2019).
First Voices. Northern St̓át̓imcets. http://www.firstvoices.com/en/Northern-Statimcets (accessed April 8, 2019).
isiZulu.net. isiZulu.net: Bilingual Zulu-English dictionary. https://isizulu.net/ (accessed April 19, 2019).
Junker, Marie-Odile, Marguerite MacKenzie, Luci Bobbish-Salt, Alice Duff, Linda Visitor, Ruth Salt, Anna Blacksmith, Patricia Diamond, and Pearl Weistche. The Eastern James Bay Cree Dictionary on the Web: English-Cree and Cree-English, French-Cree and Cree-French (Northern and Southern dialects). http://dictionary.eastcree.org/words (accessed April 19, 2019).
Mi’kmaq Online. Mi’kmaq Online. https://www.mikmaqonline.org/ (accessed April 19, 2019).
Miyo Wahkohtowin Community Education Authority. Online Cree Dictionary. http://www.creedictionary.com/ (accessed April 19, 2019).
Mother Tongues Dictionaries. Online Híɫzaqv Dictionary. https://mothertongues.org/heiltsuk/ (accessed April 19, 2019).
Ohwejagekhá: Ha'degaenage. Mohawk. http://ohwejagehka.com/mohawk/ (accessed April 19, 2019).
Omàmiwininì Pimàdjwowin/The Algonquin Way Cultural Centre. The Algonquin Way Dictionary. http://www.thealgonquinway.ca/English/dictionary-e.php (accessed April 19, 2019).
Passamaquoddy-Maliseet Language Portal. Passamaquoddy-Maliseet Dictionary. https://pmportal.org/browse-dictionary (accessed April 19, 2019).
SENĆOŦEN Classified Word List. SENĆOŦEN Classified Word List. https://itservices.cas.unt.edu/~montler/Saanich/WordList/ (accessed April 19, 2019).
The University of Northern British Columbia. Sm’algyax Living Legacy Talking Dictionary. http://web.unbc.ca/~smalgyax/ (accessed April 19, 2019).
University of Alberta ALTLab. itwêwina. https://itwewina.altlab.app/ (accessed April 19, 2019).
University of British Columbia Department of Linguistics. Gitksan/English Online Dictionary (Beta). http://gitdict.nfshost.com (accessed April 19, 2019).
University of Minnesota Department of American Indian Studies. Dakota Dictionary Online. https://filemaker.cla.umn.edu/dakota/ (accessed April 19, 2019).
University of Minnesota Department of American Indian Studies. The Ojibwe People’s Dictionary. https://ojibwe.lib.umn.edu/ (accessed April 19, 2019).
University of Victoria Linguistics Department. Tłı̨chǫ Yatıı̀ Multimedia Dictionary. http://tlicho.ling.uvic.ca/ (accessed April 19, 2019).
License
Copyright (c) 2022 Antti Arppe, Jordan Lachler, Elizabeth Pankratz
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.