Mii *eai leat gal vuollánan – Vi *ha neimen ikke gitt opp

A hybrid grammar checker for correcting agreement errors

DOI:

https://doi.org/10.7557/12.6346

Keywords:

Sámi language, grammar checking, neural networks, NLP, rule-based, agreement

Abstract

Machine learning is currently the dominant paradigm in natural language processing. It requires vast amounts of manually annotated or synthetically generated text data. In the GiellaLT infrastructure, on the other hand, we have worked with rule-based methods, where linguists have full control over the development of the tools. In this article we debunk the myth that machine learning is cheaper than a rule-based approach by showing how much work lies behind data generation, whether through corpus annotation or through creating tools that automatically mark up the corpus. We have shown earlier that the correction of grammatical errors, in particular compound errors, benefits from hybrid methods. Agreement errors, on the other hand, depend to a higher degree on the larger grammatical context. Our experiments show that machine learning methods for this error type, even when supplemented by rule-based methods generating massive amounts of data, cannot compete with the state-of-the-art rule-based approach.

Published

2022-08-30