Low-Resource Active Learning of North Sámi Morphological Segmentation

Stig-Arne Grönroos; Kristiina Jokinen; Katri Hiovain; Mikko Kurimo; Sami Virpioja

doi:10.7557/5.3465

Authors

Stig-Arne Grönroos, Department of Signal Processing and Acoustics, Aalto University
Kristiina Jokinen, Institute of Behavioural Sciences, University of Helsinki
Katri Hiovain, Institute of Behavioural Sciences, University of Helsinki
Mikko Kurimo, Department of Signal Processing and Acoustics, Aalto University
Sami Virpioja, Department of Information and Computer Science, Aalto University

DOI: https://doi.org/10.7557/5.3465

Abstract

Many Uralic languages have a rich morphological structure but lack the morphological analysis tools needed for efficient language processing. While creating a high-quality morphological analyzer requires a significant amount of expert labor, data-driven approaches may provide sufficient quality for many applications. We study how to create a statistical model for morphological segmentation of the North Sámi language using a large unannotated corpus and a small number of human-annotated word forms selected with an active learning approach. For statistical learning, we use the semi-supervised Morfessor Baseline and FlatCat methods. After annotating 237 words with our active learning setup, we improve morph boundary recall by over 20% with no loss of precision.
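
The abstract reports results as morph boundary precision and recall. As a rough illustration of what these measures capture, the following minimal sketch compares predicted and reference segmentations by the character positions of their boundaries. It assumes one reference segmentation per word and micro-averaging over the word list; the evaluation used in the paper may differ (for example, in averaging or in handling alternative analyses). The function names and the toy English words are hypothetical and are not taken from the paper or from Morfessor.

```python
def boundaries(morphs):
    """Return the set of character offsets at which a morph boundary falls."""
    positions = set()
    offset = 0
    # The end of the last morph is the word end, not an internal boundary.
    for morph in morphs[:-1]:
        offset += len(morph)
        positions.add(offset)
    return positions


def boundary_precision_recall(predicted, reference):
    """Micro-averaged boundary precision and recall over a list of words.

    predicted, reference: parallel lists of segmentations, each a list of
    morph strings (assumption: exactly one reference per word).
    """
    true_pos = n_pred = n_ref = 0
    for pred, ref in zip(predicted, reference):
        pred_b, ref_b = boundaries(pred), boundaries(ref)
        true_pos += len(pred_b & ref_b)
        n_pred += len(pred_b)
        n_ref += len(ref_b)
    # Convention assumed here: with nothing to predict or find, score 1.0.
    precision = true_pos / n_pred if n_pred else 1.0
    recall = true_pos / n_ref if n_ref else 1.0
    return precision, recall


# Toy data (not from the paper): one missed boundary, one spurious boundary.
predicted = [["unlock", "ed"], ["cat", "s"]]
reference = [["un", "lock", "ed"], ["cats"]]
precision, recall = boundary_precision_recall(predicted, reference)
```

Under this definition, a model that proposes more correct boundaries without adding spurious ones raises recall while leaving precision unchanged, which is the kind of improvement the abstract describes.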

