LIA-korpusa – eldre talemålsopptak for norsk og samisk gjort tilgjengelege [The LIA corpora: older speech recordings of Norwegian and Sámi made accessible]

DOI:

https://doi.org/10.7557/12.7157

Keywords:

corpus, spoken language, Norwegian dialects, North Sámi dialects, language infrastructure

Abstract

This paper presents the results of the project Language Infrastructure made Accessible (LIA), whose main goal was to digitize and make accessible old recordings of spoken Norwegian and Sámi from various archives, first and foremost those of the four partner institutions: the University of Oslo, the University of Bergen, the Norwegian University of Science and Technology, and UiT The Arctic University of Norway. The infrastructure resulting from the project can be summarized as 1) various language technology resources, such as a morphological tagger and a parser for Norwegian dialects, an upgraded version of the corpus interface Glossa, and a new infrastructure for file depots; 2) a file depot of Norwegian dialect recordings; and 3) three corpora of spoken Norwegian and one of North Sámi, as well as the LIA treebank. The paper illustrates with examples how the corpora can be used.

Published

2023-12-21