Nordlyd <p>is published by the Department of Language and Culture at UiT The Arctic University of Norway, and features articles with some connection to UiT, e.g. papers having been presented here or at events organized by members of the UiT linguistics community. Contributions are normally by invitation. All submissions are peer-reviewed.</p> Septentrio Academic Publishing en-US Nordlyd 1503-8599 <p>Authors who publish with this journal agree to the following terms:<br><br></p> <ol type="a"> <li class="show">Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a <a href="">Creative Commons Attribution-NonCommercial License</a> that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.</li> <li class="show">Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.</li> <li class="show">Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See <a href="" target="_new">The Effect of Open Access</a>).</li> </ol> Preface Sjur Moshagen Lene Antonsen Øystein Vangsnes Copyright (c) 2022 Sjur Moshagen 2022-08-30 2022-08-30 46 1 Trond Trosterud ved 60 <p style="font-weight: 400;">This edition of Nordlyd is a celebratory publication in honor of our colleague, Professor Trond Trosterud, in connection with his turning 60 on the 30<sup>th</sup> of August 2022. The edition contains 22 articles written by a total of 43 authors - mostly people who have collaborated with Trond in previous years. 
We have also written an introduction about Trond, and at the end of the book there is a list of Trond's publications. We are immensely happy to honor Trond with this book, which contains a lot of exciting information on many topics from various languages and which in that respect mirrors the wide and manifold interests of the jubilarian. We thank everyone who has written the articles, and we thank all the colleagues who have assessed and commented on the articles.</p> Lene Antonsen Sjur Moshagen Øystein Vangsnes Copyright (c) 2022 Sjur Moshagen, Lene Antonsen, Øystein Vangsnes 2022-08-30 2022-08-30 46 1 1–8 1–8 10.7557/12.6663 Trond Trosterud – publikasjonar 1989–2022 Sjur Moshagen Lene Antonsen Øystein Vangsnes Copyright (c) 2022 Sjur Moshagen, Lene Antonsen, Øystein Vangsnes 2022-08-30 2022-08-30 46 1 309–316 309–316 10.7557/12.6671 Mo, do, so, da – duortnussámi dovdomearkan? <p class="NL-Abstract">In this article, I examine the dialect forms of a set of North Saami pronouns – mo, do, so, da (‘I, you, he/she, it’; standardized forms: mon, don, son, dan). More specifically, I investigate where the forms are in use and how they have developed. The material shows that the final -n changed in a number of stages before it disappeared completely.
I suggest that these pronominal forms are a dialect marker of the Torne Saami dialect group (named after the Torne river valley on the border between Sweden and Finland). The pronominal forms are used throughout this dialect area, and the use continues north to Kvænangen in Norway, which in turn belongs to the Sea Saami dialect group.
In the Kvænangen dialect there are also a couple of other characteristics that are typical of some of the Torne Saami dialects.</p> Lene Antonsen Copyright (c) 2022 Lene Antonsen 2022-08-30 2022-08-30 46 1 9–17 9–17 10.7557/12.6394 All that glitters... <p>Evaluation has emerged as a central concern in natural language processing (NLP) over the last few decades. Evaluation is done against a <em>gold standard</em>, a manually linguistically annotated dataset, which is assumed to provide the ground truth against which the accuracy of the NLP system can be assessed automatically.
In this article, some methodological questions in connection with the creation of gold standard datasets are discussed, in particular (non-)expectations of linguistic expertise in annotators and the interannotator agreement measure standardly but unreflectively used as a kind of quality index of NLP gold standards.</p> Lars Borin Copyright (c) 2022 Lars Borin 2022-08-30 2022-08-30 46 1 19–26 19–26 10.7557/12.6348 The Preconceptual Basis of Noun Class (Gender) <div class="page" title="Page 1"> <div class="layoutArea"> <div class="column"> <p>Noun class is widely seen as “standing out” from other morphosyntactic categories in having a basis in ontological beliefs, or a ‘semantic core’. The consequence of this view is that noun classes in natural languages frequently do not cohere semantically. Here I motivate an aspectual alternative according to which noun class is grounded in low-level cognitive processes including the detection of agency and sex-related cues (including shape/size) and ‘mode’ of attention. This suggests a way of bringing noun class more into line with the perspectivizing contribution of morphosyntactic features in general.</p> </div> </div> </div> Patrik Bye Copyright (c) 2022 Patrik Bye 2022-08-30 2022-08-30 46 1 27–36 27–36 10.7557/12.6371 How weak are Romanian clitic pronouns? <p>In traditional linguistics, pronouns are divided into two classes: those that can bear word stress, termed “strong”, “full” or “tonal”, and those that cannot, termed “weak”, “clitic”, or “atonal”. However, in recent decades, research on this topic has shown that items generally labeled as clitics are far more complex. Somewhere between words and affixes, these hybrid linguistic entities challenge both description and modeling. As for Romanian, the debate on weak (i.e., clitic) pronouns has been dominated by the question of their categorial status: are these items clitics or affixes?
In this article, I present and scrutinize different approaches that support the claim that there are differences between proclitics and enclitics, i.e., between clitics occurring before vs. after the verb; this includes not only positional, but also featural differences. I identify various types of ambiguities in Romanian that could lead to improper data interpretation, and, based on an analysis of syllabicity – the most salient feature of Romanian weak pronouns – I refute claims that clitics in preverbal position should be treated differently from those in postverbal position. Furthermore, using evidence from both historical data and data pertaining to language varieties, I show regularities in the Romanian weak pronoun system, providing evidence against the claim that Romanian weak pronouns are highly idiosyncratic.</p> Ciprian-Virgil Gerstenberger Copyright (c) 2022 Ciprian-Virgil Gerstenberger 2022-08-30 2022-08-30 46 1 37–57 37–57 10.7557/12.6341 Mari morpheme order revisited <p>Morpheme order in Mari declension has been extensively studied in the past, but attempts to explain the large amounts of alternation found here have been constrained by the difficulty of accessing sufficient data to properly elucidate the complexities in this domain. The paper at hand examines the prospect of using the Corpus of Literary Mari, created by an international workgroup around Trond Trosterud and his colleagues and hosted by Giellatekno, and other recently published resources on Mari to efficiently access vast amounts of data to quantitatively study this subject in a manner that had not previously been possible.</p> Luan Hammer Jeremy Bradley Copyright (c) 2022 Luan Hammer, Jeremy Bradley 2022-08-30 2022-08-30 46 1 59–74 59–74 10.7557/12.6373 Den historiske utviklinga til preaspirasjon i samiske språk <p>Preaspiration of voiceless stops is a well-known feature of the phonological systems of the Saami languages, as in North Saami [jahkiː] <em>jahki</em> ‘year.Nom.Sg’.
It is present in all the Saami languages known to us, with the sole exception of Inari Saami. The historical relationship between Saami preaspiration and the consonant gradation system in other Uralic languages, above all the Finnic branch, is also well understood. Present-day Saami languages show some variation in how preaspirated stops behave in the phonological system. In this contribution, I sketch how this variation arose in the development from Proto-Saami to the modern languages. The scenario I propose is grounded in the theory of the <em>life cycle of phonological patterns</em>. I show that this theory is particularly well suited to understanding the diverse phonetic and phonological patterns that arise from one and the same source, including in Saami. To support this claim, I trace the development of preaspiration from a phonetic feature in Proto-Saami to the various outcomes found in today's languages.</p> Pavel Iosad Copyright (c) 2022 Pavel Iosad 2022-08-30 2022-08-30 46 1 75–101 75–101 10.7557/12.6350 Flertalsformer af ari-ord i den færøske talesprogsbank <div class="page" title="Page 1"> <div class="layoutArea"> <div class="column"> <p>The aim of this article is to investigate dialectal variation in the plural endings of ari-words in Faroese, i.e., masculine words with the ending ari in the singular. Such words, for example lærari ‘teacher’, may take different plural endings in spoken Faroese: ar, ir, a, and R (lærarar, lærarir, lærara, læraR). In the written language there is only one correct plural form, ar: lærarar. The empirical material in this article is drawn from a corpus of spoken language, Føroyskur talumálsbanki, which is a corpus management and analysis system for annotated corpora. The article is also a study of the usability of the corpus concerning dialectal variation in spoken Faroese. The result of the correlation shows that the non-standardized ir-variant is most frequent in the corpus.
Here I investigate the variation by correlating it with two non-linguistic variables: place and age.</p> </div> </div> </div> Jógvan í Lon Jacobsen Copyright (c) 2022 Jógvan í Lon Jacobsen 2022-08-30 2022-08-30 46 1 103–113 103–113 10.7557/12.6440 Temporal relations in North Sámi ECM constructions <p class="NL-Abstract">The embedded verb in North Sámi ECM constructions can appear in one of three different forms: past participle, progressive and infinitive.
The existing descriptions of North Sámi say that the past participle places the embedded event before the higher event, that the progressive (traditionally called aktio essive) expresses temporal coincidence with the higher event, and that the infinitive normally gets a future interpretation, but it might also coincide temporally with the higher clause.
This paper shows that although these generalisations are mostly correct, variation in the temporal interpretation of ECM complement clauses can be caused by a number of factors. In particular, the semantics of the matrix verb and the aspectual properties of the lower verb can influence the temporal relation between the matrix event and the embedded event.
In addition, temporal adverbials can shift or fix the time of the embedded event.</p> Marit Julien Copyright (c) 2022 Marit Julien 2022-08-30 2022-08-30 46 1 115–124 115–124 10.7557/12.6261 You can’t suggest that?! <p>In this article, we study the correction of spelling errors, specifically how spelling errors are made and how we can model them computationally in order to fix them.<br>The article describes two different approaches to generating spelling correction suggestions for three Uralic languages: Estonian, North Sámi and South Sámi.<br>The first approach to modelling spelling errors is rule-based: experts write rules that describe the kinds of errors that are made, and these are compiled into a finite-state automaton that models the errors.<br>The second is data-based: we show a machine learning algorithm a corpus of errors that humans have made, and it creates a neural network that can model the errors.<br>Both approaches require collecting error corpora and understanding their contents; therefore we also describe in detail the actual errors we have seen.<br>We find that while both approaches create error correction systems, with current resources the expert-built systems are still more reliable.</p> Heiki-Jaan Kaalep Flammie Pirinen Sjur Moshagen Copyright (c) 2022 Sjur Nørstebø Moshagen, Heiki-Jaan Kaalep, Flammie Pirinen 2022-08-30 2022-08-30 46 1 125–139 125–139 10.7557/12.6349 Kantasaamen sensiivisen
*-kše̮-johtimen kehityksestä ja edustuksesta nykysaamessa <div> <p class="NL-Abstract"><span lang="EN-US">The article discusses Saami censive verbs containing the suffixal element <em>-š-</em>, such as North Saami <em>guhkášit</em> ‘consider (too) long’ (from <em>guhkki</em> ‘long’). The occurrence of individual derivatives and derivational subtypes across the Saami languages is studied on the basis of extensive dictionary data, and the outlines of the historical development of the derivational type are sketched. Considering the Inari Saami verbs of type <em>viššâlšukšâđ</em> ‘consider diligent’ and data from past centuries, it is argued that the derivational type goes back to Proto-Saami *-<em>kše̮</em>-, which, in turn, is a loan suffix from Finnic (cf. Finnish <em>kummeksua</em> ‘find something odd’ ← <em>kumma</em> ‘odd’, <em>halveksia</em> ‘despise’ ← <em>halpa</em> ‘cheap’).</span></p> </div> Eino Koponen Juha Kuokkala Copyright (c) 2022 Eino Koponen, Juha Kuokkala 2022-08-30 2022-08-30 46 1 141–158 141–158 10.7557/12.6446 Creating a corpus for Kven, a minority language in Norway <p>Language documentation, including the development and use of corpora, is frequently linked to revitalisation. This is also the case for the Kven language, a Finnic minoritised language, traditionally spoken in the two northernmost counties of Norway. Kven is a recognised minority language in Norway, protected by the European Charter for Regional or Minority Languages. This status led to increased efforts to document Kven, including the development of the Ruija Corpus, consisting of recordings of interviews in Kven. The corpus was an important tool for the standardisation of Kven. In this article we describe how the corpus was developed and account for search functions, including a discussion of the limitations of the corpus.
We also discuss the role of corpora and other online tools for language revitalisation, with a particular focus on the standardisation of Kven, and conclude by reflecting on the fact that expertise also resides with the speakers of an endangered language and that they have a right to be involved in language documentation and revitalisation efforts.</p> Pia Lane Kristin Hagen Anders Nøklestad Joel Priestley Copyright (c) 2022 Pia Lane, Kristin Hagen, Anders Nøklestad, Joel Priestley 2022-08-30 2022-08-30 46 1 159–170 159–170 10.7557/12.6345 Anarâškielâ postpositioi pelni já piälán čäällim sierâ já oohtân tievâdâsâinis SIKOR-tekstâčuágálduvâst <p>Inari Saami does not have a strong written tradition. The current orthography was adopted as recently as the 1990s, and the revitalization process is only now beginning to shift its focus from increasing the number of speakers to strengthening literacy in the language. This article studies the Inari Saami postpositions <em>pelni</em> and <em>piälán</em> as well as their shorter forms <em>peln</em>/<em>beln</em> and <em>pel</em>/<em>bel</em>. The main question is whether these postpositions are joined to the noun preceding them or stand after it as separate words. The research is based on the SIKOR Inari Saami free corpus developed by the Giellatekno team. The postpositions have been analyzed semantically, taking into account the frequency with which they occur in the literature. They have been divided into four semantic groups: 1) place, 2) orientation and direction, 3) time and 4) other semantic categories. The long forms <em>pelni</em> and <em>piälán</em> are mostly written as separate words – except when they are used to express orientation or direction – whereas the short forms <em>peln</em>/<em>beln</em> and <em>pel</em>/<em>bel</em> are usually joined to the preceding word, except in time expressions.
Alternative explanations for such variation are also discussed.</p> Petter Morottaja Marja-Liisa Olthuis Fabrizio Brecciaroli Copyright (c) 2022 Marja-Liisa Olthuis, Petter Morottaja, Fabrizio Brecciaroli 2022-08-30 2022-08-30 46 1 171–180 171–180 10.7557/12.6384 Språkdokumentasjon innen fennistikken og kvensk <p>The study of the Finnish language – called Fennistics – focused on collecting Finnish dialect material from very early on. During the 19th century, interest in studying dialects was governed by the idea that dialects could be used to develop modern written Finnish. However, gradually the study of dialects also became an area of study in its own right. Collecting material on Kven dialects belonged to the larger project of Fennistic materials collection from the very beginning. Therefore, many Kven dialect materials can be found in Finnish dialect archives. The documentation resulting from dialect collections gathered within Fennistics has been used in the process of revitalizing Kven. In this article, language documentation is defined as an activity which also includes traditional dialectology. By contrast, documentary linguistics is a field of linguistics established in the 1990s which focuses on revitalizing endangered languages. The difference between language documentation in the Fennistic tradition and documentation of endangered languages in the field of documentary linguistics today is also discussed.</p> Leena Niiranen Copyright (c) 2022 Leena Niiranen 2022-08-30 2022-08-30 46 1 181–192 181–192 10.7557/12.6347 Low hanging fruit and the Boasian trilogy in digital lexicography of morphologically rich languages <p>Online lexicographical resources for the morphologically rich Indigenous languages in Canada use a wide range of strategies for conveying their language’s morphological system, i.e. how words are inflected and derived, which this paper illustrates in a survey of seventeen bilingual online resources.
The strategies these resources employ boil down to two basic approaches to the underlying structure of the resource: 1) a lexical database, or 2) a computational model. Most resources we surveyed are constructed around lexical databases. These assume the word(form) as the basic unit, an assumption that makes it difficult to incorporate the language’s sub-word, morphological structure in full detail. However, one resource uses a computational morphological model to bring the language’s morphology into the core of the lexicon – this proved to be a “low-hanging fruit” in the application of language technology that had been accomplished within a reasonable time-frame, as has been advocated by Trond Trosterud. We discuss the value created and questions raised by this approach and argue that it successfully overcomes the traditional Boasian three-way partition of dictionary, grammar, and text, creating integrated language resources that meet the modern needs of low-resource endangered languages and their communities.</p> Elizabeth Pankratz Antti Arppe Jordan Lachler Copyright (c) 2022 Antti Arppe, Jordan Lachler, Elizabeth Pankratz 2022-08-30 2022-08-30 46 1 193–204 193–204 10.7557/12.6441 Samiske barnehagers rolle i språkrevitaliseringa <p>More than a thousand children attend Sámi kindergartens daily, while quite a number of Sámi children get Sámi language instruction in other kindergartens. This activity is one of the most important arenas for transmission and acquisition of Sámi languages. A question raised in this article is how these kindergartens are used as research fields in the disciplines of language and sociology and early childhood pedagogy. Another question is what kind of language teaching models are used. The article shows that little research has been carried out on how Sámi kindergartens teach Sámi language. There is also little research done on the results from this education. 
A review of the Sámi kindergartens’ history shows that statistical material on Sámi kindergartens and Sámi language instruction in other kindergartens is available only from some geographical areas, and existing statistical information is only partly suitable for analysis. This makes it difficult to use existing material to monitor the vitality of Sámi languages. The article calls for more research on Sámi kindergartens and language teaching models used in them. A goal could be to create a basis for monitoring this crucial indicator of language vitality for Sámi languages: whether new generations of Sámi become Sámi speakers or not. This should be followed up with more research on the language teaching models used in Sámi kindergartens.</p> Torkel Rasmussen Copyright (c) 2022 Torkel Rasmussen 2022-08-30 2022-08-30 46 1 205–217 205–217 10.7557/12.6387 Cyclic feeding interactions between finite-state mal-rules <div class="page" title="Page 1"> <div class="layoutArea"> <div class="column"> <p>Intelligent Language Tutoring Systems typically attempt to automatically diagnose learner errors in order to provide individualized feedback. One common approach is the use of mal-rules to extend normative grammars by licensing specific types of learner errors. In finite-state morphologies, mal-rules can be implemented as two-level rules or replace rules. However, unlike the phonological rules of natural languages, mal-rules do not necessarily behave as a coherent system, especially with respect to feeding interactions. Using examples from learner errors attested in the RULEC corpus of Russian learner texts, we illustrate the problem of cyclic feeding interactions that can occur between mal-rules.
We then describe a formal algorithm for identifying an optimal ordering for mal-rules to be applied to a transducer.</p> </div> </div> </div> Robert Reynolds Laura Janda Tore Nesset Copyright (c) 2022 Robert Reynolds, Laura Janda, Tore Nesset 2022-08-30 2022-08-30 46 1 219–229 219–229 10.7557/12.6306 Establishing a Role for Minority Source Language in Multilingual Facilitation <p>This document is dedicated to a young man, who, despite the number of times he has traveled around the Sun, is always open to new thoughts on ways to include languages, especially the smaller ones, and the people who speak them in far-reaching and sustainable open-source development. Since Trond Trosterud in Tromsø is credited with a terrific track record in transnational and circum-polar linguistics, we try to attract his attention further afield, to languages and phenomena he has only touched on. The language phenomena addressed here come from Erzya and the Zyrian variety of Komi; Erzya has issues presented but not discussed in his dissertation, whereas Komi brings in issues of adnominal and predicate number marking in conjunction with case homonymy that have been resolved thanks to the flexibility of the infrastructure. These source languages, like others, have documented new dimensions and added shape to the ever-growing infrastructure.</p> Jack Rueter Niko Partanen Khalid Alnajjar Mika Hämäläinen Copyright (c) 2022 Jack Rueter, Niko Partanen, Mika Hämäläinen, Khalid Alnajjar 2022-08-30 2022-08-30 46 1 231–240 231–240 10.7557/12.6370 Om kjønn og adjektiv <p><span class="TextRun SCXW14594075 BCX9" lang="EN-US" xml:lang="EN-US" data-contrast="none"><span class="NormalTextRun SCXW14594075 BCX9">Do female and male writers use adjectives differently? This article is a survey of the potentially gendered use of adjectives in Norwegian novels. 
It is also a tribute to Trond </span><span class="SpellingError SCXW14594075 BCX9">Trosterud's</span><span class="NormalTextRun SCXW14594075 BCX9"> legendary article on grammatical gender. While our concern here is biological sex of authors and their use of adjectives, the man of the hour was concerned with rule-governed gender on nouns, albeit with a biological accent. Several readers — and listeners at MONS in 1999 — took note of the bold rule that said that oblong objects, not to mention protruding natural formations, are masculine, while natural phenomena such as pits and cavities are feminine (</span><span class="SpellingError SCXW14594075 BCX9">Trosterud</span><span class="NormalTextRun SCXW14594075 BCX9"> 2001).</span></span><span class="EOP SCXW14594075 BCX9" data-ccp-props="{&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559685&quot;:284,&quot;335559737&quot;:284}"> </span></p> Ingebjørg Tonne Helene Uri Lars Johnsen Copyright (c) 2022 Ingebjørg Tonne, Helene Uri, Lars Johnsen 2022-08-30 2022-08-30 46 1 241–257 241–257 10.7557/12.6386 Projections for Sámi in Norway <div> <p class="NL-Abstract"><a name="_Toc432763383"></a>The paper presents three different projections of the future number of Sámi language users in Norway based on the contemporary number of children receiving instruction in Sámi in the Norwegian school system, either North, Lule or South Sámi. There exist three different curricula for the subject Sámi, one for first language pupils (Sámi 1), one for second language pupils (Sámi 2), and one for pupils with no previous knowledge of the language (Sámi 3). Depending on whether only Sámi 1 pupils become future language users, or also Sámi 2 or even Sámi 3 pupils do so, a sober, moderate, and optimistic prognosis can be made, respectively. 
The sober prognosis predicts a dramatic decrease for North Sámi and a slight decrease for the other two varieties, whereas the moderate prognosis predicts stability for North Sámi and an increase for Lule and South Sámi, and the optimistic prognosis predicts an increase for all three varieties. A number of factors that are likely to modulate the prognoses are brought to attention and discussed, revealing that more information is needed regarding a number of issues that bear on how the future of the Sámi languages in Norway can be estimated.</p> </div> Øystein Alexander Vangsnes Copyright (c) 2022 Øystein Alexander Vangsnes 2022-08-30 2022-08-30 46 1 259–272 259–272 10.7557/12.6427 Conrad Svendsens beskrivelse av norsk tegnspråk <p>This text presents an introductory investigation of Conrad Svendsen’s analysis of Norwegian Sign Language, as it appears in a set of handwritten notes that have been preserved after him, supposedly from around 1910. The notes are significant for the history of the field, since little material has been preserved about Norwegian Sign Language before the end of the 20th century. Svendsen was an important personality, at first in Christiania’s [Oslo’s] community for the education of deaf children, then in the Church’s services for the deaf community, and finally in the establishment of a «home for the deaf». Since Svendsen’s own views on sign language policy have paradoxical features, it is interesting to find out to what extent he had a real understanding of what sign language is. The introductory investigation concludes that Svendsen had a good understanding of many aspects of Norwegian Sign Language and was also able to articulate his pedagogical presentation of the material well. However, he seems to assume that the sign language is less conventionalized than we have reason today to believe it may have been in his day. 
And to be able to assess how autonomous his understanding was, a closer investigation is needed of how he may have been influenced by earlier authors, not least in the German-speaking world.</p> Arnfinn Muruvik Vonen Copyright (c) 2022 Arnfinn Muruvik Vonen 2022-08-30 2022-08-30 46 1 273–284 273–284 10.7557/12.6392 Mii *eai leat gal vuollánan – Vi *ha neimen ikke gitt opp <div class="page" title="Page 1"> <div class="layoutArea"> <div class="column"> <div class="page" title="Page 1"> <div class="layoutArea"> <div class="column"> <p>Machine learning is the dominant paradigm in natural language processing nowadays. It requires vast amounts of manually annotated or synthetically generated text data. In the GiellaLT infrastructure, on the other hand, we have worked with rule-based methods, where the linguists have full control over the development of the tools. In this article we debunk the myth of machine learning being cheaper than a rule-based approach by showing how much work there is behind data generation, either via corpus annotation or creating tools that automatically mark up the corpus. Earlier we have shown that the correction of grammatical errors, in particular compound errors, benefits from hybrid methods. Agreement errors, on the other hand, are to a higher degree dependent on the larger grammatical context. 
Our experiments show that machine learning methods for this error type, even when supplemented by rule-based methods generating massive data, cannot compete with the state-of-the-art rule-based approach.</p> </div> </div> </div> </div> </div> </div> Linda Wiechetek Flammie Pirinen Børre Gaup Chiara Argese Thomas Omma Copyright (c) 2022 Linda Wiechetek, Flammie Pirinen, Børre Gaup, Chiara Argese, Thomas Omma 2022-08-30 2022-08-30 46 1 285–297 285–297 10.7557/12.6346 Čalbmi čalmmis ja suoldnečalmmit suoidnečalmmis <p class="NL-Abstract"><span lang="EN-US">North Saami čalbmi ‘eye’ (&lt; Proto-Uralic *ćilmä) has cognates in all Uralic languages, and everywhere they refer to the visual organs of humans and animals. However, scholars have barely paid attention to the grammatical functions of čalbmi in compound-like formations such as suoldnečalbmi “dew eye”, suoidnečalbmi “grass eye”, varračalbmi “blood eye”, jiekŋačalbmi “ice eye”, vuoktačalbmi “hair eye” and muorječalbmi “berry eye”. This article examines such expressions as so-called singulatives – grammatical means for individuating a single referent from a group or mass (i.e., ‘a single drop of dew’, ‘a single blade of grass’, ‘a single drop of blood’, ‘a single crystal of ice’, ‘a single human hair’ and ‘a single berry’). The article mainly discusses morphological, syntactic and semantic features of singulatives in North Saami and other present-day Saami languages, but comparable singulatives in Khanty, Mansi and Samoyed languages as well as in Hungarian suggest that singulative expressions such as *weri-ćilmä ‘a single drop of blood’, *jäŋi-ćilmä ‘a single crystal of ice; hailstone’ and *me̮rja-ćilmä ‘a single berry’ can, in principle, be reconstructed all the way back to Proto-Uralic.</span></p> Jussi Ylikoski Copyright (c) 2022 Jussi Ylikoski 2022-08-30 2022-08-30 46 1 299–307 299–307 10.7557/12.6304