Open Data for Linguists

  • Laura Janda, UiT The Arctic University of Norway
Keywords: munin conference 2014



The field of linguistics has taken a quantitative turn in recent years (Janda 2013). The majority of conference presentations, articles, and books in our field now involve some kind of quantitative analysis of language data, and results are often evaluated using statistical methods. However, best practices for quantitative analysis in linguistics are still under development. Public archiving and sharing of data and statistical code are needed in order to move the field forward by providing standards and examples that can be followed.

The Tromsø Repository of Language and Linguistics, also known as “TROLLing”, is designed to meet this need. TROLLing is an international archive of linguistic data and statistical code that is provided as a free professional service to the worldwide community of linguists. TROLLing shares the platform of the Harvard Dataverse; assigns a permanent URL to each post (currently a “handle” URL, to be converted to a DOI during summer 2014); collects metadata that are searchable through the site; and is professionally managed by the university library in Tromsø and an international Steering Committee.

Authors of books and articles published in linguistics journals are welcome to deposit their data in TROLLing, along with citations of their articles. Conversely, authors can reference their data by citing their TROLLing posts in their publications. Additionally, researchers are welcome to archive completed studies on the TROLLing site whether or not the results are published in scholarly venues.

TROLLing went live for public use in the summer of 2014. We are currently working on spreading the word to our colleagues by asking editors of major scholarly journals to recommend it to authors, holding workshops at meetings of professional organizations, and using listservs.

This presentation will demonstrate how TROLLing works, what kinds of metadata it collects, how those metadata can be harvested and searched, and what kinds of data can be archived at this site.

Janda, Laura A. 2013. “Quantitative Methods in Cognitive Linguistics”. In Laura A. Janda, ed. Cognitive Linguistics: The Quantitative Turn. The Essential Reader, 1-32. Berlin: De Gruyter Mouton.

Author Biography

Laura Janda, UiT The Arctic University of Norway
Laura A. Janda (BA 1979 Princeton U., PhD 1984 UCLA) has been a professor of Russian linguistics for 30 years (1 year at UCLA, 6 at U of Rochester, 17 at UNC Chapel Hill, and since 2008 at UiT). Her primary research interests involve cognitive linguistics and the Slavic languages, especially Russian. She has published 17 books and over 100 articles. More information is available at