Open Data for Linguists
The field of linguistics has taken a quantitative turn in recent years (Janda 2013). The majority of conference presentations, articles, and books in our field now involve some kind of quantitative analysis of language data, and results are often evaluated using statistical methods. However, best practices for quantitative analysis in linguistics are still under development. Public archiving and sharing of data and statistical code are needed to move the field forward by providing standards and examples that others can follow.
The Tromsø Repository of Language and Linguistics, also known as “TROLLing”, at http://opendata.uit.no/, is designed to meet this need. TROLLing is an international archive of linguistic data and statistical code that is provided as a free professional service to the worldwide community of linguists. TROLLing shares the platform of the Harvard Dataverse; assigns a permanent URL to each post (currently a “handle” URL, to be converted to a DOI during summer 2014); collects metadata that are searchable through the site; and is professionally managed by the university library in Tromsø and an international Steering Committee.
Authors of books and articles published in linguistics journals are welcome to deposit their data in TROLLing, along with citations of their articles. Conversely, authors can reference their data by citing their TROLLing posts in their publications. Additionally, researchers are welcome to archive completed studies on the TROLLing site regardless of whether or not the results are published in scholarly venues.
TROLLing went live for public use in the summer of 2014. We are currently working on spreading the word to our colleagues by asking editors of major scholarly journals to recommend it to authors, holding workshops at meetings of professional organizations, and using listservs.
This presentation will demonstrate how TROLLing works, what kinds of metadata it collects, how that data can be harvested and searched, and what kinds of data can be archived at this site.
Janda, Laura A. 2013. “Quantitative Methods in Cognitive Linguistics”. In Laura A. Janda, ed. Cognitive Linguistics: The Quantitative Turn. The Essential Reader, 1-32. Berlin: De Gruyter Mouton.