Data citation in linguistics publications
A scholar-led, community-based initiative
The creation and dissemination of reproducible research is receiving ever-growing attention in discussions on best practices in publication and education. A key element of these practices is appropriate citation of data sources. In this presentation we describe one scholar-led initiative to increase awareness of the value of data citation in scholarly communication across the discipline of linguistics.
Although practices in linguistics vary, the field is primarily a data-driven social science in which inferences about the properties of language, human cognition, cultures and societies are drawn from observations of language. The primary data sets underlying the field are records of these observations in the form of, for instance, texts, audio/video recordings and annotations. While linguists have always relied on language data, they have not always facilitated access to those data in publications (Berez-Kroeker et al. 2018). A great deal of published linguistic research is therefore not reproducible, either in principle or in practice.
A primary factor hindering reproducible research in linguistics is the lack of standards for data citation in scholarly publishing. Lacking such standards, the field continues to emphasize linguistic analyses over linguistic data, and as a result, linguists have little incentive to make the data behind research publications accessible.
With funding from the US National Science Foundation, we have worked since 2015 to develop and promote standards for citing data. We are an international team (Norway, US, Canada, Australia) of scholars that includes linguistic data practitioners, scholarly communication librarians, and digital archivists.
In this presentation we discuss our coordinated efforts over the past four years, including:
- Three international workshops to identify technical and sociological barriers to research data citation in linguistics publications.
- The formation of the Linguistics Data Interest Group (https://rd-alliance.org/groups/linguistics-data-ig) within the Research Data Alliance, with nearly 100 members from the international linguistics scholarly community.
- Short-form technical courses and presentations offered through the Linguistic Society of America.
- An open-access position paper (Berez-Kroeker et al. 2018).
- The Austin Principles of Data Citation in Linguistics (http://linguisticsdatacitation.org), which annotate the FORCE11 Joint Declaration of Data Citation Principles (Data Citation Synthesis Group 2014) for linguistic scholarship.
- Guidelines for citing linguistic data, to be shared in late 2019 with linguistics journal editors and stylesheet curators.
- The open-access Open Handbook of Linguistic Data Management (MIT Press Open, expected publication date 2020).
With this presentation, we aim to encourage practitioners in other fields to initiate similar advancements, and to encourage decision-makers and publishers to actively collaborate with and support scholar-led initiatives working toward better research practices.
Berez-Kroeker, Andrea L., Lauren Gawne, Susan Kung, Barbara Kelly, Tyler Heston, Gary Holton, Peter Pulsifer, David Beaver, Shobhana Chelliah, Stanley Dubinsky, Richard Meier, Nicholas Thieberger, Keren Rice & Anthony Woodbury. 2018. Reproducible research in linguistics: A position statement on data citation and attribution in our field. Linguistics 56(1): 1–18. https://doi.org/10.1515/ling-2017-0032
Data Citation Synthesis Group. 2014. Joint Declaration of Data Citation Principles. Martone M. (ed.). San Diego CA: FORCE11. https://doi.org/10.25490/a97f-egyk
Copyright (c) 2019 Helene N. Andreassen, Andrea Berez-Kroeker, Lauren Collister, Philipp Conzett, Christopher Cox, Koenraad De Smedt, Lauren Gawne, Bradley McDonnell
This work is licensed under a Creative Commons Attribution 4.0 International License.