DataverseNO: A National, Generic Repository and its Contribution to the Increased FAIRness of Data from the Long Tail of Research

Keywords: research data, data management, data stewardship, data curation, FAIR Data Principles, FAIRification, long tail, Dataverse, trusted repositories, sustainability, business models, open science, open data

Abstract

Research data repositories play a crucial role in the FAIR (Findable, Accessible, Interoperable, Reusable) ecosystem of digital objects. DataverseNO is a national, generic repository for open research data, primarily from researchers affiliated with Norwegian research organizations. The repository runs on the open-source software Dataverse. This article presents the organization and operation of DataverseNO, and investigates how the repository contributes to the increased FAIRness of small and medium-sized research datasets. Sections 1 to 3 present background information about the FAIR Data Principles (section 1), how FAIR may be turned into reality (section 2), and what these principles and recommendations imply for data from the so-called long tail of research, i.e. small and medium-sized datasets that are often heterogeneous in nature and hard to standardize (section 3). Section 4 gives an overview of the key organizational features of DataverseNO, followed by an evaluation of how well DataverseNO and the repository application Dataverse as such support the FAIR Data Principles (section 5). Section 6 discusses how sustainable and trustworthy the repository is. The article concludes in section 7 with a brief summary and a look at the future of the repository.

Author Biography

Philipp Conzett, UiT The Arctic University of Norway
Cand.philol. in Nordic Linguistics; library research and publishing support, University Library

Published
2020-05-31