DataverseNO: A National, Generic Repository and its Contribution to the Increased FAIRness of Data from the Long Tail of Research
DOI:
https://doi.org/10.7557/15.5514
Keywords:
research data, data management, data stewardship, data curation, FAIR Data Principles, FAIRification, long tail, Dataverse, trusted repositories, sustainability, business models, open science, open data
Abstract
Research data repositories play a crucial role in the FAIR (Findable, Accessible, Interoperable, Reusable) ecosystem of digital objects. DataverseNO is a national, generic repository for open research data, primarily from researchers affiliated with Norwegian research organizations. The repository runs on the open-source software Dataverse. This article presents the organization and operation of DataverseNO, and investigates how the repository contributes to the increased FAIRness of small and medium-sized research data. Sections 1 to 3 present background information about the FAIR Data Principles (section 1), how FAIR may be turned into reality (section 2), and what these principles and recommendations imply for data from the so-called long tail of research, i.e. small and medium-sized datasets that are often heterogeneous in nature and hard to standardize (section 3). Section 4 gives an overview of the key organizational features of DataverseNO, followed by an evaluation of how well DataverseNO and the repository application Dataverse as such support the FAIR Data Principles (section 5). Section 6 discusses how sustainable and trustworthy the repository is. The article concludes in section 7 with a brief summary, including a look into the future of the repository.