Current state of open source research data management systems

Anusha Ranganathan; Richard Jones; Steven Eardley

doi:10.7557/5.8172

Authors

Anusha Ranganathan, Cottage Labs
Richard Jones, Cottage Labs
Steven Eardley, Cottage Labs

DOI:

https://doi.org/10.7557/5.8172

Keywords:

Research Data Management, RDM, Open-Source Repositories, Data repositories, Invenio, Hyrax, Data Repositories landscape, Research Infrastructure

Abstract

We will explore the current landscape of open-source research data management systems, focusing on platforms such as Invenio and Hyrax. Our workshop will enable users to interact with InvenioRDM and Samvera Hyrax data repositories. Together, we will walk through the core features most frequently requested by researchers and research data administrators. These include:

- Flexible importers and exporters to facilitate the large-scale ingestion and sharing of research data, where scale covers both the size of individual experiments and the number of experiments.
- Customizable workflows to support data review and publication processes.
- Version control to track changes and maintain data integrity.
- Granular authorization and authentication mechanisms to manage access rights.
- Support for persistent identifiers.
- Metadata capture using a variety of metadata schemas.
- Advanced search capabilities and the ability to view data directly within the system.

We will also delve into emerging, nuanced features that are becoming increasingly important in modern research workflows, such as:

- Offline data capture and seamless integration with the central data management system.
- Support for archiving data not intended for publication, along with intuitive interfaces for managing such content.
- Soft-delete functionality, where deleted data is moved to a temporary bin and later reviewed for either permanent archival or tombstoning.
- External data review under strict access controls, enabled by repository-native support for Notify protocols and Signposting, which enhances collaboration with external agencies and systems.
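The soft-delete lifecycle described above can be read as a small state machine. The following is a minimal Python sketch of that reading, not the behaviour of InvenioRDM or Hyrax: the `Record` class, the state names, and the `soft_delete` and `review` methods are hypothetical names introduced only for illustration, and the set of allowed transitions is an assumption drawn from the bullet text.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class State(Enum):
    ACTIVE = "active"          # visible in the repository
    IN_BIN = "in_bin"          # soft-deleted, awaiting review
    ARCHIVED = "archived"      # kept permanently, not published
    TOMBSTONED = "tombstoned"  # content removed, placeholder remains


# Assumed transitions: a record enters the bin first, and only a
# review can move it on to archival or tombstoning.
ALLOWED = {
    State.ACTIVE: {State.IN_BIN},
    State.IN_BIN: {State.ARCHIVED, State.TOMBSTONED},
    State.ARCHIVED: set(),
    State.TOMBSTONED: set(),
}


@dataclass
class Record:
    record_id: str
    state: State = State.ACTIVE
    history: list = field(default_factory=list)

    def _move(self, new_state: State, reason: str) -> None:
        # Reject any transition that is not in the assumed table.
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"cannot move {self.record_id} from "
                f"{self.state.value} to {new_state.value}"
            )
        # Keep an audit trail of who-went-where and why.
        when = datetime.now(timezone.utc)
        self.history.append((self.state, new_state, reason, when))
        self.state = new_state

    def soft_delete(self, reason: str) -> None:
        """Move the record to the temporary bin instead of erasing it."""
        self._move(State.IN_BIN, reason)

    def review(self, keep: bool, reason: str) -> None:
        """Resolve a binned record: archive it or leave a tombstone."""
        target = State.ARCHIVED if keep else State.TOMBSTONED
        self._move(target, reason)


record = Record("rec-1")
record.soft_delete("withdrawn by depositor")
record.review(keep=False, reason="duplicate deposit")
print(record.state.value)
```

Keeping the transitions in a single table makes the review step explicit: nothing reaches a terminal state without first passing through the bin, and every move is recorded in `history`.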

Finally, we will discuss the shortcomings of current research data management systems, including:

- Scalability challenges, such as limited support for diverse storage backends and declining performance with increased user and data volume.
- Lack of versatile data viewers, especially for complex or domain-specific data types.
- Insufficient emphasis on user experience (UX) across interfaces.
- Limited integration of AI capabilities to assist in metadata extraction, data understanding, and intelligent presentation to end users.
- A narrow focus on data storage and access, without sufficient support for data analysis and reproducibility. Current systems often manage only the data that has already been used for analysis, rather than assisting with the analytical process itself. Future systems will need to support rich metadata capture around data acquisition systems and analysis methods, and provide tools that facilitate replication and validation of research results.

Author Biographies

Anusha Ranganathan, Cottage Labs

Anusha Ranganathan is a Partner at Cottage Labs LLP and a software developer with extensive experience spanning design, development, and delivery. She focuses on understanding client needs, scoping requirements, and designing and developing effective software solutions. Anusha has particular expertise in implementing data repositories for large institutions, especially using Samvera Hyrax. Before joining Cottage Labs in 2015, she worked as a software developer and team leader at the Bodleian Digital Library Systems and Services, where she contributed to a range of projects and later concentrated on the Oxford Research Archive.

Richard Jones, Cottage Labs

Richard has worked in and around Higher Education since 2001, on Digital Repositories and Research Information Systems, with extensive experience in building and deploying systems and in low-level integrations. He worked at a number of large HE institutions and commercial organisations in the HE space before founding Cottage Labs in 2011. His work at Cottage Labs includes leading the implementation of large Data Repository systems and managing and contributing to open standards (especially the SWORD protocol, for which he is the nominal technical lead). He specialises in data management, analytics and visualisation.

Steven Eardley, Cottage Labs

Steve is a Partner in Cottage Labs, a UK-based software consultancy with expertise in data repositories and bespoke software for academia. He has over 10 years' DevOps experience architecting, deploying, and maintaining global services and technology infrastructure, such as the Directory of Open Access Journals and numerous InvenioRDM instances. He specialises in deployment automation and scalable infrastructure.