Current state of open source research data management systems
DOI:
https://doi.org/10.7557/5.8172
Keywords:
Research Data Management, RDM, Open-Source Repositories, Data repositories, Invenio, Hyrax, Data Repositories landscape, Research Infrastructure
Abstract
We will explore the current landscape of open-source research data management systems, focusing on platforms such as Invenio and Hyrax. Our workshop will enable users to interact with Invenio RDM and Samvera Hyrax data repositories. Together, we will walk through the core features most frequently requested by researchers and research data administrators. These include:
- Flexible importers and exporters to facilitate large-scale ingestion and sharing of research data, in terms of both individual experiment size and volume of experiments.
- Customizable workflows to support data review and publication processes.
- Version control to track changes and maintain data integrity.
- Granular authorization and authentication mechanisms to manage access rights.
- Support for persistent identifiers.
- Metadata capture using a variety of metadata schemas.
- Advanced search capabilities and the ability to view data directly within the system.
We will also delve into emerging, nuanced features that are becoming increasingly important in modern research workflows, such as:
- Offline data capture and seamless integration with the central data management system.
- Support for archiving data not intended for publication, along with intuitive interfaces for managing such content.
- Soft-delete functionality, where deleted data is moved to a temporary bin and later reviewed for either permanent archival or tombstoning.
- External data reviews with strict access controls, enabled by repository-native support for Notify protocols and Signposting, which enhances collaboration with external agencies and systems.
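The soft-delete lifecycle listed above can be read as a small state machine. The following is a minimal sketch under stated assumptions, not the behaviour of Invenio RDM or Hyrax: the state names, the `ALLOWED` table, and the `transition` function are all hypothetical and chosen only to illustrate the idea of a review step between deletion and a final outcome.

```python
from enum import Enum


class RecordState(Enum):
    """Hypothetical lifecycle states for a repository record."""
    PUBLISHED = "published"
    IN_BIN = "in_bin"          # soft-deleted, awaiting review
    ARCHIVED = "archived"      # retained permanently after review
    TOMBSTONED = "tombstoned"  # content withdrawn, a tombstone remains


# Assumed transition table: a published record can only be moved to the bin,
# and a binned record is resolved by review into archival or tombstoning.
ALLOWED = {
    RecordState.PUBLISHED: {RecordState.IN_BIN},
    RecordState.IN_BIN: {RecordState.ARCHIVED, RecordState.TOMBSTONED},
    RecordState.ARCHIVED: set(),
    RecordState.TOMBSTONED: set(),
}


def transition(current: RecordState, target: RecordState) -> RecordState:
    """Return the new state, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


# Usage: delete a record, then resolve it during review.
state = transition(RecordState.PUBLISHED, RecordState.IN_BIN)
state = transition(state, RecordState.TOMBSTONED)
```

In this sketch a record cannot be tombstoned directly from the published state, so every deletion passes through the review bin first; a real system might also allow restoring a record from the bin.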
Finally, we will discuss the shortcomings of current research data management systems, including:
- Scalability challenges, such as limited support for diverse storage backends and declining performance with increased user and data volume.
- Lack of versatile data viewers, especially for complex or domain-specific data types.
- Insufficient emphasis on user experience (UX) across interfaces.
- Limited integration of AI capabilities to assist in metadata extraction, data understanding, and intelligent presentation to end users.
- A narrow focus on data storage and access, without sufficient support for data analysis and reproducibility. Current systems often manage only the data that has already been used for analysis, rather than assisting with the analytical process itself. Future systems will need to support rich metadata capture around data acquisition systems and analysis methods, and provide tools that facilitate replication and validation of research results.
License
Copyright (c) 2025 Anusha Ranganathan, Richard Jones, Steven Eardley

This work is licensed under a Creative Commons Attribution 4.0 International License.