Monitoring Open Science beyond publications

Datasets and software as research products to be shared

Authors

DOI:

https://doi.org/10.7557/5.7077

Keywords:

French Open Science Monitor, research data, open data, open science, metrics, AI, artificial intelligence

Abstract

Since 2018, the French Open Science Monitor (BSO) has assessed the effectiveness of the national public policy in open science. This steering tool, developed by the French Ministry of Higher Education and Research, the University of Lorraine and Inria, measures the evolution of open science in France using reliable, open and controlled data updated every year. The result is a website presenting different dashboards, tracking for example the ratio of open access scientific publications by year, discipline or publisher.

Since its latest release in March 2023, the BSO also tracks, on a national scale, the production and openness of research datasets and software mentioned in scientific publications. To ensure realistic coverage, our platform relies on large-scale, open-source deep learning techniques applied to the full texts of publications with at least one co-author with a French affiliation.

DataStet identifies mentions of datasets in scholarly publications, covering both explicitly named datasets and implicit mentions. SoftCite recognizes software mentions in scientific publications, using the Softcite Dataset as training data. Each dataset or software mention is then automatically characterized as used, created, or shared by the research work described in the document. These characterizations can be cumulative: a single mention may carry more than one of them. Of the 1,608,839 publications in our corpus, we were able to analyze 655,954 with DataStet. In this subset, we found 6,511,998 dataset mentions characterized as used, 330,062 characterized as created, and 78,178 characterized as shared.
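The figures above can be put in proportion with some simple arithmetic. The short Python sketch below is only an illustration built on the numbers quoted in this abstract; the function and constant names are hypothetical and are not part of the BSO code base. Because characterizations can be cumulative, the ratios are average counts of mentions per analyzed publication, not shares of publications.

```python
# Figures quoted in the abstract (names are illustrative assumptions).
CORPUS_SIZE = 1_608_839   # publications in the corpus
ANALYZED = 655_954        # publications analyzed with DataStet
MENTIONS = {"used": 6_511_998, "created": 330_062, "shared": 78_178}


def coverage(analyzed: int, corpus: int) -> float:
    """Fraction of the corpus that could be analyzed."""
    return analyzed / corpus


def mentions_per_publication(mentions: dict, analyzed: int) -> dict:
    """Average number of mentions of each kind per analyzed publication."""
    return {label: count / analyzed for label, count in mentions.items()}


if __name__ == "__main__":
    print(f"coverage: {coverage(ANALYZED, CORPUS_SIZE):.1%}")
    for label, avg in mentions_per_publication(MENTIONS, ANALYZED).items():
        print(f"{label}: {avg:.2f} mentions per analyzed publication")
```

Under these figures, roughly two publications in five could be analyzed, and mentions characterized as used far outnumber those characterized as created or shared.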

With this methodology, the BSO can offer new indicators about the proportion of French publications mentioning the usage, creation and sharing of data, as well as the proportion of publications in France that include a "Data Availability Statement". Similar indicators are dedicated to code and software. In addition, these indicators are further broken down into disciplines, publishers and institutions.
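As a rough illustration of how such an indicator could be derived, the sketch below computes, per discipline, the proportion of publications with at least one dataset mention carrying a given characterization. The `Publication` record, its fields, and the sample data are all hypothetical assumptions made for this example; they do not reflect the actual BSO data model or pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Publication:
    """Hypothetical record; not the BSO's actual schema."""
    discipline: str
    # One set of labels per dataset mention; labels can be cumulative,
    # e.g. a mention may be both "created" and "shared".
    mentions: list = field(default_factory=list)


def proportion_by_discipline(publications, label):
    """Share of publications, per discipline, with >= 1 mention tagged `label`."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for pub in publications:
        totals[pub.discipline] += 1
        if any(label in mention for mention in pub.mentions):
            hits[pub.discipline] += 1
    return {disc: hits[disc] / totals[disc] for disc in totals}


# Made-up sample data, for illustration only.
sample = [
    Publication("Biology", [{"used"}, {"created", "shared"}]),
    Publication("Biology", [{"used"}]),
    Publication("Physics", []),
    Publication("Physics", [{"used", "created"}]),
]

shared_share = proportion_by_discipline(sample, "shared")
used_share = proportion_by_discipline(sample, "used")
```

Counting publications rather than mentions keeps the indicator comparable across disciplines whose papers cite very different numbers of datasets. The same grouping could be applied to publishers or institutions by changing the key.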

The project addresses major technical and organizational challenges: identifying French datasets and software with artificial intelligence, since no reference registries exist for them as they do for publications; and producing relevant indicators for the different scientific communities. Deep learning plays a crucial role as the enabling technology for identifying research datasets and software. This presentation will share the latest results of the project, detail the methodology, and underline the reusability of the project results.

Author Biographies

Laetitia Bracco, University of Lorraine

Laetitia Bracco has been a library curator at the University of Lorraine since 2019. She works in the Research Support Mission of the University Libraries, where she heads the research data management support services as well as the Bibliometrics unit. At the national level, she leads the Open Science Working Group for research data at the Couperin Consortium and the French Open Science Barometer project dedicated to datasets and software.

Anne L'Hôte, French Ministry of Higher Education and Research

Anne L'Hôte is a data engineer at the French Ministry of Higher Education and Research. She joined the French Open Science Barometer in 2021 to imagine and build its future.

Published

2023-09-14

How to Cite

Bracco, L., & L'Hôte, A. (2023). Monitoring Open Science beyond publications: Datasets and software as research products to be shared. Septentrio Conference Series, (1). https://doi.org/10.7557/5.7077