Monitoring Open Science beyond publications
Datasets and software as research products to be shared
DOI:
https://doi.org/10.7557/5.7077
Keywords:
French Open Science Monitor, research data, open data, open science, metrics, AI, artificial intelligence
Abstract
Since 2018, the French Open Science Monitor (BSO) has assessed the effectiveness of the national public policy in open science. This steering tool, developed by the French Ministry of Higher Education and Research, the University of Lorraine and Inria, measures the evolution of open science in France using reliable, open and controlled data updated every year. The result is a website presenting different dashboards, tracking for example the ratio of open access scientific publications by year, discipline or publisher.
Since its latest release in March 2023, the BSO has also tracked the production and openness of research datasets and software mentioned in scientific publications on a national scale. To ensure realistic coverage, our platform relies on large-scale, open-source deep learning techniques applied to the full texts of publications with at least one co-author with a French affiliation.
DataStet identifies mentions of datasets in scholarly publications, covering both explicitly named datasets and implicit mentions. SoftCite recognizes software mentions in scientific publications and is trained on the Softcite Dataset. Dataset and software mentions are then automatically characterized as used, created and/or shared by the research work described in the scientific document. These characterizations can be cumulative. Of the 1,608,839 publications in our corpus, we were able to analyze 655,954 with our tool DataStet. For this subset, we found 6,511,998 mentions of datasets characterized as used, 330,062 mentions characterized as created, and 78,178 mentions characterized as shared.
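To put the figures above in proportion, the following minimal Python sketch derives the share of the corpus that was analyzed (roughly 41%) and the average number of mentions of each type per analyzed publication. Only the counts come from the abstract; the function and variable names are illustrative assumptions, not part of the BSO code base. Because the characterizations can be cumulative, the three counts are deliberately not added together.

```python
# Counts quoted in the abstract.
CORPUS_SIZE = 1_608_839   # publications in the corpus
ANALYZED = 655_954        # publications analyzed with DataStet
MENTIONS = {              # dataset mentions by characterization
    "used": 6_511_998,
    "created": 330_062,
    "shared": 78_178,
}


def coverage(analyzed: int, corpus: int) -> float:
    """Fraction of the corpus that could be analyzed."""
    return analyzed / corpus


def mentions_per_publication(mentions: dict[str, int], analyzed: int) -> dict[str, float]:
    """Average number of mentions of each type per analyzed publication.

    The types are reported separately: a single mention may be counted
    under several characterizations, so summing them would double count.
    """
    return {kind: count / analyzed for kind, count in mentions.items()}


if __name__ == "__main__":
    print(f"coverage: {coverage(ANALYZED, CORPUS_SIZE):.1%}")
    for kind, avg in mentions_per_publication(MENTIONS, ANALYZED).items():
        print(f"{kind}: {avg:.2f} mentions per analyzed publication")
```

These are averages over mentions, not the publication-level proportions reported by the BSO, which require per-publication data.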
With this methodology, the BSO can offer new indicators about the proportion of French publications mentioning the usage, creation and sharing of data, as well as the proportion of publications in France that include a "Data Availability Statement". Similar indicators are dedicated to code and software. These indicators are further broken down by discipline, publisher and institution.
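As an illustration of how such indicators could be derived, the sketch below computes, per discipline, the proportion of publications with at least one dataset mention characterized as used, created or shared, and the proportion that include a Data Availability Statement. The record layout, field names and sample records are hypothetical and chosen for illustration only; they do not reflect the actual BSO data model.

```python
from collections import defaultdict
from dataclasses import dataclass

# Publication-level flags turned into indicators (hypothetical field names).
FLAGS = ("used", "created", "shared", "has_das")


@dataclass(frozen=True)
class Publication:
    doi: str
    discipline: str
    used: bool      # at least one dataset mention characterized as used
    created: bool   # at least one dataset mention characterized as created
    shared: bool    # at least one dataset mention characterized as shared
    has_das: bool   # includes a Data Availability Statement


def indicators_by_discipline(pubs):
    """Return {discipline: {flag: proportion of publications with that flag}}."""
    counts = defaultdict(lambda: dict.fromkeys(("n",) + FLAGS, 0))
    for pub in pubs:
        bucket = counts[pub.discipline]
        bucket["n"] += 1
        for flag in FLAGS:
            # Flags are cumulative: one publication may count under several.
            bucket[flag] += int(getattr(pub, flag))
    return {
        discipline: {flag: bucket[flag] / bucket["n"] for flag in FLAGS}
        for discipline, bucket in counts.items()
    }


# Made-up sample records, for illustration only.
SAMPLE = [
    Publication("10.0000/example-1", "Biology", True, True, True, True),
    Publication("10.0000/example-2", "Biology", True, False, False, False),
    Publication("10.0000/example-3", "Physics", True, False, False, True),
    Publication("10.0000/example-4", "Physics", False, False, False, False),
]

if __name__ == "__main__":
    for discipline, values in indicators_by_discipline(SAMPLE).items():
        print(discipline, values)
```

Grouping by publisher or institution would follow the same pattern with a different grouping key.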
The project addresses major technical and organizational challenges: identifying French datasets and software with the help of artificial intelligence, since no reference registries exist for them as they do for publications; and producing relevant indicators for the different scientific communities. Deep learning plays a crucial role as the enabling technology for identifying research datasets and software. This presentation will share the latest results of the project, detail the methodology, and underline the reusability of the project results.
License
Copyright (c) 2023 Laetitia Bracco, Anne L'Hôte
This work is licensed under a Creative Commons Attribution 4.0 International License.