CODECHECK: An open-science initiative to facilitate sharing of computer programs and results presented in scientific publications

Stephen Eglen; Daniel Nüst

doi:10.7557/5.4910

Authors

Stephen Eglen University of Cambridge https://orcid.org/0000-0001-8607-8025
Daniel Nüst University of Münster https://orcid.org/0000-0002-0024-5046

Abstract

Analysis of data and computational modelling are central to most scientific disciplines. The underlying computer programs are complex and costly to design. However, these computational techniques are rarely checked during review of the corresponding papers, nor are they shared upon publication. Instead, the primary method for sharing data and computer programs today is for authors to state "data available upon reasonable request", although the actual code and data are the only sufficiently detailed description of a computational workflow that allows reproduction and reuse. Despite best intentions, these programs and data can quickly disappear from laboratories. Furthermore, there is a reluctance to share: only 8% of papers in recent top-tier AI conferences shared code relating to their publications (Gundersen et al. 2018). This low rate of code sharing is seen in other fields, e.g. computational physics (Stodden et al. 2018). Given that code and data are rich digital artefacts that can be shared relatively easily, and that funders and journal publishers increasingly mandate sharing of resources, we should share more and follow best practices for data and software publication. The permanent archival of valuable code and datasets would allow other researchers to make use of these resources in their work, and would improve the reliability of reporting as well as the quality of tools.

We are building a computational platform, called CODECHECK (http://www.codecheck.org.uk), to enhance the availability, discovery and reproducibility of published computational research. Researchers who provide code and data will have their code independently run to ensure the computational parts of a workflow can be reproduced. The results from our independent run will then be shared freely post-publication in an open repository. The reproduction is attributed to the person performing the check. Our independent runs will act as a "certificate of reproducible computation". These certificates will be of use to several parties at different times during the generation of a scientific publication.

Prior to peer review, the researchers themselves can check that their code runs on a separate platform.
During peer review, editors and reviewers can check if the figures in the certificate match those presented in manuscripts for review without cumbersome download and installation procedures.
Once published, any interested reader can download the software, and even the data, that were used to generate the results shown in the certificate.
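
To make the idea of an independent run concrete, the following is a minimal sketch of how outputs from a checker's run might be compared with the outputs supplied by the authors. It is an illustration only and not the CODECHECK implementation: the function names, the directory layout and the JSON report format are all assumptions introduced here. It also uses byte-for-byte equality, which is stricter than a human comparing figures in a certificate with those in a manuscript.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def compare_outputs(author_dir: Path, checker_dir: Path) -> dict:
    """Compare the authors' output files with those from an independent run.

    Both directories are hypothetical. A file counts as 'reproduced' only if
    it is byte-identical, which is a deliberately strict simplification.
    """
    report = {"reproduced": [], "differs": [], "missing": []}
    author_files = sorted(p for p in author_dir.rglob("*") if p.is_file())
    for author_file in author_files:
        relative = author_file.relative_to(author_dir)
        checker_file = checker_dir / relative
        if not checker_file.is_file():
            report["missing"].append(relative.as_posix())
        elif sha256_of(author_file) == sha256_of(checker_file):
            report["reproduced"].append(relative.as_posix())
        else:
            report["differs"].append(relative.as_posix())
    return report


def write_report(report: dict, checker: str, target: Path) -> None:
    """Write the comparison as JSON, attributed to the person who ran the check."""
    target.write_text(json.dumps({"checker": checker, **report}, indent=2))
```

Outputs that embed timestamps or depend on random seeds will rarely match byte for byte, so in practice a check of this kind would likely be complemented by visual or tolerance-based comparison.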

The code and results from papers are shared according to the principles we recently outlined (Eglen et al. 2017). To ensure that it scales to large numbers of papers and is trustworthy, our system will be as automated as possible and fully open itself, and will rely on open source software and open scholarly infrastructure. This presentation will discuss the challenges faced to date in building the system and in connecting it with existing peer-review principles, and plans for links with open access journals.

Acknowledgements

This work has been funded by the UK Software Sustainability Institute, a Mozilla Open Science Mini grant and the German Research Foundation (DFG) under project number PE 1632/17-1.

Author Biographies

Stephen Eglen, University of Cambridge

SJE is a Reader in Computational Neuroscience, in the Department of Applied Mathematics and Theoretical Physics, University of Cambridge. He has a long-standing interest in open science and reproducible research.

Daniel Nüst, University of Münster

DN is a researcher at the Spatio-temporal Modelling Lab at the Institute for Geoinformatics (ifgi) at the University of Münster. He works on open tools to enable computational reproducibility in the project Opening Reproducible Research (o2r, https://o2r.info/about/).

References

Gundersen OE, Gil Y, Aha DW (2018) On Reproducible AI: Towards Reproducible Research, Open Science, and Digital Scholarship in AI Publications. AIMag 39:56–68. https://www.aaai.org/ojs/index.php/aimagazine/article/view/2816

Stodden V, Krafczyk MS, Bhaskar A (2018) Enabling the Verification of Computational Results: An Empirical Evaluation of Computational Reproducibility. In: Proceedings of the First International Workshop on Practical Reproducible Evaluation of Computer Systems, 3. ACM. https://dl.acm.org/citation.cfm?doid=3214239.3214242

Eglen SJ, Marwick B, Halchenko YO, Hanke M, Sufi S, Gleeson P, Silver RA, Davison AP, Lanyon L, Abrams M, Wachtler T, Willshaw DJ, Pouzat C, Poline J-B (2017) Toward standard practices for sharing computer code and programs in neuroscience. Nat Neurosci 20:770–773. https://doi.org/10.1038/nn.4550