CODECHECK: An open-science initiative to facilitate sharing of computer programs and results presented in scientific publications
Analysis of data and computational modelling is central to most scientific disciplines. The underlying computer programs are complex and costly to design. However, these computational techniques are rarely checked during review of the corresponding papers, or shared upon publication. Instead, the primary method for sharing data and computer programs today is for authors to state "data available upon reasonable request", even though the actual code and data are the only sufficiently detailed description of a computational workflow to allow reproduction and reuse. Despite best intentions, these programs and data can quickly disappear from laboratories. Furthermore, there is a reluctance to share: only 8% of papers in recent top-tier AI conferences shared code relating to their publications (Gundersen et al. 2018). This low rate of code sharing is seen in other fields, e.g. computational physics (Stodden et al. 2018). Given that code and data are rich digital artefacts that can be shared relatively easily, and that funders and journal publishers increasingly mandate sharing of resources, we should be sharing more and following best practices for data and software publication. The permanent archival of valuable code and datasets would allow other researchers to make use of these resources in their work, and would improve both the reliability of reporting and the quality of tools.
We are building a computational platform, called CODECHECK (http://www.codecheck.org.uk), to enhance the availability, discovery and reproducibility of published computational research. Researchers who provide code and data will have their code independently run to ensure that the computational parts of a workflow can be reproduced. The results from our independent run will then be shared freely post-publication in an open repository. The reproduction is attributed to the person performing the check. Our independent runs will act as a "certificate of reproducible computation". These certificates will be useful to several parties at different stages in the generation of a scientific publication.
- Prior to peer review, the researchers themselves can check that their code runs on a separate platform.
- During peer review, editors and reviewers can check whether the figures in the certificate match those in the manuscript under review, without cumbersome download and installation procedures.
- Once published, any interested reader can download the software, and even the data, that were used to generate the results shown in the certificate.
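To make the idea of a certificate more concrete, the following is a minimal Python sketch of one way an independent run could be recorded so that others can later compare against it. It assumes, purely for illustration, that a certificate lists each output file regenerated by the check together with a SHA-256 digest, plus the paper and the person credited with the check. The names `Certificate`, `record`, `mismatches` and `manifest` are hypothetical and are not part of CODECHECK itself.

```python
"""Hypothetical sketch of a 'certificate of reproducible computation'.

Assumption: after re-running the authors' workflow, the checker records a
SHA-256 digest for each expected output file. This is NOT the CODECHECK
project's actual certificate format.
"""
import hashlib
from dataclasses import dataclass, field
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


@dataclass
class Certificate:
    paper: str        # hypothetical identifier for the paper, e.g. a DOI
    codechecker: str  # the person credited with performing the check
    manifest: dict[str, str] = field(default_factory=dict)  # file -> digest

    def record(self, output_dir: Path, files: list[str]) -> None:
        """Hash each expected output produced by the independent run."""
        for name in files:
            self.manifest[name] = sha256_of(output_dir / name)

    def mismatches(self, other_dir: Path) -> list[str]:
        """List recorded files that are missing or differ in other_dir."""
        bad = []
        for name, expected in self.manifest.items():
            candidate = other_dir / name
            if not candidate.exists() or sha256_of(candidate) != expected:
                bad.append(name)
        return bad
```

Hashing only detects bit-for-bit identical outputs, which is a strict criterion: figures rendered on different platforms often differ in embedded metadata even when they look the same, so in practice such a manifest would likely complement, not replace, a visual comparison of figures by the checker or reviewer.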
The code and results from papers are shared according to the principles we recently outlined (Eglen et al. 2017). To ensure that our system scales to large numbers of papers and is trustworthy, it will be as automated as possible, fully open itself, and rely on open source software and open scholarly infrastructure. This presentation will discuss the challenges faced to date in building the system and in connecting it with existing peer-review principles, as well as plans for links with open access journals.
This work has been funded by the UK Software Sustainability Institute, a Mozilla Open Science Mini grant and the German Research Foundation (DFG) under project number PE 1632/17-1.
Gundersen OE, Gil Y, Aha DW (2018) On Reproducible AI: Towards Reproducible Research, Open Science, and Digital Scholarship in AI Publications. AIMag 39:56–68. https://www.aaai.org/ojs/index.php/aimagazine/article/view/2816
Stodden V, Krafczyk MS, Bhaskar A (2018) Enabling the Verification of Computational Results: An Empirical Evaluation of Computational Reproducibility. In: Proceedings of the First International Workshop on Practical Reproducible Evaluation of Computer Systems, 3. ACM. https://dl.acm.org/citation.cfm?doid=3214239.3214242
Eglen SJ, Marwick B, Halchenko YO, Hanke M, Sufi S, Gleeson P, Angus Silver R, Davison AP, Lanyon L, Abrams M, Wachtler T, Willshaw DJ, Pouzat C, Poline J-B (2017) Toward standard practices for sharing computer code and programs in neuroscience. Nat Neurosci 20:770–773. https://doi.org/10.1038/nn.4550
Copyright (c) 2019 Stephen Eglen, Daniel Nüst
This work is licensed under a Creative Commons Attribution 4.0 International License.