Meeting the challenges of reproducibility and transparency in analyses of cohort and registry data with open-source software: examples from the PsychGen Centre for Genetic Epidemiology and Mental Health
DOI: https://doi.org/10.7557/5.8265
Keywords: open source software, reproducibility, transparency
Abstract
Modern research, particularly in the Scandinavian countries, increasingly relies on data combined from large cohort studies and nationwide registries. These data are extremely valuable and offer vast potential for analysis: researchers can investigate new questions, revisit old ones with innovative methods, and replicate or extend existing findings. This cumulative reuse maximises the return on public investment and on the active, voluntary contributions of cohort participants. However, to realise this potential fully, researchers need not only access to the data but also tools and practices that support efficient, robust, and transparent data preparation and use.
Here, we present two examples of such tools. The phenotools R package (Hannigan et al., 2021) is open-source software designed to facilitate efficient and reproducible use of data from the Norwegian Mother, Father and Child Cohort Study (MoBa). The regtools R package, also open source, is designed to facilitate transparent and reproducible analyses of diagnostic trends and prevalence. It uses data from the Norwegian Patient Registry (NPR) and stratifies results by linked demographic information, such as income and education, drawn from microdata in other Norwegian health and administrative registers.
The motivation for developing these packages will be presented alongside an overview of how they support replicable and transparent science, illustrated with real-life use cases and examples. As developers and researchers, we reflect on the process of creating these tools for our scientific peers and outline how open-source software can offer a flexible solution to many of the reproducibility and transparency challenges currently facing research.
References
Hannigan, L., Corfield, E., Askelund, A., Askeland, R., Hegemann, L., Jensen, P., Pettersen, J., Rayner, C., Ayorech, Z., & Bakken, N. (2021). phenotools: An R package to facilitate efficient and reproducible use of phenotypic data from MoBa and linked registry sources in the TSD environment. https://doi.org/10.17605/OSF.IO/6G8BJ
Martinez Sanchez, A., Pettersen, J., Bang, L., Bjuland, K., Scheiene, M., Aase, H., & Havdahl, A. (2025). Thematic Issue of the Public Health Report 2025 – Mental Health of Children and Adolescents (p. 85). Norwegian Institute of Public Health. https://www.fhi.no/contentassets/b5b3603ec4794c5cb0c8651589b359f8/temautgave-barn-og-unges-psykiske-helse_2025.pdf
License
Copyright (c) 2025 Alejandra Martinez Sanchez, Laurie Hannigan

This work is licensed under a Creative Commons Attribution 4.0 International License.