Meeting the challenges of reproducibility and transparency in analyses of cohort and registry data with open-source software: examples from the PsychGen Centre for Genetic Epidemiology and Mental Health

Authors

DOI:

https://doi.org/10.7557/5.8265

Keywords:

open source software, reproducibility, transparency

Abstract

Modern research, particularly in the Scandinavian countries, increasingly relies on data from a combination of large cohort studies and nationwide registry sources. These data are extremely valuable, with vast potential for analysis. Researchers can investigate new questions, revisit old ones using innovative methods, and seek to replicate or extend existing findings. This cumulative reuse maximises the return on public investment and on the active, voluntary contributions of cohort participants. However, to fully realise this potential, researchers must have not only access to the data, but also tools and practices that facilitate their efficient, robust, and transparent preparation and use.

Here, we present and describe two examples of such tools. The phenotools R package is an open source software package designed to facilitate efficient and reproducible use of data from the Norwegian Mother, Father and Child Cohort Study (MoBa). The regtools R package is likewise open source software, designed to facilitate transparent and reproducible analyses of diagnostic trends and prevalence. It uses data from the Norwegian Patient Registry (NPR), stratified according to linked demographic information, such as income and education, drawn from microdata in other Norwegian health and administrative registers.

The motivation for developing these packages will be presented alongside an overview of their contribution to facilitating replicable and transparent science, illustrated with real-life use cases and examples. As developers and researchers, we reflect on the process of creating these tools for use by our scientific peers and outline our understanding of how open source software can be a flexible solution to many of the reproducibility and transparency challenges currently facing research.

Author Biographies

  • Alejandra Martinez Sanchez, Norwegian Institute of Public Health

    Alejandra Martinez Sanchez is a PhD Fellow at the PsychGen Centre for Genetic Epidemiology and Mental Health, at the Norwegian Institute of Public Health. Her research focusses on using high-dimensional data and registry linkage to explore mental health diagnostic trends in Norway.

  • Laurie Hannigan, Norwegian Institute of Public Health

    Laurie Hannigan is a senior researcher based at the Lovisenberg Diaconal Hospital and the Norwegian Institute of Public Health (NIPH), Oslo, Norway. He completed an undergraduate degree in Psychology at the University of Southampton, in the UK, followed by a master’s in Social, Genetic, and Developmental Psychiatry at King’s College London. He obtained his PhD in Behavior Genetics from King’s in 2018 under the supervision of Prof. Thalia Eley and Dr. Tom McAdams. After a short postdoctoral position at the University of Glasgow’s Institute of Health and Wellbeing in 2018, he moved to Oslo to focus on genetic epidemiological work with the Norwegian Mother, Father, and Child Cohort study (MoBa). He now co-leads the Psychiatric Genetic Epidemiology (PaGE) research group at Lovisenberg Diaconal Hospital and is a member of the PsychGen Centre for Genetic Epidemiology and Mental Health at NIPH. He also holds an honorary research associate position at the MRC Integrative Epidemiology Unit at the University of Bristol. His research interests include studying within-family transmission of risk for psychiatric disorders, the aetiology and development of emotional and behavioural problems, factors influencing the emergence of neurodevelopmental conditions, patterns and consequences of comorbidity and multimorbidity, and methodological issues in the application of developmental genetic epidemiological approaches to birth cohort and population registry data sources.

References

Hannigan, L., Corfield, E., Askelund, A., Askeland, R., Hegemann, L., Jensen, P., Pettersen, J., Rayner, C., Ayorech, Z., & Bakken, N. (2021). phenotools: An R package to facilitate efficient and reproducible use of phenotypic data from MoBa and linked registry sources in the TSD environment. https://doi.org/10.17605/OSF.IO/6G8BJ

Martinez Sanchez, A., Pettersen, J., Bang, L., Bjuland, K., Scheiene, M., Aase, H., & Havdahl, A. (2025). Thematic Issue of the Public Health Report 2025 – Mental Health of Children and Adolescents (p. 85). Norwegian Institute of Public Health. https://www.fhi.no/contentassets/b5b3603ec4794c5cb0c8651589b359f8/temautgave-barn-og-unges-psykiske-helse_2025.pdf

Published

2025-09-16

How to Cite

Martinez Sanchez, A., & Hannigan, L. (2025). Meeting the challenges of reproducibility and transparency in analyses of cohort and registry data with open-source software: examples from the PsychGen Centre for Genetic Epidemiology and Mental Health. Septentrio Conference Series, (2). https://doi.org/10.7557/5.8265