Encouraging reusability of computational research through Data-to-Knowledge Packages - A hydrological use case
- 1. 52°North Spatial Information Research, Münster, 48155, Germany
- 2. Institute of Aquatic Ecology, Agency of Daugavpils University, Riga, LV-1007, Latvia
- 3. Forschungsverbund Berlin e.V., Leibniz Institute of Freshwater Ecology and Inland Fisheries (IGB), Berlin, 12587, Germany
Description
The growing demand for reproducible research is based on the expectation that publishing research in this form will enable its reuse and the generation of new knowledge. However, reproducibility alone does not guarantee these benefits. Users still need to make considerable efforts to understand the data and analysis code before they can reuse these components in other contexts. To address this challenge, we introduce the Data-to-Knowledge Package (D2K-Package), a collection of research materials including source code and open FAIR data, virtual labs, web API services, and computational workflows. The core of the D2K-Package is the reproducible basis: the data and source code on which an analysis is based. This core is designed so that the other components can be derived from it. The main goal of the package is to help researchers generate new knowledge by making reproducible research easier to understand and encouraging its reuse. We demonstrate the applicability of the D2K-Package with a hydrological use case, which can also be used for testing, and discuss its seamless integration into the research cycle.
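The relationship between the reproducible core and the derived components can be made concrete with a small data model. The following is a minimal sketch, not the authors' package format: the class names `Component` and `D2KPackage`, the method `add_component`, and the file names are hypothetical, and the rule that a component may only reference core items is an assumption drawn from the statement that the other components are derived from the core.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A derived component such as a virtual lab, web API service, or workflow (hypothetical model)."""

    name: str
    kind: str
    derived_from: list[str]  # names of core items this component is built on


@dataclass
class D2KPackage:
    """Hypothetical model of a D2K-Package: a reproducible core plus derived components."""

    data: list[str]
    source_code: list[str]
    components: list[Component] = field(default_factory=list)

    def core_items(self) -> set[str]:
        # The reproducible basis: the data and source code the analysis rests on.
        return set(self.data) | set(self.source_code)

    def add_component(self, component: Component) -> None:
        # Assumed design rule: a component may only build on items from the core.
        missing = set(component.derived_from) - self.core_items()
        if missing:
            raise ValueError(
                f"{component.name} references non-core items: {sorted(missing)}"
            )
        self.components.append(component)


# Usage with hypothetical file names.
package = D2KPackage(data=["input_data.csv"], source_code=["analysis.R"])
package.add_component(Component("trend-workflow", "workflow", ["analysis.R"]))
package.add_component(
    Component("web-api", "web API service", ["analysis.R", "input_data.csv"])
)
```

Rejecting components that point outside the core keeps every derived artefact traceable back to the data and code that make the analysis reproducible.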
Researchers often collect data and analyse it using appropriate methods in order to answer a specific research question. The research results are then published in scientific articles. If other researchers have access to the data and the analysis and can achieve the same results as in the article, this is called "reproducible research". One of the main advantages of reproducible research is that other scientists can continue the work and reuse the materials. However, publishing the materials in a reproducible manner is a challenge for the developer of the analysis. In addition, reusing the materials is difficult for others because the analysis can become very complex.
In this paper, we introduce the Data-to-Knowledge Package (D2K-Package), which contains the data and analysis in a reproducible manner, as well as other components that help other researchers understand and reuse the analysis. One of these components is a workflow that visualizes the steps of the analysis pipeline and serves as a first entry point. To demonstrate the feasibility of the D2K-Package, we applied it to a real use case in hydrology.
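How the workflow component is built is not specified here. A minimal sketch, assuming the analysis pipeline can be modelled as a directed acyclic graph of named steps (the step names and the functions `execution_order` and `to_dot` are hypothetical), shows how both a valid execution order and a Graphviz DOT description for visualizing the steps could be derived from one definition.

```python
from graphlib import TopologicalSorter

# Hypothetical analysis pipeline: each step maps to the steps it depends on.
PIPELINE: dict[str, set[str]] = {
    "load_data": set(),
    "clean_data": {"load_data"},
    "detect_trends": {"clean_data"},
    "plot_results": {"detect_trends"},
}


def execution_order(steps: dict[str, set[str]]) -> list[str]:
    # Topological order: every step appears after all of its dependencies.
    # Raises graphlib.CycleError if the steps do not form a DAG.
    return list(TopologicalSorter(steps).static_order())


def to_dot(steps: dict[str, set[str]]) -> str:
    # Graphviz DOT text with one edge per dependency, for rendering the workflow.
    lines = ["digraph workflow {"]
    for step, deps in steps.items():
        for dep in sorted(deps):
            lines.append(f'  "{dep}" -> "{step}";')
    lines.append("}")
    return "\n".join(lines)


print(execution_order(PIPELINE))
print(to_dot(PIPELINE))
```

Keeping a single dependency definition for both execution and visualization means the rendered diagram cannot drift from the steps that are actually run.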
Files
openreseurope-5-22545.pdf (2.7 MB)
Checksum: md5:14dcc589bd90d13a63799a4e0f826a25
PID: http://hdl.handle.net/11304/3f451637-6e59-4937-84e7-f5b21f0e21f3