By Stephen Turner, VP Nagraj

R packages are collections of functions, documentation, tests, sample data, and dependency declarations that together promote modularity, reproducibility, and good coding practices. Containerization with technologies such as Docker makes it possible to build a specific computing environment with all of the dependencies, configuration files, and system libraries needed to run a computing process reproducibly and at scale. Docker also allows a computing environment to be specified in code, so that the infrastructure itself can be rebuilt from those instructions. Many data science and bioinformatics tasks involve running domain-specific tools for an initial analysis, followed by postprocessing in R. The rpdd package demonstrates how to create a Docker image with an embedded R package and domain-specific tools. It comes with a build script that first builds the R package and then builds a Docker image containing that package along with the domain-specific tools and their dependencies. When the container is instantiated, it runs a script that uses the domain-specific tools to preprocess the input data and then calls functions from the R package to postprocess the output of that first step. Code and further documentation are available at https://github.com/stephenturner/rpdd.
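The two-stage workflow described above (build the R package, build the image, then preprocess and postprocess inside the container) can be sketched roughly as follows. This is a minimal illustration and not the rpdd build script itself: the package directory `mypkg`, the image tag, the tool name `some_tool`, and the R function `mypkg::postprocess` are all hypothetical placeholders, and the Dockerfile is assumed to install the tarball produced by `R CMD build`.

```python
import subprocess
from pathlib import Path


def build_commands(pkg_dir: str, image_tag: str) -> list[list[str]]:
    """Return the build steps in order: R package first, then Docker image."""
    pkg = Path(pkg_dir)
    return [
        # Step 1: build the R package source tarball from the package directory.
        ["R", "CMD", "build", str(pkg)],
        # Step 2: build the Docker image from the current directory; the
        # Dockerfile (assumed) copies in and installs the tarball from step 1.
        ["docker", "build", "-t", image_tag, "."],
    ]


def pipeline_commands(input_path: str, work_dir: str) -> list[list[str]]:
    """Return the steps the container's entry script would run, in order."""
    intermediate = str(Path(work_dir) / "preprocessed.out")
    return [
        # Preprocess with a domain-specific tool ("some_tool" is hypothetical).
        ["some_tool", input_path, intermediate],
        # Postprocess with a function from the embedded R package
        # ("mypkg::postprocess" is hypothetical).
        ["Rscript", "-e", f"mypkg::postprocess('{intermediate}')"],
    ]


def run_all(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command, or execute it and stop at the first failure."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run only: shows the order of operations without needing R or Docker.
    run_all(build_commands("mypkg", "mypkg:latest"))
    run_all(pipeline_commands("/data/input.txt", "/data"))
```

Keeping the commands as plain lists makes the ordering explicit and easy to inspect before anything is executed; passing `dry_run=False` would run each step and abort on the first nonzero exit status.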


Metadata

Zenodo.7662885

Published: 2 Feb, 2023

License: CC BY