Reproducible Research with End-to-end Machine Inference Using Deep Learning and Bayesian Statistics
By Junpeng Lao
Conventional statistical inference based on hypothesis testing and p-values is fundamentally flawed. The general practice of data analysis involves too many post hoc decisions made on the basis of p-values, which unavoidably violates the assumptions of frequentist statistics and, worse, leads to p-hacking and the "garden of forking paths". This is especially true for data that requires multiple preprocessing steps. For example, it has been observed that virtually no two fMRI papers use an identical analysis pipeline. Indeed, researchers often face many arbitrary choices, such as what kind of smoothing should be applied, or which algorithm and parameter set should be used for motion correction.
With advances in deep learning and computational Bayesian methods, it is easier than ever to fit (very) large models and apply Box's loop of model criticism and inference. It is thus possible to adopt an end-to-end inference framework that packages the whole data analysis and statistical inference workflow into a single pipeline. Such a framework allows researchers to quantitatively evaluate the human element in data analysis (e.g., different parameter settings) by directly modelling these arbitrary choices as hyperparameters and treating them with Bayesian inference. It is also possible to develop many pipelines with different models and parameters, then evaluate these pipelines using cross-validation, predictive performance, and so on. Combined with open-data policies, this framework can greatly improve research reproducibility.
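As a minimal illustration of the idea, the toy sketch below (all names, the smoothing pipeline, and the specific numbers are hypothetical, not taken from any particular study) promotes one arbitrary preprocessing choice, a smoothing width, to a hyperparameter. Instead of fixing the width ad hoc, each candidate is scored by its held-out Gaussian log-likelihood, and with a uniform prior the scores are converted into posterior weights over the choice:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a real measurement: a smooth signal plus noise.
x = np.linspace(0, 1, 400)
truth = np.sin(2 * np.pi * x)
y = truth + rng.normal(0.0, 0.5, size=x.size)

# Interleaved split: fit the pipeline on one half, score it on the other.
fit_idx = np.arange(y.size) % 2 == 0
val_idx = ~fit_idx

def pipeline(y_fit, width):
    """Toy preprocessing pipeline: moving-average smoothing.
    `width` is the arbitrary analyst choice promoted to a hyperparameter."""
    kernel = np.ones(width) / width
    return np.convolve(y_fit, kernel, mode="same")

widths = [3, 9, 27, 81]          # candidate values of the arbitrary choice
log_liks = []
for w in widths:
    estimate = pipeline(y[fit_idx], w)        # signal estimate from the fit half
    resid = y[val_idx] - estimate             # residuals on the validation half
    sigma = 0.5                               # assumed known noise scale
    ll = -0.5 * np.sum((resid / sigma) ** 2)  # Gaussian log-likelihood (up to a constant)
    log_liks.append(ll)

# With a uniform prior over widths, posterior weights are the softmax
# of the log-likelihoods: the choice is weighted, not silently fixed.
log_liks = np.array(log_liks)
weights = np.exp(log_liks - log_liks.max())
weights /= weights.sum()
for w, p in zip(widths, weights):
    print(f"width={w:>2}  posterior weight={p:.3f}")
```

The same pattern scales up: in a full probabilistic model the discrete choice could be marginalised out directly, and competing pipelines compared with cross-validated predictive scores rather than a single analyst-picked configuration.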