The Journal of Brief Ideas

By Alberto Cereser

A large-scale facility can be described as an object that produces datasets D_i, which scientists analyse to obtain results R_i. The ideal data analysis trajectory for an experiment is thus D_i-->R_i-->P_i, where P_i denotes the desired output: a publication.

Most of the time, something goes wrong along the way: there is no time to analyse the data, or the analysis turns out to be harder than expected. To make the process faster and more efficient, I suggest a new approach that includes machine learning methods. Once developed, it could also be applied in other industries.

Over time, a large set of algorithms A_ij is developed to analyse the data collected at a given large-scale facility. I propose to profile these algorithms by translating them into a high-level formal language, so as to create a platform that:

if the data collected during an experiment is similar to a previous dataset, suggests in a concise format which analysis steps to follow;
if the experiment is new, searches for how similar data have been treated and uses machine learning techniques to suggest a possible data analysis approach.
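
The two cases above can be illustrated with a nearest-neighbour lookup over past analyses. This is a minimal sketch under stated assumptions, not the proposed platform itself: each past dataset D_i is reduced to a hypothetical numeric descriptor (for example, encoded experiment metadata), each analysis is stored as an ordered list of hypothetical step names standing in for the A_ij, similarity is measured with cosine similarity, and a similarity-weighted k-nearest-neighbour vote stands in for the unspecified machine learning step. The names `Record`, `cosine`, `suggest`, `match_threshold` and `k` are introduced here for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class Record:
    """A past dataset D_i: a numeric descriptor plus the pipeline used on it."""
    features: list[float]  # hypothetical descriptor, e.g. encoded experiment metadata
    pipeline: list[str]    # ordered, hypothetical step names standing in for A_ij


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def suggest(history: list[Record], query: list[float],
            match_threshold: float = 0.95, k: int = 3) -> tuple[str, list[str]]:
    """Suggest an analysis pipeline for a new dataset descriptor `query`.

    Returns ("known", pipeline) when a previous dataset is similar enough,
    otherwise ("new", pipeline) built from the k most similar past analyses.
    """
    if not history:
        return "new", []
    ranked = sorted(history, key=lambda r: cosine(r.features, query), reverse=True)
    best = ranked[0]
    if cosine(best.features, query) >= match_threshold:
        # Case 1: close to a previous dataset, so reuse its pipeline unchanged.
        return "known", list(best.pipeline)

    # Case 2: new experiment. Similarity-weighted k-nearest-neighbour vote,
    # a simple stand-in for the machine learning step of the proposal.
    votes: dict[str, float] = {}
    total = 0.0
    for rec in ranked[:k]:
        w = max(cosine(rec.features, query), 0.0)
        total += w
        for step in dict.fromkeys(rec.pipeline):  # count each step once per record
            votes[step] = votes.get(step, 0.0) + w
    # Keep steps backed by at least half of the total weight, in the order
    # they first appear when scanning from the most similar record down.
    return "new", [s for s, v in votes.items() if total > 0 and v >= 0.5 * total]


if __name__ == "__main__":
    # Hypothetical descriptors and step names, for illustration only.
    history = [
        Record([1.0, 0.0, 0.2], ["dark_subtract", "flat_field", "azimuthal_integrate"]),
        Record([0.9, 0.1, 0.3], ["dark_subtract", "flat_field", "peak_fit"]),
        Record([0.0, 1.0, 0.0], ["dark_subtract", "reconstruct_tomogram"]),
    ]
    print(suggest(history, [1.0, 0.0, 0.2]))
    # ('known', ['dark_subtract', 'flat_field', 'azimuthal_integrate'])
    print(suggest(history, [0.5, 0.5, 0.0]))
    # ('new', ['dark_subtract', 'flat_field'])
```

A tight threshold reserves the "known" branch for near-duplicate datasets. For a new experiment, the vote keeps only the steps that most of the similar past analyses agree on, which gives a conservative starting point rather than a finished analysis.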

The system would help speed up data analysis for both well-established and new techniques.