Algorithms learning for large scale facilities

A large scale facility can be described as an object producing, as an output, datasets `D_i`, that scientists analyse to obtain results `R_i`. The ideal data analysis trajectory for an experiment is thus `D_i-->R-i-->P_i`, where `P_i` denotes the desired output: a publication. Most of the tim...

By Alberto Cereser

Detecting the delimiter in CSVs that lie

File extensions for data sharing sometimes lie about their contents. Here is an algorithm to infer the actual delimiter of a CSV, TSV or any related format: - Assume that alpha-numeric characters (A-Z, a-z, 0-9) and the period/full stop (.) are cannot be delimiters. - Begin with input te...

By Tim McNamara