Ideas tagged with data-manipulation

Detecting the delimiter in CSVs that lie

File extensions for data sharing sometimes lie about their contents. Here is an algorithm to infer the actual delimiter of a CSV, TSV or any related format: - Assume that alphanumeric characters (A-Z, a-z, 0-9) and the period/full stop (.) cannot be delimiters. - Begin with input te...

By Tim McNamara
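The summary above is cut off after its first rule, so the remaining steps of the author's algorithm are unknown. The following minimal Python sketch illustrates one common heuristic and is not necessarily the author's method. It keeps the stated rule that ASCII letters, digits and the period are never delimiters. It also makes an extra assumption of its own: quote characters are excluded so that quoted fields do not win. Among the remaining characters, it picks the one that appears on every non-empty line with the most consistent per-line count. The function name `guess_delimiter` is hypothetical.

```python
from collections import Counter
from typing import Optional


def is_candidate(ch: str) -> bool:
    """Return True if ch may be a delimiter.

    From the source: ASCII letters, digits and '.' are never delimiters.
    Assumption (not in the source): quote characters are excluded too.
    """
    if ch.isascii() and ch.isalnum():
        return False
    return ch not in {".", '"', "'"}


def guess_delimiter(text: str) -> Optional[str]:
    """Hypothetical helper: guess the delimiter of a delimited-text sample.

    Heuristic (an assumption, not the original algorithm): the delimiter
    must occur on every non-empty line, and the best candidate is the one
    whose per-line count varies least, then the one that occurs most often.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return None

    # Count candidate characters separately for each line.
    per_line = [Counter(ch for ch in line if is_candidate(ch)) for line in lines]
    candidates = set().union(*per_line)

    best, best_score = None, None
    # Sorted iteration makes ties resolve deterministically.
    for ch in sorted(candidates):
        counts = [counter[ch] for counter in per_line]
        if min(counts) == 0:
            continue  # a real delimiter should appear on every line
        spread = max(counts) - min(counts)
        score = (-spread, min(counts))  # consistent first, then frequent
        if best_score is None or score > best_score:
            best, best_score = ch, score
    return best


if __name__ == "__main__":
    sample = "city,population\nNew York,8.3\nSan Jose,1.0\n"
    print(repr(guess_delimiter(sample)))
```

In the usage example, the space inside "New York" is rejected because it is missing from the header line, and the period inside the numbers is excluded by rule, so the sketch settles on the comma. Quoted fields that contain the delimiter, or samples with only one line, will need more care than this heuristic provides.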

Algorithms learning for large scale facilities

A large-scale facility can be described as an object that outputs datasets `D_i`, which scientists analyse to obtain results `R_i`. The ideal data analysis trajectory for an experiment is thus `D_i-->R_i-->P_i`, where `P_i` denotes the desired output: a publication. Most of the tim...

By Alberto Cereser