Create standalone simulation tools to facilitate collaboration between the HEP and machine learning communities
By Kyle Cranmer, Tim Head, Jean-Roch Vlimant, Vladimir Gligorov, Maurizio Pierini, Gilles Louppe, Andrey Ustyuzhanin, Balázs Kégl, Peter Elmer, Juan Pavez, Amir Farbin, Sergei Gleyzer, Steven Schramm, Lukas Heinrich, Michael Williams, Christian Lorenz Müller, Daniel Whiteson, Peter Sadowski, Pierre Baldi
Discussions at recent workshops have made it clear that one of the key barriers to collaboration between high energy physics and the machine learning community is access to training data. Recent successes in data sharing through the HiggsML and Flavours of Physics Kaggle challenges have borne much fruit, but required significant effort to coordinate.
While static simulated datasets are useful for challenges, in the course of investigating new machine learning techniques it is advantageous to be able to generate training data on demand (e.g. Refs. 1, 2, 3).
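As a rough illustration of what "training data on demand" could look like from the machine learning side, here is a minimal sketch using only the Python standard library. The function `toy_event_stream`, its parameters, and the Gaussian signal/background model are all assumptions made for illustration; they stand in for a real (fast) simulation and do not correspond to any existing HEP tool.

```python
import random
from typing import Iterator, List, Tuple

Batch = Tuple[List[List[float]], List[int]]


def toy_event_stream(batch_size: int = 4, n_features: int = 3,
                     signal_shift: float = 1.0, seed: int = 0) -> Iterator[Batch]:
    """Yield an endless sequence of (features, labels) batches.

    Hypothetical stand-in for a real simulator: background events are
    unit-width Gaussians centred at 0, signal events at signal_shift.
    """
    rng = random.Random(seed)  # seeded so that a run can be reproduced
    while True:
        features, labels = [], []
        for _ in range(batch_size):
            label = rng.randint(0, 1)  # 1 = signal, 0 = background
            mean = signal_shift if label == 1 else 0.0
            features.append([rng.gauss(mean, 1.0) for _ in range(n_features)])
            labels.append(label)
        yield features, labels


# A training loop would pull fresh batches instead of reading a fixed file.
stream = toy_event_stream(batch_size=4, n_features=3, seed=42)
first_features, first_labels = next(stream)
second_features, second_labels = next(stream)
```

The point of the generator interface is that the learner decides how many events it needs and with which settings, rather than being limited to whatever a static dataset happens to contain.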
Therefore we recommend efforts be made to produce the ingredients required to facilitate such collaboration:
Published: 26 Feb, 2016
Comment from Sebastien Binet, who is having technical problems:
C++ frameworks are notoriously difficult to compile, install and
distribute (and are a pain to set up and/or time-consuming because one has
to track hidden dependencies, find the right compiler, etc.).
Python frameworks are relatively easy to deploy (pip install foo,
conda install bar, etc.) but slow (and I don't think there is a
python(2|3) framework that does (fast) simulation, because python).
Kyle Cranmer · 31 Mar, 2016
Perhaps consider a Go-based (fast) simulation application (e.g.,
fads)?
Go packages are easy to install (go get github.com/foo) and Go
binaries are fast.
Also: what is the exchange data format in vogue in ML? ARFF? CSV? NPy? HDF5?
Kyle Cranmer · 31 Mar, 2016