When to worry about negative weights in MVA training for HEP analysis
A discussion at DS@LHC2015 concerned the treatment of negative weights in MVA training. This has been relevant for HEP since the advent of MC generators at NLO in QCD, as the matching of Matrix-Element to Parton-Shower generators assigns negative weights to a fraction of the events, with non-uniform probability across phase space.
MVA experts remarked on the technical difficulty of sign-aware training, although preliminary procedures exist for some tools.
When should a HEP analyst worry?
Treating all events as positive is a mismodeling; that does not make the MVA outcome "incorrect", but possibly sub-optimal (similarly to using a fast detector simulation for training and GEANT for testing, as O(10%)-level mismodelings are usually tolerated in the former). Systematic uncertainties (e.g., QCD scale) may lead to worse mismodeling.
Before investing time in addressing the treatment of negative weights, an analyst should check whether ignoring the sign would lead to large mismodeling in some important kinematic region, with "large" defined as "larger than any systematic uncertainty". For example, one can compare the normalized distributions of positive-only and negative-only events for the most discriminating MVA inputs: if the difference is larger than that between typical systematic variations, it is legitimate to worry. Alternatively, after a sign-blind training, the same test can be performed on the MVA output.
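The check above can be sketched numerically: build the normalized histograms of the positive-weight and negative-weight subsamples for one input variable and take their largest bin-by-bin difference, to be compared against the size of typical systematic variations. This is an illustrative sketch (the function name, binning choice, and summary statistic are my own choices, not part of the original discussion).

```python
import numpy as np

def sign_mismodeling_check(values, weights, bins=20):
    """Compare the normalized distributions of positive-weight and
    negative-weight events for a single MVA input variable.

    Returns the maximum absolute bin-by-bin difference between the two
    normalized (unit-area) histograms, i.e. the largest fraction of
    probability that moves between the two shapes in any one bin.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    pos = values[weights > 0]
    neg = values[weights < 0]
    # Common binning over the full sample, so the two shapes are comparable.
    edges = np.histogram_bin_edges(values, bins=bins)
    h_pos, _ = np.histogram(pos, bins=edges, density=True)
    h_neg, _ = np.histogram(neg, bins=edges, density=True)
    # density=True gives probability per unit x; multiply by bin width
    # to get the per-bin probability content before differencing.
    return float(np.max(np.abs(h_pos - h_neg) * np.diff(edges)))
```

If the returned difference is larger than the shape variation induced by the dominant systematics (e.g., QCD scale variations) in the same variable, sign-blind training deserves scrutiny; if it is much smaller, ignoring the sign is likely harmless.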
Published: 7 Jan, 2016
In retrospect, I should have linked the agenda of the DS@LHC workshop, to make the context clear to visitors coming from the Journal itself and not from the DS@LHC proceedings. Once submitted, the article cannot be edited. (In case you, reader, are a future submitter of DS@LHC proceedings, I suggest taking that into account...)
Anyway, here is the link:
Andrea Giammanco · 7 Jan, 2016
Andrea, this is a nice topic to sort out. One avenue comes from Section 6.4 of http://arxiv.org/abs/1506.02169. In the context of +/- 1 weighted events, you can separate the distribution into the +1 & -1 weights, absorb the -1 into the mixture coefficient, and then train a classifier between examples that all have positive weights. This applies to any classification technique that produces an output that is 1-to-1 with the likelihood ratio (e.g., the squared loss function).
Kyle Cranmer · 17 Feb, 2016
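The decomposition described in the comment above can be made concrete. Writing the signed mixture as p(x) = (n+ p+(x) - n- p-(x)) / (n+ - n-), one trains a separate classifier between each positive-density component (p+ and p-) and the reference class; a classifier whose score s(x) estimates p(x)/(p(x)+q(x)) gives the component ratio via r = s/(1-s), and the ratios combine linearly. The sketch below is my own illustration of this algebra, not code from the paper; the function names are hypothetical.

```python
import numpy as np

def ratio_from_score(s):
    """Recover the likelihood ratio p/q from a classifier score
    s = p / (p + q), the optimum of e.g. the squared loss."""
    s = np.asarray(s, dtype=float)
    return s / (1.0 - s)

def signed_mixture_ratio(s_pos, s_neg, n_pos, n_neg):
    """Combine per-component classifier scores into the likelihood ratio
    of the signed mixture p(x) = (n+ p+(x) - n- p-(x)) / (n+ - n-)
    with respect to the reference density q(x).

    s_pos, s_neg: scores of classifiers trained (with positive weights)
    between p+ and q, and between p- and q, respectively.
    n_pos, n_neg: sums of positive and of |negative| event weights.
    """
    r_pos = ratio_from_score(s_pos)   # p+(x) / q(x)
    r_neg = ratio_from_score(s_neg)   # p-(x) / q(x)
    # Linearity of the mixture carries over to the ratios.
    return (n_pos * r_pos - n_neg * r_neg) / (n_pos - n_neg)
```

Each individual training involves only positive-weight samples; the sign enters solely through the mixture coefficients in the final combination, which is what makes the trick compatible with off-the-shelf classifiers.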