When to worry about negative weights in MVA training for HEP analysis
A discussion at DS@LHC2015 concerned the treatment of negative weights in MVA training. The issue has been relevant for HEP since the advent of MC generators at NLO in QCD, because the matching of matrix-element to parton-shower generators assigns negative weights to a fraction of the events, with a probability that is not uniform across phase space.
MVA experts remarked on the technical difficulty of sign-aware training, although preliminary procedures exist for some tools.
When should a HEP analyst worry?
Treating all events as positive is a mismodeling; this does not make the MVA outcome "incorrect", only possibly sub-optimal (similar to training on fast detector simulation and testing on GEANT, where O(10%)-level mismodelings in the fast simulation are usually tolerated). Systematic uncertainties (e.g., QCD scale) may lead to worse mismodeling.
Before investing time in the treatment of negative weights, an analyst should check whether ignoring the sign would lead to large mismodeling in some important kinematic region, with "large" defined as "larger than any systematic uncertainty". One way is to compare the normalized distributions of positive-only and negative-only events for the most discriminating MVA inputs: if the difference is larger than that between typical systematic variations, it is legitimate to worry. Alternatively, after a sign-blind training, the same test can be done on the MVA output.