Reweighting methods
This page describes the reweighting backends implemented in mcreweight and
how each method computes new MC event weights. Where relevant, differences with
hep_ml.reweight are noted.
Overview
mcreweight exposes nine user-facing training modes. They fall into four
main families:
- hep_ml-native methods
  - GB: direct use of hep_ml.reweight.GBReweighter.
  - Folding: direct use of hep_ml.reweight.FoldingReweighter around GB.
- ONNX-exportable gradient-boosting methods
  - ONNXGB: custom tree-based reweighter that reproduces the signed-weight logic of hep_ml while remaining exportable to ONNX.
  - ONNXFolding: K-fold ensemble of ONNXGB models.
- Iterative classifier-ratio methods
  - XGB: iterative reweighter that trains an xgboost.XGBClassifier at each stage and converts classifier probabilities into multiplicative weight updates.
  - XGBFolding: K-fold ensemble of XGB models.
  - NN: iterative reweighter that uses a sklearn.neural_network.MLPClassifier at each stage.
  - NNFolding: K-fold ensemble of NN models.
- Histogram method
  - Bins: N-dimensional histogram ratio reweighter with neighbor smoothing.
Quick selection guide:
- GB/Folding: closest to the original hep_ml package;
- ONNXGB/ONNXFolding: same boosting logic as hep_ml but ONNX-exportable;
- XGB/NN: iterative classifier-ratio correction;
- XGBFolding/NNFolding/ONNXFolding: K-fold variants to reduce bias;
- Bins: non-parametric histogram-ratio baseline, best for low dimensions.
All methods follow the same high-level workflow:
split MC and data into training and testing subsets;
fit the selected reweighter on the training subset;
predict new MC weights;
optionally clip very large predicted weights to the 99th percentile (see below);
save both the trained model and the produced weight arrays.
Clipping behavior differs by method. For GB, ONNXGB, Bins,
Folding, and ONNXFolding, clipping is applied only when
--clip-weights (YAML: reweighting.clip_weights) is enabled.
For XGB, NN, XGBFolding, and NNFolding, clipping is always
applied as part of the iterative update.
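The percentile clipping step can be illustrated with a short NumPy sketch. The helper name clip_weights and the toy log-normal weights are assumptions made for illustration; they are not taken from the mcreweight source.

```python
import numpy as np


def clip_weights(weights, percentile=99.0):
    """Cap every weight above the given percentile (hypothetical helper)."""
    weights = np.asarray(weights, dtype=float)
    upper = np.percentile(weights, percentile)
    # Weights below the threshold are untouched; larger ones are set to it.
    return np.minimum(weights, upper)


# Toy weights with a long right tail, standing in for predicted MC weights.
rng = np.random.default_rng(0)
raw = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)
clipped = clip_weights(raw)
```

Only the upper tail is modified, so the bulk of the weight distribution is unchanged.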
The training entry points live in src/mcreweight/train.py and the ONNX-based
implementations live in src/mcreweight/models/onnxreweighter.py and
src/mcreweight/models/onnxfolding.py.
Method-by-method behavior
GB
GB is a thin wrapper around hep_ml.reweight.GBReweighter. All loss and
tree-update logic comes from hep_ml; the trained object is serialized with
joblib and weights are predicted via hep_ml’s own predict_weights.
Use this when compatibility with the original hep_ml implementation is the
primary requirement.
ONNXGB
ONNXGB reimplements the GBReweighter logic with plain scikit-learn
regression trees so that every stage can be exported to ONNX. It is not a
generic classifier-to-ratio method: it mirrors the signed-weight boosting
strategy of hep_ml directly.
At each stage, MC and data are concatenated, a regression tree is fit on signed
residuals (MC label 1, data label 0, with per-class weight normalization),
and the leaf values are replaced with the log ratio of target to original
weighted occupancies. The final event weight is original_weight * exp(score).
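The leaf update is regularized as follows:
\[\text{leaf value} = \log\frac{W_{\mathrm{target}} + \lambda}{W_{\mathrm{original}} + \lambda},\]
where \(W_{\mathrm{target}}\) and \(W_{\mathrm{original}}\) are the weighted
occupancies of the leaf for data and MC, and lambda is loss_regularization.
Adding lambda prevents infinite updates in empty or nearly empty leaves and
keeps the correction well-behaved.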
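A minimal NumPy sketch of a regularized leaf update is given here, assuming the update has the form log((W_target + lambda) / (W_original + lambda)) with lambda added to both occupancies. The function name and the toy leaf assignment are hypothetical; in practice the leaf indices would come from the fitted regression tree.

```python
import numpy as np


def regularized_leaf_values(leaf_index, weights, is_mc, loss_regularization=5.0):
    """Log ratio of data to MC weighted occupancy per leaf (illustrative)."""
    leaf_index = np.asarray(leaf_index, dtype=int)
    weights = np.asarray(weights, dtype=float)
    is_mc = np.asarray(is_mc, dtype=bool)
    n_leaves = leaf_index.max() + 1
    # Weighted occupancy of each leaf, separately for MC (original) and data (target).
    w_mc = np.bincount(leaf_index, weights=weights * is_mc, minlength=n_leaves)
    w_data = np.bincount(leaf_index, weights=weights * ~is_mc, minlength=n_leaves)
    # The regularization keeps the log finite when a leaf has no data or no MC.
    return np.log(w_data + loss_regularization) - np.log(w_mc + loss_regularization)


# Leaf 0 holds 10 MC and 30 data events; leaf 1 holds 5 MC events and no data.
leaf_index = np.array([0] * 10 + [0] * 30 + [1] * 5)
is_mc = np.array([True] * 10 + [False] * 30 + [True] * 5)
weights = np.ones(leaf_index.size)
values = regularized_leaf_values(leaf_index, weights, is_mc, loss_regularization=1.0)
```

Without the regularization term, leaf 1 would receive log(0), which is the infinite update the text refers to.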
The key differences from the other methods:
- vs. GB: same intent, different implementation: ONNXGB uses scikit-learn trees instead of the external hep_ml estimator, enabling ONNX export;
- vs. XGB/NN: keeps the signed-weight boosting logic rather than converting classifier probabilities into log-ratio updates.
XGB
XGB estimates the density ratio between data and MC through an iterative
sequence of binary classifiers, rather than reproducing the hep_ml loss.
A single classifier often captures only the dominant separation; by refitting
after each weight update, the method progressively corrects the residual
mismatch in the already-reweighted sample.
At each iteration \(t\):
MC events carry their current weights \(w_t(x)\);
data events keep fixed target weights;
an xgboost.XGBClassifier is trained to distinguish MC from data;
its output probability \(p_t(x)\) for the MC class is converted into a log-ratio correction;
MC weights are updated multiplicatively.
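The stage update is
\[\delta_t(x) = \log\frac{1 - p_t(x)}{p_t(x)},\]
followed by clipping and learning-rate damping:
\[s_{t+1}(x) = \mathrm{clip}\bigl(s_t(x) + \eta\,\mathrm{clip}(\delta_t(x), -c, c),\; -m,\; m\bigr), \qquad s_0(x) = 0,\]
and the final weights are
\[w_{\mathrm{final}}(x) = w_0(x)\,\exp\bigl(s_T(x)\bigr),\]
where eta = mixing_learning_rate, c = clip_delta, and
m = max_log_weight.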
Intuitively, \(\delta_t(x)\) is positive when the classifier finds the event
more data-like (weight should increase) and negative when it finds it more
MC-like (weight should decrease). The learning rate eta and clip bounds
prevent any single stage from making an extreme correction.
At each stage scale_pos_weight is updated to reflect the current weighted
class balance, and negative training weights are clipped to zero for
estimator compatibility.
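One stage of such an update can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the package code: the correction is taken to be log((1 - p) / p) with p the MC-class probability, the per-stage clip is applied before damping, and the cap is applied to the accumulated log-weight. The default values of clip_delta and max_log_weight are placeholders.

```python
import numpy as np


def stage_update(log_w, p_mc, eta=0.1, clip_delta=2.0, max_log_weight=5.0):
    """Apply one damped, clipped log-ratio correction to accumulated log-weights."""
    p = np.clip(p_mc, 1e-6, 1.0 - 1e-6)  # avoid log(0) for saturated classifiers
    delta = np.log((1.0 - p) / p)        # positive when the event looks data-like
    delta = np.clip(delta, -clip_delta, clip_delta)
    return np.clip(log_w + eta * delta, -max_log_weight, max_log_weight)


w0 = np.ones(3)                      # initial MC weights
log_w = np.zeros(3)                  # accumulated log-weight correction
p_mc = np.array([0.2, 0.5, 0.9])     # toy MC-class probabilities from a classifier
log_w = stage_update(log_w, p_mc)
w1 = w0 * np.exp(log_w)
```

In this toy case the data-like event (p = 0.2) is upweighted, the ambiguous one (p = 0.5) is unchanged, and the MC-like event (p = 0.9) hits the per-stage clip before being damped.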
NN
NN uses exactly the same iterative log-ratio update as XGB, with an
sklearn.neural_network.MLPClassifier as the stage classifier instead of
XGBClassifier. All clipping and damping parameters work identically.
If the installed scikit-learn version does not accept sample_weight in
MLPClassifier, the implementation falls back to unweighted stage fits and
prints a warning. Use this method when smooth, non-tree decision boundaries are
preferred.
Bins
Bins computes the density ratio as a direct N-dimensional histogram ratio
in transformed feature space:
fit the configured feature transform on the combined MC+data sample;
define per-variable bin edges from target-data quantiles;
fill weighted MC and data histograms;
smooth both histograms by averaging with immediate neighbors;
compute H_data / H_mc with epsilon regularization to avoid division by zero;
assign each event the ratio value of its bin.
This is the most transparent method in the package. Because bin counts grow exponentially with the number of dimensions, it is only reliable for a small number of training variables. In practice it is strongest in one or two dimensions, can still be useful up to roughly four with enough population, and should otherwise be treated as a rough baseline rather than the default choice.
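The steps above can be sketched in one dimension with NumPy. This is a simplified illustration, not the package implementation: it skips the feature transform, applies a single smoothing pass, and uses hypothetical function names and an arbitrary epsilon.

```python
import numpy as np


def smooth_neighbors(hist):
    """Average each bin with its immediate neighbors (edge bins reuse themselves)."""
    padded = np.pad(hist, 1, mode="edge")
    return (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0


def bins_reweight_1d(mc, data, mc_w, data_w, n_bins=10, eps=1e-6):
    """Histogram-ratio weights for MC in a single variable (illustrative)."""
    # Bin edges follow the quantiles of the target (data) sample.
    edges = np.quantile(data, np.linspace(0.0, 1.0, n_bins + 1))
    h_mc, _ = np.histogram(mc, bins=edges, weights=mc_w)
    h_data, _ = np.histogram(data, bins=edges, weights=data_w)
    ratio = (smooth_neighbors(h_data) + eps) / (smooth_neighbors(h_mc) + eps)
    # MC events outside the data range are assigned to the nearest edge bin.
    idx = np.clip(np.searchsorted(edges, mc, side="right") - 1, 0, n_bins - 1)
    return mc_w * ratio[idx]


rng = np.random.default_rng(1)
mc = rng.normal(0.0, 1.0, size=20_000)
data = rng.normal(0.5, 1.0, size=20_000)
new_w = bins_reweight_1d(mc, data, np.ones(mc.size), np.ones(data.size))
mean_before = mc.mean()
mean_after = np.average(mc, weights=new_w)
```

In N dimensions the same logic applies with np.histogramdd, but the number of bins grows as n_bins to the power N, which is why the method degrades quickly with dimensionality.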
Folding variants
The Folding variants (Folding, ONNXFolding, XGBFolding,
NNFolding) wrap a base reweighter in a K-fold procedure. Each fold is
trained on n_folds - 1 subsets and applied to the held-out subset, so that
every event receives a weight from a model that was not trained on it. This
reduces the bias that arises when weights are predicted on the same data used
for training.
The folding variants differ in how fold predictions are aggregated:
- hep_ml folding (Folding)
  Delegates to hep_ml.reweight.FoldingReweighter; predictions are effectively out-of-fold when the same dataset is passed back in order.
- mcreweight ONNX folding (ONNXFolding, XGBFolding, NNFolding)
  Trains one model per fold and combines predictions across folds. Available aggregation modes:
  - weighted_geometric (default): geometric mean weighted by the inverse of each fold’s validation error;
  - geometric: unweighted geometric mean;
  - median: per-event median across folds.
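The three aggregation modes can be written compactly in NumPy. The sketch below assumes strictly positive per-fold weights and one scalar validation error per fold; the function name and argument layout are hypothetical and only the mode names come from the text above.

```python
import numpy as np


def aggregate_folds(fold_weights, mode="weighted_geometric", fold_errors=None):
    """Combine per-fold predicted weights of shape (n_folds, n_events)."""
    fold_weights = np.asarray(fold_weights, dtype=float)
    if mode == "median":
        return np.median(fold_weights, axis=0)
    log_w = np.log(fold_weights)  # geometric means are arithmetic means of logs
    if mode == "geometric":
        return np.exp(log_w.mean(axis=0))
    if mode == "weighted_geometric":
        inverse = 1.0 / np.asarray(fold_errors, dtype=float)
        coeff = inverse / inverse.sum()  # folds with lower error count more
        return np.exp(coeff @ log_w)
    raise ValueError(f"unknown aggregation mode: {mode}")


# Three folds, two events.
fold_weights = np.array([[1.0, 4.0], [4.0, 1.0], [2.0, 2.0]])
geo = aggregate_folds(fold_weights, mode="geometric")
med = aggregate_folds(fold_weights, mode="median")
wgeo = aggregate_folds(fold_weights, fold_errors=[0.1, 0.2, 0.2])
```

Working in log space keeps the combination multiplicative, which matches the way the iterative methods accumulate their corrections.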
Data visualization and diagnostics
The training and application pipelines produce a set of standard plots under
plots/. These figures are meant to answer slightly different questions:
are MC and data already mismatched before training;
does reweighting improve the agreement on the variables used for training;
does the improvement transfer to variables that were not used for training;
are the learned weights numerically well behaved;
can an independent classifier still distinguish reweighted MC from data;
where in phase space the remaining mismodelling is concentrated;
which input variables drive the learned correction.
Input and monitoring distributions
The one-dimensional histogram outputs are the most direct validation plots.
- input_features_training.png and input_features_testing.png
  These show the distributions of the training variables before reweighting, separately for the train and test splits. They are the baseline mismatch plots. Large pull structures here indicate the differences that the reweighter is expected to learn.
- input_features_training_transformed.png and input_features_testing_transformed.png
  These show the same variables after the optional preprocessing transform (for example yeo-johnson or quantile). They are useful to verify what representation the ONNX-capable methods actually see during training.
- other_vars_training.png and other_vars_testing.png
  These correspond to the monitoring variables, called other_vars in the output filenames. They are not used to train the reweighter. Instead they are held out as a transfer test: if reweighting improves these variables too, the correction is more likely to reflect genuine phase-space mismodelling rather than simple overfitting of the training inputs.
- input_features_<method>_weighted.png
  These show the training variables after applying the weights predicted by a given method. This is the main post-training check. A good result is one in which the reweighted MC moves closer to the data histogram and the pull panel becomes more centered around zero.
- other_vars_<method>_weighted.png
  These show the same post-training comparison for the monitoring variables. Improvements here are especially informative because these variables were not part of the direct optimization target.
When applying an already trained model, the corresponding output names are
input_features_reweighted.png and other_vars_reweighted.png. They play
the same role, but now for the separately processed output sample.
Correlation matrices
corr_mc.png and corr_data.png display the pairwise correlation matrices
of the training variables before reweighting.
These plots are useful because one-dimensional agreement is not enough: two samples can match marginal distributions and still differ strongly in their joint structure. The correlation matrices give a compact first view of whether important linear relationships differ between MC and data before training.
Weight distributions
weight_distributions.png shows the distribution of the predicted event
weights for each trained method.
This plot is primarily a stability diagnostic:
a narrow distribution centered near one usually indicates a mild correction;
a broad tail can be acceptable, but may signal that the method must strongly upweight a small region of phase space;
extremely long tails or spikes at very large weights are warning signs for statistical instability and for downstream analyses that reuse the weights.
Classifier-based diagnostics
Several plots are built from a fresh classifier trained after reweighting to separate reweighted MC from data. These are not the reweighters themselves. They are a common external probe of how distinguishable the two samples remain.
- roc_curve.png
  This shows the ROC curve of that diagnostic classifier for each method. If reweighting is effective, the classifier should struggle to separate the two samples, and the curve should move closer to the diagonal. Equivalently, the AUC should move closer to 0.5.
- classifier_output.png
  This shows the classifier-score distributions for reweighted MC and for data. It is often easier to interpret than the ROC curve because it directly shows whether the diagnostic classifier assigns similar scores to both samples. The plot also reports a weighted KS statistic, which summarizes the mismatch between the two score distributions.
The term “output distribution” in this context therefore refers to the distribution of the diagnostic classifier output score, not to the final physics variables themselves.
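A weighted KS statistic of this kind can be computed as the largest gap between two weighted empirical CDFs. The NumPy sketch below is one common way to do it and is not necessarily the exact routine used by the package; the function name is hypothetical.

```python
import numpy as np


def weighted_ks(x_a, x_b, w_a=None, w_b=None):
    """Maximum distance between the weighted empirical CDFs of two samples."""
    x_a = np.asarray(x_a, dtype=float)
    x_b = np.asarray(x_b, dtype=float)
    w_a = np.ones_like(x_a) if w_a is None else np.asarray(w_a, dtype=float)
    w_b = np.ones_like(x_b) if w_b is None else np.asarray(w_b, dtype=float)
    # Evaluate both CDFs at every observed value.
    grid = np.sort(np.concatenate([x_a, x_b]))

    def weighted_cdf(x, w):
        order = np.argsort(x)
        cumulative = np.concatenate([[0.0], np.cumsum(w[order])]) / w.sum()
        return cumulative[np.searchsorted(x[order], grid, side="right")]

    return float(np.max(np.abs(weighted_cdf(x_a, w_a) - weighted_cdf(x_b, w_b))))


ks_same = weighted_ks([0.1, 0.4, 0.9], [0.1, 0.4, 0.9])
ks_disjoint = weighted_ks([0.0, 1.0, 2.0], [10.0, 11.0, 12.0])
ks_weighted = weighted_ks([0.0, 1.0], [0.0, 1.0], w_a=[3.0, 1.0], w_b=[1.0, 1.0])
```

A value of 0 means the two weighted score distributions are indistinguishable by this measure, while 1 means they do not overlap at all.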
2D score and pull maps
- score_map_<method>.png
  This plot shows the mean diagnostic-classifier score in two-dimensional bins of all pairs of training variables. It answers the question: in which regions of phase space does the diagnostic classifier still find the reweighted MC more MC-like or more data-like? Structured hot spots indicate localized residual mismodelling even when one-dimensional projections look acceptable.
- pull_map_<method>.png
  This plot shows the two-dimensional pull,
  \[\frac{\rho_{\mathrm{data}} - \rho_{\mathrm{MC}}} {\sqrt{\sigma^2_{\mathrm{data}} + \sigma^2_{\mathrm{MC}}}},\]
  in bins of every pair of training variables. A value near zero means local agreement within uncertainty, while large positive or negative values point to regions where the reweighted MC is still under- or over-populated with respect to data.
The difference between the two diagnostics is:
the score map is classifier-based and tells you where residual separation is still easy for a learned discriminator;
the pull map is histogram-based and tells you where the weighted local event densities still disagree.
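A two-dimensional pull of this form can be sketched with NumPy histograms. The sketch assumes that the densities are normalized weighted histograms and that each per-bin variance is estimated from the sum of squared weights; both choices, and the function name, are illustrative assumptions.

```python
import numpy as np


def pull_map(mc_xy, data_xy, mc_w, data_w, bins=5, value_range=None):
    """Per-bin pull (rho_data - rho_MC) / sqrt(var_data + var_MC) for one variable pair."""
    h_mc, x_edges, y_edges = np.histogram2d(
        mc_xy[:, 0], mc_xy[:, 1], bins=bins, range=value_range, weights=mc_w
    )
    edges = [x_edges, y_edges]
    h_data, _, _ = np.histogram2d(data_xy[:, 0], data_xy[:, 1], bins=edges, weights=data_w)
    # Sum of squared weights estimates the variance of each weighted bin content.
    v_mc, _, _ = np.histogram2d(mc_xy[:, 0], mc_xy[:, 1], bins=edges, weights=mc_w**2)
    v_data, _, _ = np.histogram2d(data_xy[:, 0], data_xy[:, 1], bins=edges, weights=data_w**2)
    n_mc, n_data = mc_w.sum(), data_w.sum()
    diff = h_data / n_data - h_mc / n_mc
    var = v_data / n_data**2 + v_mc / n_mc**2
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(var > 0, diff / np.sqrt(var), 0.0)  # empty bins get pull 0


rng = np.random.default_rng(2)
mc = rng.normal(0.0, 1.0, size=(5000, 2))
data = rng.normal(0.5, 1.0, size=(5000, 2))
ones = np.ones(5000)
box = [[-3.0, 3.0], [-3.0, 3.0]]
pulls = pull_map(mc, data, ones, ones, bins=5, value_range=box)
pulls_same = pull_map(mc, mc, ones, ones, bins=5, value_range=box)
```

Comparing a sample with itself gives a map of zeros, while the shifted toy sample produces large pulls in the bins where the two densities differ most.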
SHAP feature-importance plots
feature_importance_<method>.png shows SHAP summary values for non-folding
methods when shap: true is enabled.
SHAP stands for SHapley Additive exPlanations. In this context it measures how much each input variable contributes to the model’s predicted log weight for an event, relative to a reference expectation.
The SHAP beeswarm plot should be read as follows:
each point is one event;
the horizontal position is the SHAP value, meaning the signed contribution of that feature to increasing or decreasing the predicted log weight;
the color encodes whether the feature value itself is low or high;
features higher in the plot have larger overall impact on the model output.
These plots do not by themselves tell you whether a model is “good” or “bad”. They tell you which variables the reweighter is using most strongly to build its correction and in which direction they influence the learned weights.
Loss function and update mechanics
Two distinct loss families are used across the methods.
hep_ml-style signed boosting
Used by GB and reimplemented by ONNXGB.
The goal is to fit an additive model for \(\log(p_{\text{data}}/p_{\text{MC}})\), the logarithm of the density ratio. At each stage:
event weights are normalized separately per class;
the current event importance is updated as sample_weight * exp(y * score), where y is +1 for MC and -1 for data;
trees are fit on the absolute normalized weights;
leaf values are rewritten from the ratio of target to original weighted occupancies.
The tree structure captures where the samples differ; the leaf rewrite converts that structure into a direct density-ratio correction.
Classifier-ratio iterative updates
Used by XGB and NN.
Instead of a custom boosting loss, these methods solve a weighted binary classification problem between MC and data at each stage and convert the classifier output into a log density-ratio estimate. The sign is intuitive:
- p(x) > 0.5 for the MC class → event looks too MC-like → weight decreases;
- p(x) < 0.5 for the MC class → event looks more data-like → weight increases.
The three numerical controls in the update equations serve distinct purposes:
- clip_delta: prevents any single stage from making an overconfident jump;
- max_log_weight: caps the total accumulated log-weight globally;
- mixing_learning_rate: dampens each stage correction to stabilize training.
Because each stage is trained on the currently reweighted MC, it targets only the residual mismatch left by previous updates, rather than re-learning the same dominant discrepancy.
Validation and early stopping
The iterative ONNX methods (ONNXGB, XGB, NN) add a validation loop
that is not part of the original hep_ml API.
At each stage, the mean weighted Kolmogorov-Smirnov distance across all
training variables is computed on a held-out validation subset. Training stops
early when this mean KS fails to improve for reweight_early_stopping_rounds
consecutive checks.
This provides a physics-motivated stopping criterion: the model stops when it no longer reduces observable MC-to-data mismatches, not just when classifier loss plateaus.
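The stopping rule can be expressed as a small patience check on the history of mean KS values. The sketch below is an assumption about how such a rule is typically written; the function name is hypothetical and the patience argument stands in for reweight_early_stopping_rounds.

```python
def should_stop(ks_history, patience):
    """True when the last `patience` checks did not beat the earlier best mean KS."""
    if len(ks_history) <= patience:
        return False  # not enough checks yet to judge
    best_before = min(ks_history[:-patience])
    return min(ks_history[-patience:]) >= best_before


# Toy mean-KS values measured on the validation subset after each stage.
ks_per_stage = [0.20, 0.15, 0.12, 0.13, 0.125, 0.121, 0.119]
history = []
stopped_at = None
for stage, ks in enumerate(ks_per_stage):
    history.append(ks)
    if should_stop(history, patience=3):
        stopped_at = stage
        break
```

In this toy sequence the best value 0.12 is reached at stage 2, the next three checks fail to improve on it, and training stops at stage 5 before the later value 0.119 is ever evaluated.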
Optuna hyperparameter optimization
When n_trials > 0, mcreweight runs an Optuna study before the final
training step, supporting GB, ONNXGB, XGB, and NN.
For each trial, the package trains the candidate reweighter, predicts new MC
weights, then measures how well a fresh classifier can still separate the
reweighted MC from data. The objective is the AUC of that diagnostic classifier:
lower is better, since a well-reweighted sample should be harder to distinguish
from data. Studies use Optuna’s TPE sampler with seed=42, and the study direction
is minimize.
Cached studies
Optuna studies are cached under weightsdir as:
optuna_study_<classifier_type>_<flattened_training_vars>.pkl
If that file already exists, the study is loaded instead of recomputed.
Seed trials
Before optimization starts, one manually chosen initial trial is enqueued:
- GB: gb_n_estimators=100, gb_learning_rate=0.1, gb_max_depth=5, min_samples_leaf=200, subsample=1.0
- ONNXGB: gb_n_estimators=100, gb_learning_rate=0.1, gb_max_depth=4, min_samples_leaf=200, loss_regularization=5.0, subsample=1.0
- XGB: n_iterations=5, mixing_learning_rate=0.1, xgb_learning_rate=0.1, max_depth=6, subsample=0.9, reg_alpha=1.0, reg_lambda=5.0
- NN: n_iterations=5, mixing_learning_rate=0.1, hidden1=64, hidden2=32, alpha=1e-4, nn_learning_rate_init=1e-3, batch_size=1024
Search spaces
The current Optuna intervals are:
GB search space
- gb_n_estimators: integer in [50, 150] with step 10
- gb_learning_rate: log-uniform float in [0.05, 0.3]
- gb_max_depth: integer in [3, 8] with step 1
- min_samples_leaf: integer in [200, 1200] with step 200
- subsample: float in [0.3, 1.0] with step 0.1
ONNXGB search space
- gb_n_estimators: integer in [50, 150] with step 10
- gb_learning_rate: log-uniform float in [0.05, 0.3]
- gb_max_depth: integer in [3, 8] with step 1
- min_samples_leaf: integer in [200, 1200] with step 200
- loss_regularization: log-uniform float in [1.0, 20.0]
- subsample: float in [0.3, 1.0] with step 0.1
XGB base-estimator search space
- xgb_learning_rate: log-uniform float in [0.05, 0.3]
- max_depth: integer in [4, 8] with step 1
- subsample: float in [0.6, 1.0] with step 0.1
- reg_alpha: float in [0.0, 5.0] with step 0.5
- reg_lambda: float in [1.0, 10.0] with step 1
NN base-estimator search space
- hidden1: integer in [32, 128] with step 16
- hidden2: integer in [16, 64] with step 16
- alpha: log-uniform float in [1e-6, 1e-2]
- nn_learning_rate_init: log-uniform float in [1e-4, 5e-3]
- batch_size: categorical choice among 256, 512, 1024
- max_iter: integer in [50, 180] with step 10
How tuned parameters are reused
After the study finishes, the final training functions read study.best_params
and map them onto the concrete training backends:
- GB uses the tuned boosting parameters directly in hep_ml.reweight.GBReweighter;
- ONNXGB runs its own native Optuna objective with ONNXGBReweighter and reuses the tuned tree/update parameters directly in the final ONNX-exportable training pass;
- XGB combines tuned iterative parameters with tuned XGBoost base-estimator parameters in ONNXIXGBReweighter;
- NN combines tuned iterative parameters with tuned MLP base-estimator parameters in ONNXINNReweighter.
Feature transformations
All custom ONNX-capable methods can apply an optional feature transform before training:
- quantile;
- yeo-johnson;
- signed-log;
- scaler.
The transform is always fitted once on the combined MC+data sample, then reused for both training and inference. This is important because it prevents the MC and data samples from being mapped into different feature spaces.
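The fit-once-on-combined pattern can be illustrated with a small NumPy sketch. Both pieces are assumptions made for illustration: signed_log uses a common definition, sign(x) * log(1 + |x|), which may differ from the package's signed-log option, and CombinedScaler is a hypothetical stand-in for the scaler option.

```python
import numpy as np


def signed_log(x):
    """Symmetric log compression that preserves the sign (assumed definition)."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.log1p(np.abs(x))


class CombinedScaler:
    """Standardize features with statistics taken from MC and data together."""

    def fit(self, mc, data):
        combined = np.vstack([mc, data])
        self.mean_ = combined.mean(axis=0)
        self.scale_ = combined.std(axis=0)
        self.scale_[self.scale_ == 0] = 1.0  # leave constant features unscaled
        return self

    def transform(self, x):
        return (np.asarray(x, dtype=float) - self.mean_) / self.scale_


rng = np.random.default_rng(3)
mc = rng.normal(0.0, 2.0, size=(1000, 2))
data = rng.normal(1.0, 3.0, size=(1000, 2))
scaler = CombinedScaler().fit(mc, data)
# The same fitted object is reused for both samples, at training and at inference.
mc_t, data_t = scaler.transform(mc), scaler.transform(data)
compressed = signed_log(np.array([-9.0, 0.0, 9.0]))
```

Fitting separate scalers on MC and data would center both samples at zero and hide exactly the shift the reweighter is supposed to learn.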
Main differences with hep_ml
Relative to the algorithms documented at
hep_ml.reweight, the
main differences in mcreweight are:
- GB and Folding are direct hep_ml wrappers, while ONNXGB, XGB, NN, and the ONNX folding classes are package-native implementations.
- ONNXGB aims at behavioral compatibility with hep_ml.GBReweighter but is implemented with exportable stage trees so the trained model can be served through ONNX Runtime.
- XGB and NN are not hep_ml algorithms. They use iterative classifier-based log-ratio updates instead of the custom signed boosting loss described for GBReweighter in hep_ml.
- Bins is conceptually close to hep_ml.BinsReweighter but the smoothing implementation differs. hep_ml documents a Gaussian filter, while mcreweight uses repeated averaging with immediate neighbors.
- The ONNX folding classes use built-in fold scoring and support weighted geometric aggregation. The hep_ml folding interface instead exposes a user-provided vote function.
- The iterative ONNX methods add validation-driven early stopping based on the mean weighted KS distance across features. This is not part of the hep_ml.reweight page API.
- mcreweight standardizes model persistence across methods and saves ONNX-exported stage models for deployment, which is outside the scope of the hep_ml reweighter documentation.
Which method to use
As a rule of thumb:
use GB if you want the closest behavior to the original hep_ml implementation;
use ONNXGB if you want similar boosting logic but need ONNX export;
use XGB if you want a powerful tree-based iterative classifier reweighter;
use NN if a neural iterative classifier is a better inductive bias for the problem;
use folding variants when you will predict on the same sample used for training and want less biased event-by-event weights;
use Bins only for low-dimensional problems where interpretability matters more than flexibility.