API reference

This page documents the main public classes and training helpers exposed by mcreweight.

Pipelines

mcreweight.core.run_reweighting_pipeline(args, plotdir, weightsdir)

Main function to run the reweighting pipeline.

mcreweight.core.apply_weights_pipeline(args, plotdir, weightsdir, out_weightsdir)

Main function to apply weights to the data using the trained model.

Training helpers

mcreweight.train.train_and_test(mc, data, mcweights, sweights, columns, test_size)

Split the data into training and testing sets.

Args:

mc (pd.DataFrame): MC data. data (pd.DataFrame): Data to reweight to. mcweights (np.ndarray): Weights for the MC data. sweights (np.ndarray): Sweights for the data. columns (list): List of column names to use for training. test_size (float): Proportion of the dataset to include in the test split.

Returns:

sample (dict): Dictionary containing the training and testing splits for MC and data, along with their weights.

mcreweight.train.gbreweight(args, sample, columns, study, weightsdir)

Train a GradientBoostingClassifier reweighter and predict weights for the test MC data.

Args:

args: Command-line arguments containing configuration options. sample (dict): Dictionary containing training and test data along with their weights. columns (list): List of column names to use for training. study (optuna.study.Study): Optuna study object containing the best hyperparameters. weightsdir (str): Directory to save the model and weights.

mcreweight.train.onnxgbreweight(args, sample, columns, weightsdir, study=None)

Train an ONNX-exportable GB reweighter that mirrors hep_ml’s signed-weight loss.

mcreweight.train.xgbreweight(args, sample, columns, weightsdir, study=None)

Train an iterative XGBoost reweighter using best hyperparameters from Optuna, then predict weights on mc_test.

Args:

args: Command-line arguments containing configuration options. sample (dict): Dictionary containing training and test data along with their weights. columns (list): List of column names to use for training. study (optuna.study.Study): Optuna study object containing the best hyperparameters. weightsdir (str): Directory to save the model and weights.

mcreweight.train.nnreweight(args, sample, columns, weightsdir, study=None)

Train an iterative NN reweighter using best hyperparameters from Optuna, then predict weights on mc_test.

Args:

args: Command-line arguments containing configuration options. sample (dict): Dictionary containing training and test data along with their weights. columns (list): List of column names to use for training. study (optuna.study.Study): Optuna study object containing the best hyperparameters. weightsdir (str): Directory to save the model and weights.

Returns:

nn (ONNXINNReweighter): Trained NN reweighter model. new_mc_weights (np.ndarray): Predicted weights for the MC data using the trained NN reweighter.

mcreweight.train.binning_reweight(
args,
sample,
columns,
n_bins,
n_neighs,
weightsdir,
)

Train an ONNXBinsReweighter and predict weights for the test MC data. Mirrors the behavior of the original BinsReweighter version, with ONNX support.

Args:

args: Command-line arguments containing configuration options. sample (dict): Dictionary containing training and test data along with their weights. columns (list): List of column names to use for training. n_bins (int): Number of bins to use for the reweighter. n_neighs (float): Number of neighbors for smoothing (ignored in this function). weightsdir (str): Directory to save the model and weights.

mcreweight.train.gbfolding(args, gb, sample, columns, n_folds, weightsdir)

Train a folding reweighter using the base GB model, then predict weights on mc_test.

Args:

args: Command-line arguments containing configuration options. gb (GBReweighter): Base GB reweighter model to use for folding. sample (dict): Dictionary containing training and test data along with their weights. columns (list): List of column names to use for training. n_folds (int): Number of folds for k-folding. weightsdir (str): Directory to save the model and weights.

mcreweight.train.onnxfolding(args, onnxgb, sample, columns, n_folds, weightsdir)

Train an ONNXFoldingReweighter using the ONNXGB base model.

mcreweight.train.xgbfolding(
args,
xgb,
sample,
columns,
n_folds,
weightsdir,
n_iterations=15,
)

Train an ONNXIXGBFoldingReweighter using the base XGB model, then predict weights on mc_test.

Args:

args: Command-line arguments containing configuration options. xgb (ONNXIXGBReweighter): Base XGB reweighter model to use for folding. sample (dict): Dictionary containing training and test data along with their weights. columns (list): List of column names to use for training. n_folds (int): Number of folds for k-folding. weightsdir (str): Directory to save the model and weights. n_iterations (int): Number of iterations for the base XGB model.

mcreweight.train.nnfolding(
args,
nn,
sample,
columns,
n_folds,
weightsdir,
n_iterations=5,
)

Train an ONNXINNFoldingReweighter using the base NN model, then predict weights on mc_test.

Args:

args: Command-line arguments containing configuration options. nn (ONNXINNReweighter): Base NN reweighter model to use for folding. sample (dict): Dictionary containing training and test data along with their weights. columns (list): List of column names to use for training. n_folds (int): Number of folds for k-folding. weightsdir (str): Directory to save the model and weights. n_iterations (int): Number of iterations for the base NN model.

Main reweighter classes

class mcreweight.models.onnxreweighter.ONNXGBReweighter(
transform=None,
verbosity=1,
n_estimators=40,
learning_rate=0.2,
max_depth=3,
min_samples_leaf=200,
loss_regularization=5.0,
subsample=1.0,
min_samples_split=2,
max_features=None,
max_leaf_nodes=None,
splitter='best',
update_tree=True,
random_state=42,
store_dir=None,
eps=1e-06,
)

ONNX-exportable implementation of hep_ml’s GBReweighter logic.

This mirrors the signed-weight handling of hep_ml’s ReweightLossFunction:
  • tree targets are class/sign-based residuals

  • tree fit weights use abs(normalized signed weights)

  • leaf updates are log(target_weight + reg) - log(original_weight + reg)

class mcreweight.models.onnxreweighter.ONNXIXGBReweighter(
transform=None,
verbosity=1,
n_iterations=30,
mixing_learning_rate=0.05,
clip_delta=2.0,
max_log_weight=3.0,
mixing_subsample=1.0,
random_state=42,
store_dir=None,
reweight_validation_fraction=0.2,
reweight_early_stopping_rounds=5,
reweight_metric_every=1,
**xgb_params,
)
get_params()

Return the parameters of the reweighter, including both the ONNXReweighterMixin parameters and the XGBoost parameters. This can be useful for logging, debugging, or reproducing the model configuration.

set_params(**params)

Set the parameters of the reweighter, allowing updates to both the ONNXReweighterMixin parameters and the XGBoost parameters. This can be useful for tuning the model configuration after initialization.

Args:

params: Arbitrary keyword arguments corresponding to the parameters to be updated.

class mcreweight.models.onnxreweighter.ONNXINNReweighter(
transform=None,
verbosity=1,
n_iterations=30,
mixing_learning_rate=0.1,
clip_delta=3.0,
max_log_weight=3.0,
mixing_subsample=1.0,
random_state=42,
store_dir=None,
reweight_validation_fraction=0.2,
reweight_early_stopping_rounds=5,
reweight_metric_every=1,
**nn_params,
)
get_params()

Return the parameters of the reweighter, including both the ONNXReweighterMixin parameters and the Neural Network parameters. This can be useful for logging, debugging, or reproducing the model configuration.

set_params(**params)

Set the parameters of the reweighter, allowing updates to both the ONNXReweighterMixin parameters and the Neural Network parameters. This can be useful for tuning the model configuration after initialization.

Args:

params: Arbitrary keyword arguments corresponding to the parameters to be updated.

class mcreweight.models.onnxreweighter.ONNXBinsReweighter(
transform=None,
verbosity=1,
n_bins=50,
n_neighs=2,
min_in_bin=1.0,
eps=1e-06,
)

N-dimensional histogram reweighter with simple neighbor smoothing.

It relies on the common helpers provided by BaseONNXReweighter for feature transformations, input validation, and per-class weight normalization.

fit(original, target, ow=None, tw=None)

Fit the reweighter by computing the ratio of N-dimensional histograms of original and target datasets.

Args:

original (array-like): Original dataset (e.g., MC samples). target (array-like): Target dataset (e.g., real data samples). ow (array-like, optional): Sample weights for the original dataset. If None, uniform weights are used. tw (array-like, optional): Sample weights for the target dataset. If None, uniform weights are used.

predict_weights(X, ow=None)

Predict reweighting factors for the input samples by looking up the ratio in the corresponding histogram bin.

Args:

X (array-like): Input samples for which to predict weights. ow (array-like, optional): Original weights for the input samples. If None, uniform weights are assumed.

Returns:

np.ndarray: Predicted reweighting factors for the input samples.

save(prefix)

Save the histogram edges and ratio to disk, along with meta information.

Args:
prefix (str): Prefix for the saved files. The edges and ratio are

saved to <prefix>_edges.npy and <prefix>_ratio.npy.

Folding classes

class mcreweight.models.onnxfolding.ONNXFoldingReweighter(
n_folds=5,
shuffle=True,
random_state=42,
transform=None,
verbosity=1,
**gb_params,
)

k-folding ensemble of ONNXGBReweighter models.

class mcreweight.models.onnxfolding.ONNXIXGBFoldingReweighter(
n_folds=5,
shuffle=True,
random_state=42,
transform=None,
verbosity=1,
**xgb_params,
)

k-folding ensemble of ONNXIXGBReweighter models.

class mcreweight.models.onnxfolding.ONNXINNFoldingReweighter(
n_folds=5,
shuffle=True,
random_state=42,
transform=None,
verbosity=1,
**nn_params,
)

k-folding ensemble of ONNXINNReweighter models.

Plotting utilities

mcreweight.utils.plotting_utils.set_lhcb_style(grid=True, size=12, usetex=False)

Set matplotlib plotting style close to “official” LHCb style (TeX Gyre Termes serif font, inward ticks on all sides, minor ticks, light grid).

mcreweight.utils.plotting_utils.plot_correlation_matrix(
args,
df,
columns,
weights,
x_labels,
title,
output_file,
)

Plot a correlation matrix for the given DataFrame columns.

Args:

args (argparse.Namespace): Command line arguments containing verbosity flag. df (pd.DataFrame): DataFrame containing the data. columns (list): List of column names to include in the correlation matrix. weights (np.ndarray, optional): Weights for the correlation calculation. If None, unweighted correlation is used. x_labels (dict): Mapping of column names to x-axis labels for the plot. title (str): Title of the plot. output_file (str): Path to save the output plot.

mcreweight.utils.plotting_utils.plot_distributions(
args,
mc,
data,
mc_weights,
data_weights,
columns,
x_labels,
output_file,
transform=None,
x_edges=None,
pull_clip=5,
)

Plot distributions with pull plots, handling MC and Data with different statistics. Histograms are normalized as densities, and pulls are correctly computed.

Args:

args (argparse.Namespace): Command line arguments containing verbosity flag. mc, data (pd.DataFrame): MC and Data samples. mc_weights, data_weights (np.ndarray): Weights for MC and Data. columns (list): Columns to plot. x_labels (dict): Mapping column names -> x-axis labels. output_file (str): Path to save figure. transform (callable, optional): Transformation function to apply to the data. x_edges (dict, optional): Column -> bin edges mapping. pull_clip (float): Maximum absolute value for pull display.

mcreweight.utils.plotting_utils.plot_mc_distributions(
mc,
original_mc_weights,
new_mc_weights,
columns,
x_labels,
output_file,
x_edges=None,
)

Plot distributions of MC data with weights.

Args:

mc (pd.DataFrame): MC data. original_mc_weights (np.ndarray): Original weights for the MC data. new_mc_weights (np.ndarray): Weights for the MC data. columns (list): List of column names to plot. x_labels (dict): Dictionary mapping column names to x-axis labels. output_file (str): Path to save the output plot. x_edges (dict, optional): Dictionary mapping column names to bin edges for histogramming.

mcreweight.utils.plotting_utils.plot_training_throughput(throughput, output_file)

Plot training throughput metrics for each method.

Args:

throughput (dict): Mapping method -> throughput metric dictionary. output_file (str): Output file path.

mcreweight.utils.plotting_utils.plot_training_memory(memory_profile, output_file)

Plot training memory metrics for each method.

Args:

memory_profile (dict): Mapping method -> memory metric dictionary. output_file (str): Output file path.

mcreweight.utils.plotting_utils.plot_roc_curve(sample, weights, methods, columns, output_file)

Plot ROC curve for the different reweighting methods.

Args:

sample (dict): Dictionary containing MC and Data samples and their weights. weights (dict): Dictionary containing weights for each method (GB, Folding, XGB, k-Folding, Bins, NN). methods (list): List of methods to include in the plot. columns (list): List of column names to use for plotting. output_file (str): Path to save the output plot.

Returns:

scores: Dictionaries containing classifier scores for each method.

mcreweight.utils.plotting_utils.plot_classifier_output(
scores,
weights,
methods,
output_file,
min_score=0.0,
max_score=1.0,
)

Produce two sets of classifier output plots.

  1. output_file: all methods’ MC score distributions overlaid on a single axes — no Target line — for a quick visual comparison.

  2. output_file with _{method} inserted before the extension: one file per method showing MC (solid) and Target (dashed) from the same per-method classifier, so the KS comparison is self-consistent.

mcreweight.utils.plotting_utils.plot_weight_distributions(
weights,
output_file,
bins=50,
xlim=(0, 10),
)

Plot histograms of weight distributions.

Args:

weights (dict): Dictionary where keys are labels and values are arrays of weights. output_file (str): Output file path for the plot. bins (int): Number of histogram bins. xlim (tuple or None): Limit for the x-axis, e.g., (0, 5). Default: (0, 10).

mcreweight.utils.plotting_utils.plot_2d_score_maps(
sample,
weights,
classifier_scores,
method,
vars,
output_file,
x_labels,
n_bins=40,
)

Plot 2D heatmaps of mean classifier score vs all possible pairs of variables.

Args:

sample (dict): Dictionary containing MC and Data samples. weights (dict): Dictionary of weights for each sample. classifier_scores (dict): Dictionary of classifier scores for each sample. method (str): Reweighter method name. vars (list): List of variables to consider for 2D plots. output_file (str): Path to save the figure. x_labels (dict): Dictionary mapping column names to x-axis labels. n_bins (int): Number of bins for the 2D histogram.

mcreweight.utils.plotting_utils.plot_feature_importance(
shap_values,
feature_names,
mc,
x_labels,
method,
output_file,
max_display=None,
)

Plot SHAP beeswarm (summary) plot for a reweighter.

Args:

shap_values (dict): Dictionary of SHAP values for each method. Keys should match method names. feature_names (list): Feature names mc (pd.DataFrame): MC sample used for SHAP value computation (for feature values) x_labels (dict): Dictionary mapping column names to x-axis labels. method (str): Reweighter method name. output_file (str): Path to save figure max_display (int): Max number of features to show

mcreweight.utils.plotting_utils.plot_2d_pull_maps(
mc,
data,
mc_weights,
data_weights,
columns,
x_labels,
method,
output_file,
n_bins=40,
pull_clip=5,
)

Plot 2D pull maps for all variable pairs.

The pull in each bin is computed as (data_density - mc_density) / sqrt(var_data + var_mc).

Both MC and Data are normalized to densities so that different statistics are handled correctly.

Args:

mc, data (pd.DataFrame): MC and Data samples. mc_weights, data_weights (np.ndarray): Weights for MC and Data. columns (list): List of column names to consider for the pull maps. x_labels (dict): Dictionary mapping column names to x-axis labels. method (str): Reweighting method name (for plot title). output_file (str): Path to save figure n_bins (int): Number of bins for the 2D histograms pull_clip (float): Maximum absolute value for pull map clipping