Configuration reference
mcreweight exposes two entry-point commands, run-reweight and
apply-weights. Both accept a YAML configuration file and support CLI
overrides for every option. The CLI always takes precedence over the YAML.
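The precedence rule can be pictured with a small sketch. It is an illustration only, not the package's implementation: the function name `merge_overrides` and the flat dictionary layout are assumptions made for the example (the real configuration is nested).

```python
def merge_overrides(yaml_cfg, cli_overrides):
    """Return a copy of yaml_cfg with every supplied CLI value laid on top."""
    merged = dict(yaml_cfg)
    for key, value in cli_overrides.items():
        # None stands for "flag not given on the command line" in this sketch
        if value is not None:
            merged[key] = value
    return merged


# Values as they might come from the YAML file (flattened for brevity)
yaml_cfg = {"n_trials": 10, "test_size": 0.30, "plotdir": "plots"}
# Only n_trials was passed on the command line
cli_overrides = {"n_trials": 1, "test_size": None, "plotdir": None}

resolved = merge_overrides(yaml_cfg, cli_overrides)
print(resolved)  # {'n_trials': 1, 'test_size': 0.3, 'plotdir': 'plots'}
```

Options that are not given on the command line keep their YAML value, so a single run can change one setting without editing the file.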
run-reweight
Trains one or more reweighting models and produces diagnostic plots.
Invoke as:
run-reweight --config run.yaml [overrides ...]
run-reweight --dry-run --config run.yaml # validate config without running
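What `--dry-run` checks internally is not documented here. As a rough sketch of one plausible check, the snippet below reports which of the keys marked "required" in the YAML skeleton are missing from an already-loaded configuration dictionary. The helper `missing_required` and the assumed nesting (`input.mc.path`, `input.data.path`, `variables.training_vars`) are illustrative assumptions.

```python
# Dotted paths of the keys marked "required" in the skeleton (assumed nesting)
REQUIRED_KEYS = [
    ("input", "mc", "path"),
    ("input", "data", "path"),
    ("variables", "training_vars"),
]


def missing_required(cfg):
    """Return the dotted paths of required keys that are absent or empty."""
    missing = []
    for path in REQUIRED_KEYS:
        node = cfg
        for part in path:
            node = node.get(part) if isinstance(node, dict) else None
            if node is None:
                break
        if not node:  # catches None, empty lists and empty strings
            missing.append(".".join(path))
    return missing


cfg = {
    "input": {"mc": {"path": ["/path/to/mc.root"]}, "data": {}},
    "variables": {"training_vars": ["nPVs"]},
}
print(missing_required(cfg))  # ['input.data.path']
```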
YAML skeleton
input:
  mc:
    path: ["/path/to/mc.root"]    # required
    tree: DecayTree               # default: DecayTree
    mcweights_name: null          # branch name; null → uniform weights of 1
    mcweights_tree: null          # separate tree for mcweights_name; null → same tree
    label: MC                     # label used in plots
  data:
    path: ["/path/to/data.root"]  # required
    tree: DecayTree
    sweights_name: sweight_sig    # default: sweight_sig
    sweights_tree: null           # separate tree for sweights_name; null → same tree
    label: Data
  path_xlabels: null              # path to YAML of axis labels; null → package defaults

variables:
  training_vars:                  # required; list of branch names or expressions
    - B_DTF_Jpsi_P
    - B_DTF_Jpsi_PT
    - nPVs
    - nLongTracks
  monitoring_vars: null           # extra variables to plot but not train on

reweighting:
  sample: bd_jpsikst_ee           # subdirectory name under weightsdir and plotdir
  methods:                        # one or more of the values below
    - GB
    - Folding
    - ONNXGB
    - ONNXFolding
    - XGB
    - XGBFolding
    - NN
    - NNFolding
    - Bins
  transform: null                 # quantile | yeo-johnson | signed-log | scaler | null
  n_trials: 10                    # Optuna trials; set to 1 to skip tuning
  test_size: 0.30                 # fraction of events held out for testing
  n_folds: 10                     # number of folds for Folding variants
  n_bins: 10                      # bins per axis for the Bins method
  n_neighs: 3                     # neighbor-smoothing radius for Bins
  reweight_validation_fraction: 0.20   # validation split for iterative early stopping
  reweight_early_stopping_rounds: 5    # patience (consecutive checks without improvement)
  reweight_metric_every: 1             # evaluate validation metric every N stages
  clip_weights: true              # clip predicted weights at the 99th percentile
  folding_aggregation: weighted_geometric   # weighted_geometric | geometric | median
  max_log_weight: 3.0             # max |log-weight| per event during iterative training
  shap: false                     # compute SHAP feature-importance values

output:
  weightsdir: null                # root directory for models and weight arrays;
                                  # falls back to $MCREWEIGHTS_DATA_ROOT if unset
  plotdir: plots                  # root directory for plots

plotting:
  style: plain                    # plain | LHCb
  sample_label: null              # text in the top-right of each plot frame (LHCb style only)
  extra_label: null               # italic text after "LHCb", e.g. Simulation or Preliminary
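To make the two weight-limiting options concrete, here is a minimal sketch of what "clip at the 99th percentile" (`clip_weights`) and a bound on the absolute log-weight (`max_log_weight`) mean numerically. The function names, the nearest-rank percentile definition, and the point at which the package applies these limits are assumptions; only the numbers 99 and 3.0 come from the skeleton.

```python
import math


def clip_at_percentile(weights, percentile=99.0):
    """Cap every weight at the given percentile of the sample (nearest-rank)."""
    ordered = sorted(weights)
    rank = max(1, math.ceil(percentile * len(ordered) / 100.0))
    cap = ordered[rank - 1]
    return [min(w, cap) for w in weights]


def clamp_log_weight(weight, max_log_weight=3.0):
    """Keep |log(weight)| <= max_log_weight, i.e. weight in [e^-3, e^3] by default."""
    log_w = max(-max_log_weight, min(max_log_weight, math.log(weight)))
    return math.exp(log_w)


# 199 ordinary events and one outlier with a very large weight
weights = [1.0] * 199 + [50.0]
clipped = clip_at_percentile(weights)
print(max(weights), max(clipped))  # 50.0 1.0

# A weight of 1000 has log-weight ~6.9 and is pulled back to e^3
print(round(clamp_log_weight(1000.0), 3))  # 20.086
```

Both limits serve the same purpose: a handful of events with extreme weights should not dominate the reweighted distributions.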
Key descriptions
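| Key (YAML path) | Description |
|---|---|
| `input.mc.path` | List of paths to the MC ROOT files. Multiple files are concatenated. |
| `input.mc.tree` | Name of the TTree inside each MC file. Default: `DecayTree`. |
| `input.mc.mcweights_name` | Branch name to read per-event MC weights from. Accepts a plain branch name or a mathematical expression built from branch names. When `null`, uniform weights of 1 are used. |
| `input.mc.mcweights_tree` | Name of a separate TTree from which `mcweights_name` is read. When `null`, the same tree is used. |
| `input.mc.label` | Display label used in all plots. Default: `MC`. |
| `input.data.path` | List of paths to the data ROOT files. |
| `input.data.tree` | Name of the TTree inside each data file. Default: `DecayTree`. |
| `input.data.sweights_name` | Branch name for per-event sWeights (or any data-side weight). Accepts plain names or expressions. Default: `sweight_sig`. |
| `input.data.sweights_tree` | Separate TTree from which `sweights_name` is read. When `null`, the same tree is used. |
| `input.data.label` | Display label used in all plots. Default: `Data`. |
| `input.path_xlabels` | Path to a YAML file mapping branch names to human-readable axis labels. When `null`, the package defaults are used. |
| `variables.training_vars` | List of feature names or expressions used to train the reweighter. |
| `variables.monitoring_vars` | Additional variables plotted before and after reweighting but not used for training. |
| `reweighting.sample` | Subdirectory name appended to both `weightsdir` and `plotdir`. |
| `reweighting.methods` | Ordered list of reweighting backends to train. Folding variants require the corresponding base method to also be present. |
| `reweighting.transform` | Optional feature transform applied before training by all ONNX-capable methods. Choices: `quantile`, `yeo-johnson`, `signed-log`, `scaler`, or `null`. |
| `reweighting.n_trials` | Number of Optuna trials for hyperparameter search. Set to `1` to skip tuning. |
| `reweighting.test_size` | Fraction of events reserved for testing (not used during training). Default: `0.30`. |
| `reweighting.n_folds` | Number of K-folds used by the Folding variants. |
| `reweighting.n_bins` | Number of histogram bins per axis for the `Bins` method. |
| `reweighting.n_neighs` | Neighbor-smoothing radius (in bins) for the `Bins` method. |
| `reweighting.reweight_validation_fraction` | Fraction of the training set used as a validation sample for early stopping in iterative training. |
| `reweighting.reweight_early_stopping_rounds` | Number of consecutive validation checks without improvement before iterative training halts. Default: `5`. |
| `reweighting.reweight_metric_every` | Evaluate the validation KS metric every N stages. Default: `1`. |
| `reweighting.clip_weights` | When `true`, predicted weights are clipped at the 99th percentile. |
| `reweighting.max_log_weight` | Maximum absolute log-weight allowed per event during iterative training for XGB/NN. Default: `3.0`. |
| `reweighting.folding_aggregation` | How fold-level predictions are combined for the Folding variants. Choices: `weighted_geometric`, `geometric`, `median`. |
| `reweighting.shap` | When `true`, SHAP feature-importance values are computed. |
| `output.weightsdir` | Root directory where trained models and weight arrays are written. A subdirectory named after `reweighting.sample` is used inside it. Falls back to `$MCREWEIGHTS_DATA_ROOT` if unset. |
| `output.plotdir` | Root directory for diagnostic plots. A subdirectory named after `reweighting.sample` is used inside it. |
| `plotting.style` | Plot style. Choices: `plain`, `LHCb`. |
| `plotting.sample_label` | Text placed in the top-right of each plot frame when `style` is `LHCb`. |
| `plotting.extra_label` | Italic text rendered immediately after "LHCb", e.g. Simulation or Preliminary. |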
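The three `folding_aggregation` choices can be illustrated for a single event that received one predicted weight per fold. This is a sketch under assumptions: the function `aggregate_folds` is hypothetical, and the per-fold scores used for the weighted geometric mean are placeholders, since the quantity the package weights by is not stated here.

```python
import math
import statistics


def aggregate_folds(fold_weights, strategy="weighted_geometric", fold_scores=None):
    """Combine one event's per-fold weights into a single weight."""
    if strategy == "median":
        return statistics.median(fold_weights)
    if strategy == "geometric":
        coeffs = [1.0] * len(fold_weights)
    elif strategy == "weighted_geometric":
        if fold_scores is None:
            raise ValueError("weighted_geometric needs one score per fold")
        coeffs = list(fold_scores)  # hypothetical per-fold quality scores
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # Geometric mean = exponential of the (weighted) mean of the log-weights
    mean_log = sum(c * math.log(w) for c, w in zip(coeffs, fold_weights)) / sum(coeffs)
    return math.exp(mean_log)


fold_weights = [1.0, 4.0]
print(aggregate_folds(fold_weights, "median"))                    # 2.5
print(round(aggregate_folds(fold_weights, "geometric"), 6))       # 2.0
print(round(aggregate_folds(fold_weights, "weighted_geometric",
                            fold_scores=[3.0, 1.0]), 6))          # 1.414214
```

Averaging in log space treats a weight of 4 and a weight of 1/4 symmetrically, which an arithmetic mean would not.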
CLI reference
All options below override their YAML counterparts when supplied on the command line.
run-reweight [--config YAML] [options]
General
  --config PATH                         YAML configuration file
  --dry-run                             Validate config and print resolved settings; do not train
  --verbosity {1,2,3,4}                 Logging level (default: 1)

MC input
  --path-mc PATH [PATH …]               Path(s) to MC ROOT file(s)
  --tree-mc TREE                        MC TTree name
  --mcweights-name BRANCH               MC weights branch or expression
  --mcweights-tree TREE                 Separate tree for MC weights
  --mc-label LABEL                      MC label for plots

Data input
  --path-data PATH [PATH …]             Path(s) to data ROOT file(s)
  --tree-data TREE                      Data TTree name
  --sweights-name BRANCH                Data sWeights branch or expression; pass
                                        ``none`` to use uniform data weights
  --sweights-tree TREE                  Separate tree for sWeights
  --data-label LABEL                    Data label for plots

Variables
  --training-vars VAR [VAR …]           Training feature names or expressions
  --monitoring-vars VAR [VAR …]         Monitoring variable names (not trained on)

Reweighting
  --sample NAME                         Sample subdirectory name
  --methods METHOD [METHOD …]           Backends to train; see methods above
  --transform {quantile,yeo-johnson,signed-log,scaler}
                                        Feature transform
  --n_trials INT                        Optuna trials (1 = no tuning)
  --test_size FLOAT                     Test-split fraction
  --n_folds INT                         Number of K-folds
  --n_bins INT                          Bins per axis (Bins method)
  --n_neighs INT                        Neighbor-smoothing radius (Bins method)
  --reweight-validation-fraction FLOAT  Validation fraction for early stopping
  --reweight-early-stopping-rounds INT  Early-stopping patience
  --reweight-metric-every INT           Validate every N stages
  --clip-weights / --clip-weight        Enable weight clipping (flags; default on)
  --max-log-weight FLOAT                Max |log-weight| per event for XGB/NN (default 3.0)
  --folding-aggregation {weighted_geometric,geometric,median}
                                        Fold-prediction aggregation strategy
  --shap                                Compute SHAP feature importances

Output
  --weightsdir DIR                      Root directory for model artifacts
  --plotdir DIR                         Root directory for plots
  --path-xlabels PATH                   YAML file of axis labels

Plotting
  --style {plain,LHCb}                  Plot style
  --sample-label TEXT                   Top-right frame label (LHCb style only)
  --extra-label TEXT                    Italic text after "LHCb", e.g. Simulation or Preliminary
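For readers unfamiliar with the notation, `PATH [PATH …]` means one or more values and `{a,b}` means a fixed set of choices. The sketch below reproduces three of the options with Python's standard `argparse` module, including the special `none` value of `--sweights-name`. It is an independent illustration and not the package's own parser; the program name and the helper `none_or_str` are made up for the example.

```python
import argparse


def none_or_str(value):
    """Map the literal string 'none' (any case) to None; pass anything else through."""
    return None if value.lower() == "none" else value


parser = argparse.ArgumentParser(prog="run-reweight-sketch")
# nargs="+" accepts one or more values: PATH [PATH …]
parser.add_argument("--path-mc", nargs="+", default=None)
# "none" on the command line becomes Python None, i.e. uniform data weights
parser.add_argument("--sweights-name", type=none_or_str, default="sweight_sig")
# choices restricts the accepted values: {quantile,yeo-johnson,signed-log,scaler}
parser.add_argument(
    "--transform",
    choices=["quantile", "yeo-johnson", "signed-log", "scaler"],
    default=None,
)

args = parser.parse_args(
    ["--path-mc", "a.root", "b.root", "--sweights-name", "none"]
)
print(args.path_mc, args.sweights_name)  # ['a.root', 'b.root'] None
```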
apply-weights
Applies a previously trained model to a (possibly different) MC sample and writes the predicted weights back to a ROOT file.
Invoke as:
apply-weights --config apply.yaml [overrides ...]
apply-weights --dry-run --config apply.yaml
YAML skeleton
input:
  mc:
    path: ["/path/to/mc_apply.root"]   # required
    tree: DecayTree
    mcweights_name: null
    mcweights_tree: null
    label: MC                          # label used in plots
  data:                                # optional; enables comparison plots
    path: ["/path/to/data.root"]
    tree: DecayTree
    sweights_name: sweight_sig
    sweights_tree: null
    label: Data                        # label used in plots
  path_xlabels: null

variables:
  application_vars:                    # variables in the application MC file
    - B_DTF_Jpsi_P
    - B_DTF_Jpsi_PT
    - nPVs
    - nLongTracks
  training_vars:                       # variable names used during training
    - B_DTF_Jpsi_P                     # (must have the same length as application_vars)
    - B_DTF_Jpsi_PT
    - nPVs
    - nLongTracks
  monitoring_vars: null

reweighting:
  method: XGB                          # single method to apply
  training_sample: bd_jpsikst_ee       # subdirectory where the trained model lives
  application_sample: bd_jpsikst_ee    # subdirectory where output weights are written
  weightsdir: null                     # falls back to $MCREWEIGHTS_DATA_ROOT
  plotdir: plots

output:
  output_path: "/path/to/output.root"  # required
  output_ntuple: TTree                 # TTree | RNTuple
  output_tree: DecayTree
  weights_name: weights                # branch name written to the output file
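The pairing between `application_vars` and `training_vars` is positional: the first application variable supplies the first training feature, and so on. A minimal sketch of that idea follows. The helper `rename_to_training` and the application-side branch names (`Jpsi_P`, `nPV`) are hypothetical, chosen to show a file whose branches are named differently from those used in training.

```python
def rename_to_training(columns, application_vars, training_vars):
    """Relabel application columns with the feature names the saved model expects."""
    if len(application_vars) != len(training_vars):
        raise ValueError(
            "application_vars and training_vars must have the same length"
        )
    # Positional pairing: i-th application variable -> i-th training feature
    mapping = dict(zip(application_vars, training_vars))
    return {mapping[name]: columns[name] for name in application_vars}


# Hypothetical application file with differently named branches
columns = {"Jpsi_P": [1.0, 2.0], "nPV": [3, 1]}
features = rename_to_training(
    columns,
    application_vars=["Jpsi_P", "nPV"],
    training_vars=["B_DTF_Jpsi_P", "nPVs"],
)
print(features)  # {'B_DTF_Jpsi_P': [1.0, 2.0], 'nPVs': [3, 1]}
```

Because the pairing depends on order, the two lists need to be kept in the same sequence as well as the same length.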
Key descriptions
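| Key (YAML path) | Description |
|---|---|
| `input.mc.path` | List of paths to the MC ROOT files to apply weights to. |
| `input.mc.tree` | TTree name. Default: `DecayTree`. |
| `input.mc.mcweights_name` | Prior MC weight branch or expression. |
| `input.mc.mcweights_tree` | Separate TTree for `mcweights_name`. |
| `input.mc.label` | Display label used in comparison plots. Default: `MC`. |
| `input.data.path` | Optional data files. When provided, comparison distributions are plotted. |
| `input.data.tree` | Data TTree name. Default: `DecayTree`. |
| `input.data.sweights_name` | Data sWeights branch or expression. Default: `sweight_sig`. |
| `input.data.sweights_tree` | Separate TTree for `sweights_name`. |
| `input.data.label` | Display label used in comparison plots. Default: `Data`. |
| `input.path_xlabels` | Path to an axis-label YAML file. |
| `variables.application_vars` | Branch names (or expressions) in the application MC file to feed into the model. Must have the same length as `training_vars`. |
| `variables.training_vars` | Feature names used when the model was trained. These are the column names expected by the saved model. |
| `variables.monitoring_vars` | Extra variables to include in the output plots. |
| `reweighting.method` | The single backend whose saved model is loaded. Must match a method that was trained in a prior `run-reweight` run. |
| `reweighting.training_sample` | Subdirectory under `weightsdir` where the trained model lives. |
| `reweighting.application_sample` | Subdirectory under `weightsdir` where the output weights are written. |
| `reweighting.weightsdir` | Root directory containing trained artifacts. Falls back to `$MCREWEIGHTS_DATA_ROOT`. |
| `reweighting.plotdir` | Root directory for application plots. Default: `plots`. |
| `output.output_path` | Path of the ROOT file to write. The file is created from scratch; all branches from the input MC tree plus the new weights branch are written. |
| `output.output_ntuple` | Output format: `TTree` (default) or `RNTuple`. |
| `output.output_tree` | TTree or RNTuple name in the output file. Default: `DecayTree`. |
| `output.weights_name` | Name of the branch that holds the predicted weights in the output file. Default: `weights`. |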
CLI reference
apply-weights [--config YAML] [options]
General
  --config PATH                         YAML configuration file
  --dry-run                             Validate config and print resolved settings; do not run
  --verbosity {1,2,3,4}                 Logging level (default: 1)

MC input
  --path-mc PATH [PATH …]               Path(s) to MC ROOT file(s)
  --tree-mc TREE                        MC TTree name
  --mcweights-name BRANCH               MC weights branch or expression
  --mcweights-tree TREE                 Separate tree for MC weights
  --mc-label LABEL                      MC label for plots

Data input (optional; enables comparison plots)
  --path-data PATH [PATH …]             Path(s) to data ROOT file(s)
  --tree-data TREE                      Data TTree name
  --sweights-name BRANCH                Data sWeights branch or expression; pass
                                        ``none`` to use uniform data weights
  --sweights-tree TREE                  Separate tree for sWeights
  --data-label LABEL                    Data label for plots

Variables
  --vars VAR [VAR …]                    Application variable names (alias: --vars)
  --training-vars VAR [VAR …]           Training variable names the model expects
  --monitoring-vars VAR [VAR …]         Monitoring variable names

Reweighting
  --method METHOD                       Backend to apply (see choices above)
  --training-sample NAME                Subdirectory with trained model artifacts
  --application-sample NAME             Subdirectory for output weights and plots
  --weightsdir DIR                      Root directory for model artifacts
  --plotdir DIR                         Root directory for plots

Output
  --output-path PATH                    Path for the output ROOT file
  --output-ntuple FORMAT                Output ntuple format: TTree (default) or RNTuple
  --output-tree TREE                    Output tree name
  --weights-name BRANCH                 Branch name for the predicted weights in the output
  --path-xlabels PATH                   YAML file of axis labels
Environment variables
`MCREWEIGHTS_DATA_ROOT`
Fallback value for `weightsdir` when it is not set in the YAML or on the CLI. Both `run-reweight` and `apply-weights` raise an error if `weightsdir` is ultimately unresolvable.
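The lookup order described above (CLI, then YAML, then the environment variable, otherwise an error) can be sketched as follows. The function name `resolve_weightsdir` and the use of `ValueError` are assumptions for illustration; the exception type actually raised by the commands is not specified here.

```python
import os


def resolve_weightsdir(cli_value=None, yaml_value=None, environ=os.environ):
    """Return the first weightsdir that is set: CLI, then YAML, then environment."""
    candidates = (cli_value, yaml_value, environ.get("MCREWEIGHTS_DATA_ROOT"))
    for candidate in candidates:
        if candidate:
            return candidate
    # Assumed error type; the real commands may raise something else
    raise ValueError(
        "weightsdir is not set in the YAML, on the CLI, "
        "or via MCREWEIGHTS_DATA_ROOT"
    )


# A plain dict stands in for os.environ so the example has no side effects
fake_env = {"MCREWEIGHTS_DATA_ROOT": "/data/mcreweights"}
print(resolve_weightsdir(None, None, fake_env))            # /data/mcreweights
print(resolve_weightsdir(None, "/from/yaml", fake_env))    # /from/yaml
print(resolve_weightsdir("/from/cli", "/from/yaml", fake_env))  # /from/cli
```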
Legacy key aliases
The following YAML keys are recognized for backwards compatibility but are superseded by the canonical names above:
| Legacy key | Canonical equivalent |
|---|---|