madminer.fisherinformation module
-
class madminer.fisherinformation.FisherInformation(filename, include_nuisance_parameters=True, debug=False)
Bases: object
Functions to calculate expected Fisher information matrices.
After initializing a FisherInformation instance with the filename of a MadMiner file, different information matrices can be calculated:
- FisherInformation.calculate_fisher_information_full_truth() calculates the full truth-level Fisher information. This is the information in an idealized measurement where all parton-level particles with their charges, flavours, and four-momenta can be accessed with perfect accuracy.
- FisherInformation.calculate_fisher_information_full_detector() calculates the full Fisher information in realistic detector-level observations, estimated with neural networks. In addition to the MadMiner file, this requires a trained SALLY or SALLINO estimator and, optionally, an unweighted evaluation sample.
- FisherInformation.calculate_fisher_information_rate() calculates the Fisher information in the total cross section.
- FisherInformation.calculate_fisher_information_hist1d() calculates the Fisher information in the histogram of one (parton-level or detector-level) observable.
- FisherInformation.calculate_fisher_information_hist2d() calculates the Fisher information in a two-dimensional histogram of two (parton-level or detector-level) observables.
- FisherInformation.histogram_of_fisher_information() calculates the full truth-level and the rate-only Fisher information in different slices of one observable (the “distribution of the Fisher information”).
Finally, in the presence of nuisance parameters, the constraint terms also contribute to the Fisher information. This contribution is given by FisherInformation.calculate_fisher_information_nuisance_constraints().
Parameters: - filename : str
Path to MadMiner file (for instance the output of madminer.delphes.DelphesProcessor.save()).
- include_nuisance_parameters : bool, optional
If True, nuisance parameters are taken into account. Default value: True.
- debug : bool, optional
If True, additional detailed debugging output is printed. Default value: False.
Methods
- calculate_fisher_information_full_detector(…): Calculates the full Fisher information in realistic detector-level observations, estimated with neural networks.
- calculate_fisher_information_full_truth(theta): Calculates the full Fisher information at parton / truth level.
- calculate_fisher_information_hist1d(theta, …): Calculates the Fisher information in the one-dimensional histogram of a (parton-level or detector-level, depending on how the observations in the MadMiner file were calculated) observable.
- calculate_fisher_information_hist2d(theta, …): Calculates the Fisher information in a two-dimensional histogram of two (parton-level or detector-level, depending on how the observations in the MadMiner file were calculated) observables.
- calculate_fisher_information_nuisance_constraints(): Builds the Fisher information term representing the Gaussian constraints on the nuisance parameters.
- calculate_fisher_information_rate(theta, …): Calculates the Fisher information in a measurement of the total cross section (without any kinematic information).
- extract_observables_and_weights(thetas): Extracts observables and weights for given parameter points.
- extract_raw_data([theta]): Returns all events together with the benchmark weights (if theta is None) or weights for a given theta.
- histogram_of_fisher_information(theta, …): Calculates the full and rate-only Fisher information in slices of one observable.
-
calculate_fisher_information_full_detector(theta, model_file, unweighted_x_sample_file=None, luminosity=300000.0, include_xsec_info=True, mode='score', uncertainty='ensemble', ensemble_vote_expectation_weight=None, batch_size=100000, test_split=0.5)
Calculates the full Fisher information in realistic detector-level observations, estimated with neural networks. In addition to the MadMiner file, this requires a trained SALLY or SALLINO estimator.
Nuisance parameters are taken into account automatically if the SALLY / SALLINO model was trained with them.
Parameters: - theta : ndarray
Parameter point theta at which the Fisher information matrix I_ij(theta) is evaluated.
- model_file : str
Filename of a trained local score regression model that was trained on samples from theta (see madminer.ml.MLForge).
- unweighted_x_sample_file : str or None
Filename of an unweighted x sample that is sampled according to theta and obeys the cuts (see madminer.sampling.SampleAugmenter.extract_samples_train_local()). If None, the Fisher information is instead calculated on the full, weighted samples (the data in the MadMiner file). Default value: None.
- luminosity : float, optional
Luminosity in pb^-1. Default value: 300000.
- include_xsec_info : bool, optional
Whether the rate information is included in the returned Fisher information. Default value: True.
- mode : {“score”, “information”}, optional
How the ensemble uncertainty on the kinematic Fisher information is calculated. If mode is “information”, the Fisher information for each estimator is calculated individually and only then are the sample mean and covariance calculated. If mode is “score”, the sample mean is calculated for the score for each event. Default value: “score”.
- uncertainty : {“ensemble”, “expectation”, “sum”}, optional
How the covariance matrix of the Fisher information estimate is calculated. With “ensemble”, the ensemble covariance is used. With “expectation”, the expectation of the score is used as a measure of the uncertainty of the score estimator, and this uncertainty is propagated through to the covariance matrix. With “sum”, both terms are summed. Default value: “ensemble”.
- ensemble_vote_expectation_weight : float or list of float or None, optional
For ensemble models, the factor that determines how much more weight is given to those estimators with small expectation value. If a list is given, results are returned for each element in the list. If None, or if EnsembleForge.calculate_expectation() has not been called, all estimators are treated equally. Default value: None.
- batch_size : int, optional
Batch size. Default value: 100000.
- test_split : float or None, optional
If unweighted_x_sample_file is None, this determines the fraction of weighted events used for evaluation. If None, all events are used (this will probably include events used during training!). Default value: 0.5.
Returns: - fisher_information : ndarray or list of ndarray
Estimated expected full detector-level Fisher information matrix with shape (n_parameters, n_parameters). If more than one value of ensemble_vote_expectation_weight is given, this is a list with results for all entries in ensemble_vote_expectation_weight.
- fisher_information_uncertainty : ndarray or list of ndarray or None
Covariance matrix of the Fisher information matrix with shape (n_parameters, n_parameters, n_parameters, n_parameters). If more than one value of ensemble_vote_expectation_weight is given, this is a list with results for all entries in ensemble_vote_expectation_weight.
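The difference between the two mode settings, and the meaning of the four-index covariance, can be illustrated with plain NumPy. This is a sketch under stated assumptions, not MadMiner's implementation: the array scores is a hypothetical stand-in for the per-event score estimates of an ensemble, the cross section is an invented number, and the kinematic information is taken to be the expected event count times the sample mean of the outer product of the score.

```python
import numpy as np

rng = np.random.default_rng(0)
n_estimators, n_events, n_parameters = 5, 1000, 2

# Hypothetical ensemble output: one estimated score vector per estimator and event.
true_score = rng.normal(size=(n_events, n_parameters))
scores = true_score + 0.1 * rng.normal(size=(n_estimators, n_events, n_parameters))

n_expected = 300000.0 * 0.001  # luminosity [pb^-1] times an assumed cross section [pb]


def information_from_scores(t):
    """n_expected * mean over events of t_i t_j, for scores t of shape (n_events, n_parameters)."""
    return n_expected * np.einsum("ni,nj->ij", t, t) / t.shape[0]


# mode="score": average the score over the ensemble event by event, then build one matrix.
info_score_mode = information_from_scores(scores.mean(axis=0))

# mode="information": build one matrix per estimator, then take sample mean and covariance.
infos = np.array([information_from_scores(t) for t in scores])
info_information_mode = infos.mean(axis=0)
deviations = infos - info_information_mode
covariance = np.einsum("eij,ekl->ijkl", deviations, deviations) / (n_estimators - 1)

# Standard deviation of each matrix element: square root of covariance[i, j, i, j].
element_std = np.sqrt(np.einsum("ijij->ij", covariance))
```

The entry covariance[i, j, k, l] is the covariance between the matrix elements I_ij and I_kl, which is why the uncertainty carries four parameter indices.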
-
calculate_fisher_information_full_truth(theta, luminosity=300000.0, cuts=None, efficiency_functions=None, include_nuisance_parameters=True)
Calculates the full Fisher information at parton / truth level. This is the information in an idealized measurement where all parton-level particles with their charges, flavours, and four-momenta can be accessed with perfect accuracy, i.e. the latent variables z_parton can be measured directly.
Parameters: - theta : ndarray
Parameter point theta at which the Fisher information matrix I_ij(theta) is evaluated.
- luminosity : float, optional
Luminosity in pb^-1. Default value: 300000.
- cuts : None or list of str, optional
Cuts. Each entry is a parseable Python expression that returns a bool (True if the event should pass a cut, False otherwise). Default value: None.
- efficiency_functions : list of str or None
Efficiencies. Each entry is a parseable Python expression that returns a float for the efficiency of one component. Default value: None.
- include_nuisance_parameters : bool, optional
If True, nuisance parameters are taken into account. Default value: True.
Returns: - fisher_information : ndarray
Expected full truth-level Fisher information matrix with shape (n_parameters, n_parameters).
- fisher_information_uncertainty : ndarray
Covariance matrix of the Fisher information matrix with shape (n_parameters, n_parameters, n_parameters, n_parameters), calculated with plain Gaussian error propagation.
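As an illustration of how cuts and efficiency expressions enter a truth-level calculation, the following sketch evaluates I_ij = L * sum over events of (d_i w)(d_j w) / w on a toy sample. Everything in it is an assumption made for the example: the observable name pt, the quadratic weight model, and the way the expression strings are evaluated are hypothetical and do not reproduce MadMiner's internals.

```python
import numpy as np

rng = np.random.default_rng(1)
n_events = 5000
luminosity = 300000.0  # pb^-1

# Hypothetical toy sample: one observable "pt" and weights w(theta) = w_sm * (1 + theta * a)^2.
observables = {"pt": rng.exponential(scale=50.0, size=n_events)}
w_sm = np.full(n_events, 1.0e-3 / n_events)  # pb per event
a = observables["pt"] / 100.0
theta = 0.1

weights = w_sm * (1.0 + theta * a) ** 2
dweights = (2.0 * w_sm * a * (1.0 + theta * a))[:, np.newaxis]  # (n_events, n_parameters)

# Cuts and efficiencies as Python expressions over the observable names.
cuts = ["pt > 20.0"]
efficiency_functions = ["0.9"]

namespace = {"np": np, **observables}
passed = np.ones(n_events, dtype=bool)
for cut in cuts:
    passed &= eval(cut, {"__builtins__": {}}, namespace)
efficiency = np.ones(n_events)
for expression in efficiency_functions:
    efficiency *= eval(expression, {"__builtins__": {}}, namespace)

# Both the weights and their derivatives are rescaled by the same event-wise factor.
factor = passed * efficiency
weights_cut = weights * factor
dweights_cut = dweights * factor[:, np.newaxis]

# I_ij = L * sum_events (d_i w)(d_j w) / w, skipping events removed by the cuts.
keep = weights_cut > 0.0
fisher_information = luminosity * np.einsum(
    "n,ni,nj->ij", 1.0 / weights_cut[keep], dweights_cut[keep], dweights_cut[keep]
)
```

Because the factor multiplies w and its derivative alike, a constant efficiency simply rescales the information, while a cut removes the contribution of the rejected events.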
-
calculate_fisher_information_hist1d(theta, luminosity, observable, nbins, histrange=None, cuts=None, efficiency_functions=None, n_events_dynamic_binning=100000)
Calculates the Fisher information in the one-dimensional histogram of a (parton-level or detector-level, depending on how the observations in the MadMiner file were calculated) observable.
Parameters: - theta : ndarray
Parameter point theta at which the Fisher information matrix I_ij(theta) is evaluated.
- luminosity : float
Luminosity in pb^-1.
- observable : str
Expression for the observable to be histogrammed. The str will be parsed by Python’s eval() function and can use the names of the observables in the MadMiner files.
- nbins : int
Number of bins in the histogram, excluding overflow bins.
- histrange : tuple of float or None, optional
Minimum and maximum value of the histogram in the form (min, max). Overflow bins are always added. If None, variable-width bins with equal cross section are constructed automatically. Default value: None.
- cuts : None or list of str, optional
Cuts. Each entry is a parseable Python expression that returns a bool (True if the event should pass a cut, False otherwise). Default value: None.
- efficiency_functions : list of str or None
Efficiencies. Each entry is a parseable Python expression that returns a float for the efficiency of one component. Default value: None.
- n_events_dynamic_binning : int, optional
Number of events used to calculate the dynamic binning (if histrange is None). Default value: 100000.
Returns: - fisher_information : ndarray
Expected Fisher information in the histogram with shape (n_parameters, n_parameters).
- fisher_information_uncertainty : ndarray
Covariance matrix of the Fisher information matrix with shape (n_parameters, n_parameters, n_parameters, n_parameters), calculated with plain Gaussian error propagation.
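The information in a histogram can be sketched as a sum of Poisson rate terms, one per bin. The example below is a minimal NumPy illustration under assumptions: the toy events, the weight model, and the choice of one extra bin below and one above histrange are hypothetical and not taken from MadMiner's code.

```python
import numpy as np

rng = np.random.default_rng(2)
n_events, nbins = 10000, 5
histrange = (0.0, 200.0)
luminosity = 300000.0  # pb^-1

# Hypothetical toy sample with a one-parameter quadratic weight dependence.
x = rng.exponential(scale=60.0, size=n_events)
theta = 0.1
w_sm = np.full(n_events, 1.0e-3 / n_events)
weights = w_sm * (1.0 + theta * x / 100.0) ** 2
dweights = (2.0 * w_sm * (x / 100.0) * (1.0 + theta * x / 100.0))[:, np.newaxis]

# nbins regular bins inside histrange plus one underflow and one overflow bin.
edges = np.linspace(histrange[0], histrange[1], nbins + 1)
bin_index = np.digitize(x, edges)  # 0 = underflow, nbins + 1 = overflow
n_total_bins = nbins + 2

# Cross section and its gradient per bin.
sigma_bins = np.bincount(bin_index, weights=weights, minlength=n_total_bins)
dsigma_bins = np.stack(
    [
        np.bincount(bin_index, weights=dweights[:, i], minlength=n_total_bins)
        for i in range(dweights.shape[1])
    ],
    axis=1,
)

# I_ij = L * sum_b (d_i sigma_b)(d_j sigma_b) / sigma_b, ignoring empty bins.
filled = sigma_bins > 0.0
fisher_information = luminosity * np.einsum(
    "b,bi,bj->ij", 1.0 / sigma_bins[filled], dsigma_bins[filled], dsigma_bins[filled]
)
```

Binning can only lose information: the result is bounded from above by the event-by-event sum of (d_i w)(d_j w) / w over the same sample.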
-
calculate_fisher_information_hist2d(theta, luminosity, observable1, nbins1, observable2, nbins2, histrange1=None, histrange2=None, cuts=None, efficiency_functions=None, n_events_dynamic_binning=100000)
Calculates the Fisher information in a two-dimensional histogram of two (parton-level or detector-level, depending on how the observations in the MadMiner file were calculated) observables.
Parameters: - theta : ndarray
Parameter point theta at which the Fisher information matrix I_ij(theta) is evaluated.
- luminosity : float
Luminosity in pb^-1.
- observable1 : str
Expression for the first observable to be histogrammed. The str will be parsed by Python’s eval() function and can use the names of the observables in the MadMiner files.
- nbins1 : int
Number of bins along the first axis in the histogram, excluding overflow bins.
- observable2 : str
Expression for the second observable to be histogrammed. The str will be parsed by Python’s eval() function and can use the names of the observables in the MadMiner files.
- nbins2 : int
Number of bins along the second axis in the histogram, excluding overflow bins.
- histrange1 : tuple of float or None, optional
Minimum and maximum value of the first axis of the histogram in the form (min, max). Overflow bins are always added. If None, variable-width bins with equal cross section are constructed automatically. Default value: None.
- histrange2 : tuple of float or None, optional
Minimum and maximum value of the second axis of the histogram in the form (min, max). Overflow bins are always added. If None, variable-width bins with equal cross section are constructed automatically. Default value: None.
- cuts : None or list of str, optional
Cuts. Each entry is a parseable Python expression that returns a bool (True if the event should pass a cut, False otherwise). Default value: None.
- efficiency_functions : list of str or None
Efficiencies. Each entry is a parseable Python expression that returns a float for the efficiency of one component. Default value: None.
- n_events_dynamic_binning : int, optional
Number of events used to calculate the dynamic binning (if histrange1 or histrange2 is None). Default value: 100000.
Returns: - fisher_information : ndarray
Expected Fisher information in the histogram with shape (n_parameters, n_parameters).
- fisher_information_uncertainty : ndarray
Covariance matrix of the Fisher information matrix with shape (n_parameters, n_parameters, n_parameters, n_parameters), calculated with plain Gaussian error propagation.
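The variable-width bins with equal cross section that are used when a histogram range is None can be pictured as weighted quantiles of the observable. The sketch below shows one way to construct such boundaries along a single axis. It is an assumption-laden illustration: the toy data are hypothetical, and whether MadMiner places its edges in exactly this way is not stated here.

```python
import numpy as np

rng = np.random.default_rng(3)
n_events, nbins = 20000, 4

# Hypothetical observable values and event weights (in pb).
x = rng.exponential(scale=60.0, size=n_events)
weights = rng.uniform(0.5, 1.5, size=n_events) * 1.0e-7

# Sort by the observable and accumulate the normalized cross section.
order = np.argsort(x)
x_sorted = x[order]
cumulative = np.cumsum(weights[order]) / weights.sum()

# Interior boundaries sit where the cumulative cross section crosses k / nbins.
targets = np.arange(1, nbins) / nbins
boundaries = x_sorted[np.searchsorted(cumulative, targets)]

# Cross section per bin; the two outermost bins are open-ended.
sigma_bins = np.bincount(np.digitize(x, boundaries), weights=weights, minlength=nbins)
```

In practice only a subset of the events (compare n_events_dynamic_binning) would be needed to determine the boundaries, since the quantiles are already stable for moderately large samples.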
-
calculate_fisher_information_nuisance_constraints()
Builds the Fisher information term representing the Gaussian constraints on the nuisance parameters.
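A Gaussian constraint of width sigma on a single parameter carries Fisher information 1 / sigma^2 for that parameter and none for the others. The following sketch builds such a term with NumPy. The parameter counts, the ordering (physics parameters first, nuisance parameters last), and the unit widths are assumptions for illustration only.

```python
import numpy as np

n_parameters = 2           # physics parameters of interest (hypothetical count)
n_nuisance_parameters = 3  # nuisance parameters (hypothetical count)
constraint_sigma = np.ones(n_nuisance_parameters)  # assumed widths of the Gaussian constraints

# Each constrained parameter contributes 1 / sigma^2 on the diagonal; physics parameters get 0.
diagonal = np.concatenate([np.zeros(n_parameters), 1.0 / constraint_sigma**2])
constraint_information = np.diag(diagonal)

# The constraint term is added to the information from the measurement itself.
measurement_information = 10.0 * np.eye(n_parameters + n_nuisance_parameters)  # placeholder values
total_information = measurement_information + constraint_information
```

Because Fisher information is additive for independent measurements, the constraint term is simply summed with the information matrices returned by the other methods before profiling.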
-
calculate_fisher_information_rate(theta, luminosity, cuts=None, efficiency_functions=None, include_nuisance_parameters=True)
Calculates the Fisher information in a measurement of the total cross section (without any kinematic information).
Parameters: - theta : ndarray
Parameter point theta at which the Fisher information matrix I_ij(theta) is evaluated.
- luminosity : float
Luminosity in pb^-1.
- cuts : None or list of str, optional
Cuts. Each entry is a parseable Python expression that returns a bool (True if the event should pass a cut, False otherwise). Default value: None.
- efficiency_functions : list of str or None
Efficiencies. Each entry is a parseable Python expression that returns a float for the efficiency of one component. Default value: None.
- include_nuisance_parameters : bool, optional
If True, nuisance parameters are taken into account. Default value: True.
Returns: - fisher_information : ndarray
Expected Fisher information in the total cross section with shape (n_parameters, n_parameters).
- fisher_information_uncertainty : ndarray
Covariance matrix of the Fisher information matrix with shape (n_parameters, n_parameters, n_parameters, n_parameters), calculated with plain Gaussian error propagation.
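For a pure counting experiment with Poisson statistics, the rate information is I_ij = L * (d_i sigma)(d_j sigma) / sigma. The sketch below evaluates this for a hypothetical two-parameter cross section using central finite differences. The function cross_section and its coefficients are invented for the example, and MadMiner obtains the derivatives differently (from the event weights in the MadMiner file).

```python
import numpy as np


def cross_section(theta):
    """Hypothetical total cross section in pb as a function of two parameters."""
    return 1.0e-3 * (
        1.0 + 0.5 * theta[0] + 0.2 * theta[1] + 0.3 * theta[0] ** 2 + 0.1 * theta[1] ** 2
    )


def rate_fisher_information(theta, luminosity, step=1.0e-4):
    """Poisson rate information L * (d_i sigma)(d_j sigma) / sigma via central differences."""
    theta = np.asarray(theta, dtype=float)
    gradient = np.zeros_like(theta)
    for i in range(theta.size):
        shift = np.zeros_like(theta)
        shift[i] = step
        gradient[i] = (cross_section(theta + shift) - cross_section(theta - shift)) / (2.0 * step)
    return luminosity * np.outer(gradient, gradient) / cross_section(theta)


fisher_information = rate_fisher_information([0.0, 0.0], luminosity=300000.0)
```

The result is an outer product of the gradient with itself, so it has rank one: a single rate measurement constrains only one direction in parameter space.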
-
extract_observables_and_weights(thetas)
Extracts observables and weights for given parameter points.
Parameters: - thetas : ndarray
Parameter points, with shape (n_thetas, n_parameters).
Returns: - x : ndarray
Observations x with shape (n_events, n_observables).
- weights : ndarray
Weights dsigma(x|theta) in pb with shape (n_thetas, n_events).
-
extract_raw_data(theta=None)
Returns all events together with the benchmark weights (if theta is None) or weights for a given theta.
Parameters: - theta : None or ndarray, optional
If None, the function returns the benchmark weights. Otherwise it uses morphing to calculate the weights for this value of theta. Default value: None.
Returns: - x : ndarray
Observables with shape (n_unweighted_samples, n_observables).
- weights : ndarray
If theta is None, benchmark weights with shape (n_unweighted_samples, n_benchmarks_phys) in pb. Otherwise, weights for the given parameter theta with shape (n_unweighted_samples,) in pb.
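The relation between the two return shapes can be made concrete with a toy morphing example: weights at an arbitrary theta are a linear combination of the benchmark weights. The sketch assumes a hypothetical one-parameter model whose weights are exactly quadratic in theta and three invented benchmark points. The actual basis and benchmarks used by MadMiner are defined in the MadMiner file and are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(4)
n_samples = 1000

# Hypothetical model: w(theta) = (a + b * theta)^2, i.e. quadratic in theta and non-negative.
a = rng.uniform(0.5, 1.0, size=n_samples) * 1.0e-3
b = rng.uniform(-0.5, 0.5, size=n_samples) * 1.0e-3
coefficients = np.stack([a**2, 2.0 * a * b, b**2], axis=1)  # (n_samples, n_components)


def components(theta):
    """Polynomial basis (1, theta, theta^2) of the assumed quadratic dependence."""
    return np.array([1.0, theta, theta**2])


def exact_weights(theta):
    return coefficients @ components(theta)


# Benchmark weights with shape (n_samples, n_benchmarks), analogous to the theta=None case.
benchmark_thetas = [-1.0, 0.0, 1.0]
benchmark_weights = np.stack([exact_weights(t) for t in benchmark_thetas], axis=1)

# Basis evaluated at the benchmarks, shape (n_benchmarks, n_components).
basis_at_benchmarks = np.array([components(t) for t in benchmark_thetas])


def morphed_weights(theta):
    """Combine benchmark weights so that the basis at theta is reproduced."""
    morphing_weights = np.linalg.solve(basis_at_benchmarks.T, components(theta))
    return benchmark_weights @ morphing_weights  # shape (n_samples,)


weights = morphed_weights(0.3)
```

With as many benchmarks as basis components the combination is unique, and the morphed weights agree with the exact ones for any theta in this toy model.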
-
histogram_of_fisher_information(theta, luminosity, observable, nbins, histrange, cuts=None, efficiency_functions=None)
Calculates the full and rate-only Fisher information in slices of one observable.
Parameters: - theta : ndarray
Parameter point theta at which the Fisher information matrix I_ij(theta) is evaluated.
- luminosity : float
Luminosity in pb^-1.
- observable : str
Expression for the observable to be sliced. The str will be parsed by Python’s eval() function and can use the names of the observables in the MadMiner files.
- nbins : int
Number of bins in the slicing, excluding overflow bins.
- histrange : tuple of float
Minimum and maximum value of the slicing in the form (min, max). Overflow bins are always added.
- cuts : None or list of str, optional
Cuts. Each entry is a parseable Python expression that returns a bool (True if the event should pass a cut, False otherwise). Default value: None.
- efficiency_functions : list of str or None
Efficiencies. Each entry is a parseable Python expression that returns a float for the efficiency of one component. Default value: None.
Returns: - bin_boundaries : ndarray
Observable slice boundaries.
- sigma_bins : ndarray
Cross section in pb in each of the slices.
- rate_fisher_infos : ndarray
Expected rate-only Fisher information for each slice. Has shape (n_slices, n_parameters, n_parameters).
- full_fisher_infos_truth : ndarray
Expected full truth-level Fisher information for each slice. Has shape (n_slices, n_parameters, n_parameters).
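The two per-slice quantities differ in whether a slice is treated as one counting experiment or resolved event by event. The following NumPy sketch computes both for a toy sample with a single parameter. The events, the weight model, and the convention of one overflow slice on each side of histrange are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n_events, nbins = 10000, 4
histrange = (0.0, 200.0)
luminosity = 300000.0  # pb^-1

# Hypothetical toy events: observable x, weights w(theta), and dw/dtheta for one parameter.
x = rng.exponential(scale=60.0, size=n_events)
theta = 0.1
noise = rng.normal(size=n_events)  # event-level variation that x does not capture
sensitivity = x / 100.0 + 0.5 * noise
w_sm = np.full(n_events, 1.0e-3 / n_events)
weights = w_sm * (1.0 + theta * sensitivity) ** 2
dweights = 2.0 * w_sm * sensitivity * (1.0 + theta * sensitivity)

bin_boundaries = np.linspace(histrange[0], histrange[1], nbins + 1)
slice_index = np.digitize(x, bin_boundaries)  # 0 and nbins + 1 are the overflow slices
n_slices = nbins + 2

sigma_bins = np.zeros(n_slices)
rate_fisher_infos = np.zeros((n_slices, 1, 1))
full_fisher_infos_truth = np.zeros((n_slices, 1, 1))
for s in range(n_slices):
    in_slice = slice_index == s
    if not in_slice.any():
        continue
    sigma_bins[s] = weights[in_slice].sum()
    # Rate only: the slice is treated as a single counting experiment.
    rate_fisher_infos[s, 0, 0] = luminosity * dweights[in_slice].sum() ** 2 / sigma_bins[s]
    # Full: every event in the slice contributes its own (dw)^2 / w.
    full_fisher_infos_truth[s, 0, 0] = luminosity * np.sum(
        dweights[in_slice] ** 2 / weights[in_slice]
    )
```

In every slice the full information is at least as large as the rate-only information; the gap measures how much sensitivity sits in kinematic structure inside the slice.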
-
madminer.fisherinformation.profile_information(fisher_information, remaining_components, covariance=None, error_propagation_n_ensemble=1000, error_propagation_factor=0.001)
Calculates the profiled Fisher information matrix as defined in Appendix A.4 of arXiv:1612.05261.
Parameters: - fisher_information : ndarray
Original n x n Fisher information.
- remaining_components : list of int
List with m entries, each an int with 0 <= remaining_components[i] < n. Denotes which parameters are kept, and their new order. All other parameters are profiled out.
- covariance : ndarray or None, optional
The covariance matrix of the original Fisher information with shape (n, n, n, n). If None, the error on the profiled information is not calculated. Default value: None.
- error_propagation_n_ensemble : int, optional
If covariance is not None, this sets the number of Fisher information matrices drawn from a normal distribution for the Monte-Carlo error propagation. Default value: 1000.
- error_propagation_factor : float, optional
If covariance is not None, this factor multiplies the covariance of the distribution of Fisher information matrices. Smaller factors can avoid problems with ill-behaved Fisher information matrices. Default value: 1.e-3.
Returns: - profiled_fisher_information : ndarray
Profiled m x m Fisher information, where the i-th row or column corresponds to the remaining_components[i]-th row or column of fisher_information.
- profiled_fisher_information_covariance : ndarray
Covariance matrix of the profiled Fisher information matrix with shape (m, m, m, m).
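Profiling can be sketched as a Schur complement: with the kept parameters labelled a and the profiled ones b, the profiled matrix is I_aa - I_ab I_bb^-1 I_ba. The helper profile_sketch below is a hypothetical NumPy illustration of that formula, not the library function, and it omits the Monte-Carlo error propagation controlled by covariance, error_propagation_n_ensemble, and error_propagation_factor.

```python
import numpy as np


def profile_sketch(fisher_information, remaining_components):
    """Schur-complement sketch of profiling: I_aa - I_ab I_bb^-1 I_ba."""
    fisher_information = np.asarray(fisher_information, dtype=float)
    n = fisher_information.shape[0]
    keep = list(remaining_components)
    drop = [i for i in range(n) if i not in keep]
    info_aa = fisher_information[np.ix_(keep, keep)]
    if not drop:
        return info_aa
    info_ab = fisher_information[np.ix_(keep, drop)]
    info_bb = fisher_information[np.ix_(drop, drop)]
    # For a symmetric matrix, I_ba is the transpose of I_ab.
    return info_aa - info_ab @ np.linalg.solve(info_bb, info_ab.T)


fisher_information = np.array([[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 2.0]])
profiled = profile_sketch(fisher_information, [0, 1])

# Equivalent view: invert the full matrix, keep the sub-block, invert back.
covariance_block = np.linalg.inv(fisher_information)[np.ix_([0, 1], [0, 1])]
profiled_via_inverse = np.linalg.inv(covariance_block)
```

The second construction shows why profiling is the conservative choice: it keeps the marginal covariance of the remaining parameters, so correlations with the profiled parameters reduce the information.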
-
madminer.fisherinformation.project_information(fisher_information, remaining_components)
Calculates projections of a Fisher information matrix, that is, “deletes” the rows and columns corresponding to some parameters not of interest.
Parameters: - fisher_information : ndarray
Original n x n Fisher information.
- remaining_components : list of int
List with m entries, each an int with 0 <= remaining_components[i] < n. Denotes which parameters are kept, and their new order. All other parameters are projected out.
Returns: - projected_fisher_information : ndarray
Projected m x m Fisher information, where the i-th row or column corresponds to the remaining_components[i]-th row or column of fisher_information.
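In NumPy terms, the projection described above amounts to selecting rows and columns by index. The snippet is a minimal sketch with an invented 3 x 3 matrix and is not the library's implementation.

```python
import numpy as np

fisher_information = np.array([[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 2.0]])
remaining_components = [2, 0]  # keep parameters 2 and 0, in that order

# Row/column i of the result is row/column remaining_components[i] of the input.
projected = fisher_information[np.ix_(remaining_components, remaining_components)]
# → [[2.0, 0.5], [0.5, 4.0]]
```

Projecting corresponds to assuming the removed parameters are known exactly, so the projected information is never smaller than the profiled information for the same remaining_components.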