Sensitivity indices computation

Correlation coefficients

gsa_framework.sensitivity_methods.correlations.correlation_coefficients(filepath_Y, filepath_X_rescaled, cpus=None, selected_iterations=None)

Compute estimates of Pearson and Spearman correlation coefficients between vector Y and all columns of X.

Parameters

filepath_Y (Path or str) – Filepath to model outputs saved in .hdf5 format.
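filepath_X_rescaled (Path or str) – Filepath to rescaled model inputs sampling in .hdf5 format.
cpus (int) – Number of cpus for parallel computation.
selected_iterations (array of ints) – Iterations that should be included to compute correlation coefficients.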

Returns

sa_dict – Dictionary that contains Pearson and Spearman correlation coefficients.

Return type

dict
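To make the two coefficients concrete, here is a minimal in-memory sketch in pure Python. It is not the library's implementation: it skips the .hdf5 files and parallelism, and the names correlations_sketch, pearson, spearman, ranks and the dictionary keys "pearson" and "spearman" are assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = average_rank
        start = end + 1
    return result


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def correlations_sketch(Y, X):
    """Correlate output vector Y with every column of X (a list of rows)."""
    columns = list(zip(*X))
    return {
        "pearson": [pearson(col, Y) for col in columns],
        "spearman": [spearman(col, Y) for col in columns],
    }


# Hypothetical data: Y is a monotone but non-linear function of the first column.
X = [[1, 4], [2, 3], [3, 2], [4, 1]]
Y = [1, 8, 27, 64]
sa_dict = correlations_sketch(Y, X)
```

Because Y grows monotonically with the first column, the Spearman coefficient for that column reaches 1 while the Pearson coefficient stays below 1, which is the usual reason to report both.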

gsa_framework.sensitivity_methods.correlations.get_corrcoef_interval_width(theta=None, iterations=100, confidence_level=0.95)

Compute the confidence interval width given the number of iterations, the “true” value of the correlation coefficient theta, and confidence_level.

Parameters

theta (float) – “True” correlation coefficient value that the estimator should approach. Can be Pearson, Kendall or Spearman.
iterations (int) – Number of iterations.
confidence_level (float) – Desired confidence level.

Returns

interval_width_dict – Dictionary with analytical confidence interval width for Pearson, Kendall and Spearman coefficients.

Return type

dict

References

Paper: Bonett and Wright [BW00]
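The analytical width can be sketched from the Fisher-transform intervals discussed by Bonett and Wright, where the approximate variance of the transformed estimate is 1/(n-3) for Pearson, (1 + theta^2/2)/(n-3) for Spearman and 0.437/(n-4) for Kendall. The code below is an illustration under that assumption, not the library's source; the function name and the dictionary keys are hypothetical, and theta must lie strictly between -1 and 1.

```python
import math
from statistics import NormalDist


def corrcoef_interval_width_sketch(theta, iterations=100, confidence_level=0.95):
    """Approximate confidence interval widths via the Fisher z-transform."""
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    # (variance factor c^2, offset b) for each coefficient.
    settings = {
        "pearson": (1.0, 3),
        "spearman": (1.0 + theta**2 / 2, 3),
        "kendall": (0.437, 4),
    }
    centre = math.atanh(theta)
    widths = {}
    for name, (c2, b) in settings.items():
        half = z * math.sqrt(c2 / (iterations - b))
        # Back-transform both interval ends and take their distance.
        widths[name] = math.tanh(centre + half) - math.tanh(centre - half)
    return widths


interval_width_dict = corrcoef_interval_width_sketch(theta=0.5, iterations=1000)
```

The widths shrink roughly with the square root of the number of iterations, so halving the interval width requires about four times as many model runs.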

gsa_framework.sensitivity_methods.correlations.get_corrcoef_num_iterations(theta=None, interval_width=0.01, confidence_level=0.95)

Compute the number of iterations needed to estimate a correlation coefficient with “true” value theta so that its confidence interval has the desired width at the given confidence level.

Parameters

theta (float) – “True” correlation coefficient value that the estimator should approach. Can be Pearson, Kendall or Spearman.
interval_width (float) – Desired width of the confidence interval.
confidence_level (float) – Desired confidence level.

Returns

iterations_dict – Dictionary with number of iterations for Pearson, Kendall and Spearman coefficients.

Return type

dict

References

Paper: Bonett and Wright [BW00]
Remark for testing: the returned number of iterations should agree with the values in Table 1 of the paper; part of the table is covered by the tests. The result sometimes differs by ±1 iteration, probably because of minor numerical imprecision.
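One way to see what this function answers is to search directly for the smallest sample size whose Fisher-transform interval is no wider than the target. The sketch below does exactly that with a doubling step followed by bisection. It is an assumption-laden illustration, not the library's method and not necessarily the approximation used in the paper, so its results may differ slightly from Table 1. All names and dictionary keys are hypothetical.

```python
import math
from statistics import NormalDist


def interval_width_at(theta, n, c2, b, z):
    """Width of the back-transformed Fisher interval for sample size n."""
    half = z * math.sqrt(c2 / (n - b))
    centre = math.atanh(theta)
    return math.tanh(centre + half) - math.tanh(centre - half)


def corrcoef_num_iterations_sketch(theta, interval_width=0.01, confidence_level=0.95):
    """Smallest n per coefficient such that the interval width is <= interval_width."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    # (variance factor c^2, offset b) for each coefficient.
    settings = {
        "pearson": (1.0, 3),
        "spearman": (1.0 + theta**2 / 2, 3),
        "kendall": (0.437, 4),
    }
    iterations = {}
    for name, (c2, b) in settings.items():
        low = high = b + 1  # smallest n for which the variance is defined
        # Grow the upper bound until the interval is narrow enough.
        while interval_width_at(theta, high, c2, b, z) > interval_width:
            high *= 2
        # Bisection: the width decreases monotonically with n.
        while low < high:
            mid = (low + high) // 2
            if interval_width_at(theta, mid, c2, b, z) <= interval_width:
                high = mid
            else:
                low = mid + 1
        iterations[name] = low
    return iterations


iterations_dict = corrcoef_num_iterations_sketch(theta=0.5, interval_width=0.1)
```

Because the result is an integer threshold on a smooth function, tiny floating-point differences near the threshold can move it by one iteration, which is consistent with the ±1 discrepancy mentioned in the remark.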

Sobol indices

Saltelli estimators

gsa_framework.sensitivity_methods.saltelli_sobol.sobol_indices(filepath_Y, num_params, selected_iterations=None)

Compute estimates of Sobol’ first and total order indices.

High values of the Sobol’ first order index indicate important parameters, while low values of the total order index indicate non-important parameters. The first order index captures main effects only, whereas the total order index also accounts for interactions between parameters.

Parameters

filepath_Y (Path or str) – Filepath to model outputs y in .hdf5 format obtained by running model according to Saltelli samples.
num_params (int) – Number of model inputs.
selected_iterations (array of ints) – Iterations that should be included to compute Sobol indices.

Returns

sa_dict – Dictionary that contains computed first and total order Sobol indices.

Return type

dict

References

Paper: Saltelli, Annoni, Azzini, Campolongo, Ratto, and Tarantola [SAA+10]
Link to the original implementation: https://github.com/SALib/SALib/blob/master/src/SALib/analyze/sobol.py
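The estimators can be illustrated end to end on a toy model. The sketch below draws two independent sample matrices A and B, evaluates the model on A, B and on the matrices A_B^(i) (A with column i taken from B), and applies the first order estimator from the cited paper together with the Jansen total order estimator. It is a simplified illustration, not the library's code: sampling and model evaluation happen in memory instead of reading .hdf5 outputs, plain pseudo-random numbers replace any quasi-random sequence, and toy_model, saltelli_sobol_sketch and the keys "first" and "total" are hypothetical names.

```python
import random


def toy_model(x):
    """Additive toy model (an assumption for illustration): Y = x1 + 2 * x2."""
    return x[0] + 2.0 * x[1]


def saltelli_sobol_sketch(model, num_params, num_samples, seed=0):
    """Estimate first and total order Sobol' indices for inputs uniform on [0, 1]."""
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(num_params)] for _ in range(num_samples)]
    B = [[rng.random() for _ in range(num_params)] for _ in range(num_samples)]
    y_A = [model(row) for row in A]
    y_B = [model(row) for row in B]

    # Output variance estimated from both base samples.
    y_all = y_A + y_B
    mean = sum(y_all) / len(y_all)
    variance = sum((y - mean) ** 2 for y in y_all) / len(y_all)

    first, total = [], []
    for i in range(num_params):
        # Evaluate the model on A with its i-th column replaced by that of B.
        y_AB = [model(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        # First order: mean of f(B) * (f(A_B^i) - f(A)), divided by the variance.
        s_first = sum(yb * (yab - ya) for ya, yb, yab in zip(y_A, y_B, y_AB))
        first.append(s_first / num_samples / variance)
        # Total order (Jansen): half the mean of (f(A) - f(A_B^i))^2, divided by the variance.
        s_total = sum((ya - yab) ** 2 for ya, yab in zip(y_A, y_AB))
        total.append(0.5 * s_total / num_samples / variance)
    return {"first": first, "total": total}


sa_dict = saltelli_sobol_sketch(toy_model, num_params=2, num_samples=20000)
```

For this additive model the analytical indices are 0.2 and 0.8 for both orders, since there are no interactions. The estimates approach these values as num_samples grows, at a total cost of num_samples * (num_params + 2) model runs.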

Extended Fourier Amplitude Sensitivity Test (eFAST)

gsa_framework.sensitivity_methods.extended_FAST.eFAST_indices(filepath_Y, num_params, M=4, selected_iterations=None)

Compute estimates of Sobol’ first and total order indices with the extended Fourier Amplitude Sensitivity Test (eFAST).

High values of the Sobol’ first order index indicate important parameters, while low values of the total order index indicate non-important parameters. The first order index captures main effects only, whereas the total order index also accounts for interactions between parameters.

Parameters

filepath_Y (Path or str) – Filepath to model outputs y in .hdf5 format obtained by running model according to eFAST samples.
num_params (int) – Number of model inputs.
M (int) – Interference factor, usually 4 or higher, should be consistent with eFAST sampling.
selected_iterations (array of ints) – Iterations that should be included to compute eFAST Sobol indices.

Returns

sa_dict – Dictionary that contains computed first and total order Sobol indices.

Return type

dict

References

Paper: Saltelli, Tarantola, and Chan [STC99]
Link to the original implementation: https://github.com/SALib/SALib/blob/master/src/SALib/analyze/fast.py
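The role of the interference factor M is easier to see in a small sketch. Below, each parameter in turn is driven at a high frequency omega_0 along a search curve while the other parameters get low frequencies. The first order index is the share of output variance found at the first M harmonics of omega_0, and the total order index is one minus the share found at frequencies up to omega_0 / 2. This is a simplified illustration and not the library's or SALib's code: it samples and analyzes in one place, uses a plain discrete Fourier transform, and assigns the low frequencies by simply cycling through the allowed range. The names toy_model and efast_sketch and the keys "first" and "total" are hypothetical.

```python
import cmath
import math
import random


def toy_model(x):
    """Additive toy model (an assumption for illustration): Y = x1 + 2 * x2."""
    return x[0] + 2.0 * x[1]


def efast_sketch(model, num_params, num_samples=1000, M=4, seed=0):
    """Estimate first and total order indices for inputs uniform on [0, 1]."""
    rng = random.Random(seed)
    N = num_samples
    omega_0 = (N - 1) // (2 * M)  # frequency of the parameter of interest
    m = omega_0 // (2 * M)        # largest frequency given to the other parameters
    if m < 1:
        raise ValueError("num_samples is too small for this M")
    # Precomputed complex exponentials for the discrete Fourier transform.
    twiddle = [cmath.exp(-2j * math.pi * t / N) for t in range(N)]

    first, total = [], []
    for i in range(num_params):
        # Simplified assignment: low frequencies cycle through 1..m.
        omega = [(k % m) + 1 for k in range(num_params)]
        omega[i] = omega_0
        phi = 2 * math.pi * rng.random()  # random phase shift of the search curve
        y = []
        for j in range(N):
            s = 2 * math.pi * j / N
            x = [0.5 + math.asin(math.sin(w * s + phi)) / math.pi for w in omega]
            y.append(model(x))

        # Power spectrum at frequencies 1 .. (N - 1) // 2; index 0 (the mean) is skipped.
        power = [0.0]
        for k in range(1, (N - 1) // 2 + 1):
            coefficient = sum(y[j] * twiddle[(j * k) % N] for j in range(N))
            power.append(abs(coefficient) ** 2)

        variance = sum(power)
        main_effect = sum(power[p * omega_0] for p in range(1, M + 1))
        other_effects = sum(power[1 : omega_0 // 2 + 1])
        first.append(main_effect / variance)
        total.append(1 - other_effects / variance)
    return {"first": first, "total": total}


sa_dict = efast_sketch(toy_model, num_params=2, num_samples=1000, M=4)
```

Both omega_0 and the harmonics that are summed depend on M, which is why the value passed to the analysis has to match the one used when the samples were generated.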

Delta moment-independent indices

gsa_framework.sensitivity_methods.delta.delta_indices(filepath_Y, filepath_X_rescaled, num_resamples=1, conf_level=0.95, seed=None, cpus=None)

Compute estimates of delta moment-independent indices.

Parameters

filepath_Y (Path or str) – Filepath to model outputs y in .hdf5 format.
filepath_X_rescaled (Path or str) – Filepath to rescaled model inputs sampling in .hdf5 format.
num_resamples (int) – Number of bootstrap resamples used by the bootstrap bias-reduction approach.
conf_level (float) – Desired confidence level.
seed (int) – Random seed.
cpus (int) – Number of cpus for parallel computation of delta indices with the multiprocessing library.

Returns

sa_dict – Dictionary that contains computed delta indices with their confidence intervals.

Return type

dict

References

Paper: Borgonovo [Bor07]
Link to the original implementation: https://github.com/SALib/SALib/blob/master/src/SALib/analyze/delta.py

Feature importance with gradient boosting

gsa_framework.sensitivity_methods.gradient_boosting.xgboost_indices(filepath_Y, filepath_X, tuning_parameters=None, test_size=0.2, xgb_model=None, importance_types=None, flag_return_xgb_model=True)

Compute feature importance scores (fscores) from gradient boosted trees regression using the XGBoost library.

Parameters

filepath_Y (Path or str) – Filepath to model outputs y in .hdf5 format.
filepath_X (Path or str) – Filepath to the sampled model inputs, either in the unit cube or rescaled, in .hdf5 format.
tuning_parameters (dict) – Dictionary with XGBoost tuning parameters.
test_size (float) – Fraction of samples for test set.
xgb_model (Path or Booster object) – Model that can be used as warm start.
importance_types (list) – List of feature importance types to compute; by default all types are computed.
flag_return_xgb_model (bool) – Specify whether Booster model should be saved after training.

Returns

sa_dict – Dictionary that contains computed sensitivity indices.

Return type

dict

References

Paper: Chen and Guestrin [CG16]
Link to XGBoost library: https://xgboost.readthedocs.io/en/latest/index.html