Command Line Interface

Introduction

The file settings.json contains the configuration of BATMAN. It can be divided into two mandatory blocks and four optional blocks. The blocks can appear in any order.
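
The top-level structure can be checked before launching a study. The minimal sketch below only verifies that the two mandatory blocks, space and snapshot, are present in a parsed settings file. It is not the validation performed by batman -c, and the helper names load_settings and missing_blocks are hypothetical.

```python
import json

# The two mandatory top-level blocks described on this page; order does not matter.
MANDATORY_BLOCKS = ("space", "snapshot")


def load_settings(path):
    """Read a settings.json file into a dict."""
    with open(path) as fd:
        return json.load(fd)


def missing_blocks(settings):
    """Return the mandatory top-level blocks absent from a settings dict."""
    return [key for key in MANDATORY_BLOCKS if key not in settings]
```

For a complete file, missing_blocks(load_settings("settings.json")) should return an empty list.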

Note

A prefilled example is shown in settings.json located in test_cases/Snippets.

The CLI help can be displayed with:

batman -h

usage: BATMAN [-h] [--version] [-v] [-c] [-s] [-o OUTPUT] [-r] [-n] [-u] [-q]
          settings

BATMAN creates a surrogate model and perform UQ.

positional arguments:
  settings              path to settings file

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -v, --verbose         set verbosity from WARNING to DEBUG, [default: False]
  -c, --check           check settings, [default: False]
  -s, --save-snapshots  save the snapshots to disk when using a function,
                        [default: False]
  -o OUTPUT, --output OUTPUT
                        path to output directory, [default: ./output]
  -r, --restart         restart pod, [default: False]
  -n, --no-surrogate    do not compute surrogate but read it from disk,
                        [default: False]
  -u, --uq              Uncertainty Quantification study, [default: False].
  -q, --q2              estimate Q2 and find the point with max MSE, [default:
                        False]

Note

Fields in square brackets are optional.

Block 1 - Space of Parameters

First of all, the parameter space is defined as a hypercube, which is fully described by the minimal and maximal values along each coordinate.

Figure: 3-dimensional hypercube

"space": {
    "corners": [
        [15.0, 2500.0],
        [60.0, 6000.0]
    ],
    "sampling": {
        "init_size": 4,
        "method": "halton"
    },
    "resampling":{
        "delta_space": 0.08,
        "resamp_size": 0,
        "method": "sigma",
        "hybrid": [["sigma", 4], ["loo_sobol", 2]],
        "q2_criteria": 0.9
    }
}
  • corners: Required array, the two corners of the hypercube that define the space, [[min], [max]],

  • sampling: Configuration of the sample. It can be either a list of samples, as an array_like of shape (n_samples, n_features), or a dictionary with the following:

    • init_size: Required integer, initial number of snapshots,

    • method: Required string, method used to create the DoE, one of halton, sobol, sobolscrample, lhs (Latin Hypercube Sampling), lhsc (Latin Hypercube Sampling Centered), olhs (optimized LHS), faure, uniform, saltelli

    • distributions: Optional array, a list of distributions. For example, for two input variables: ["Uniform(15., 60.)", "Normal(4035., 400.)"].

  • resampling: Optional dictionary, fill it to perform resampling:
    • delta_space: Optional number, fraction of the space to shrink so that no point is resampled close to the boundaries. With 0.08, the space available for resampling is shrunk by 8%.

    • resamp_size: Required integer, number of points to add in the parameter space.

    • method: Required string, one of discrepancy, ego_discrepancy, sigma_discrepancy, sigma_distance, sigma, loo_sigma, loo_sobol, extrema, hybrid or optimization (resampling methods are only compatible with specific surrogate prediction methods, see Space).

    • hybrid: Required if method is hybrid, a generator defined as a list of ["method", n_snapshot] pairs, e.g. [["sigma", 4], ["loo_sobol", 2]].

    • q2_criteria: Optional number, stopping criterion based on the estimated quality of the model.

    • extremum: Optional string, minimization or maximization objective: ‘min’ or ‘max’.

    • weights: Optional array, when the optimization problem combines several functions (e.g. sigma_distance), weight factors balance the influence of each function.

    • delta_space: Optional number, shrinking factor for the parameter space.
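
One reading of the hybrid generator is that each ["method", n_snapshot] pair contributes n_snapshot consecutive resampling points computed with that method. The sketch below expands the generator into one method name per new point under that assumption. Restarting the list when resamp_size exceeds its total length is also an assumption, and hybrid_schedule is a hypothetical helper, not part of BATMAN.

```python
from itertools import chain, islice, repeat


def hybrid_schedule(generator, n_points):
    """Expand e.g. [["sigma", 4], ["loo_sobol", 2]] into one method per new point.

    Assumption: methods are used in the listed order, and the expanded list is
    restarted from the beginning if more points are requested than it holds.
    """
    one_pass = list(chain.from_iterable(repeat(method, count)
                                        for method, count in generator))
    if not one_pass:
        raise ValueError("empty hybrid generator")
    # Repeat the expanded list indefinitely and keep the first n_points names.
    cycled = chain.from_iterable(repeat(one_pass))
    return list(islice(cycled, n_points))
```

With the generator of the example, six new points would be produced as four sigma points followed by two loo_sobol points.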

The method used to create the DoE is paramount: it ensures that the physics is captured correctly over the whole domain of interest, see Space. The faure, halton and sobol methods all generate low-discrepancy sequences with good space-filling properties. saltelli is a special case: it creates a DoE for the computation of Sobol’ indices using Saltelli’s formulation.
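
To give an idea of what a low-discrepancy DoE looks like, the following minimal sketch builds a Halton sequence from the radical inverse in successive prime bases and scales it to the corners of the hypercube. It is an illustration, not BATMAN's implementation; halton_doe and van_der_corput are hypothetical names, and skipping index 0 (the origin) is a common convention assumed here.

```python
def van_der_corput(index, base):
    """Radical inverse of a positive integer in the given base, in (0, 1)."""
    result, denom = 0.0, 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denom *= base
        result += digit / denom
    return result


def halton_doe(corners, n_samples, primes=(2, 3, 5, 7, 11, 13)):
    """Halton DoE scaled to corners given as [[min, ...], [max, ...]]."""
    lower, upper = corners
    dim = len(lower)
    if dim > len(primes):
        raise ValueError("not enough prime bases for this dimension")
    doe = []
    # Start at index 1 so that the origin of the unit cube is skipped.
    for index in range(1, n_samples + 1):
        unit = [van_der_corput(index, primes[d]) for d in range(dim)]
        doe.append([lo + u * (hi - lo) for lo, hi, u in zip(lower, upper, unit)])
    return doe
```

With the corners of the example above, halton_doe([[15.0, 2500.0], [60.0, 6000.0]], 4) starts with the point [37.5, 3666.67] (approximately).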

When distributions is set, a joint distribution is built and used to apply an inverse transformation (inverse CDF) to the sample. This yields a low-discrepancy sample that still follows the prescribed distributions.
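
The inverse transformation can be sketched with the standard library: each coordinate of a point of the unit hypercube is passed through the inverse CDF (quantile function) of its marginal. This sketch assumes independent marginals and uses statistics.NormalDist in place of the openturns distributions that BATMAN parses; to_distributions and uniform_ppf are hypothetical names. The normal quantile is only defined strictly inside (0, 1), so unit coordinates exactly equal to 0 or 1 must be avoided.

```python
from statistics import NormalDist


def uniform_ppf(a, b):
    """Inverse CDF of Uniform(a, b)."""
    return lambda u: a + u * (b - a)


def to_distributions(unit_sample, marginals):
    """Map points of the unit hypercube onto the marginals by inverse CDF.

    marginals: one inverse-CDF callable per input variable.
    Assumption: the inputs are independent (no correlation is modelled).
    """
    return [[ppf(u) for ppf, u in zip(marginals, point)] for point in unit_sample]


# Same marginals as the example: Uniform(15., 60.) and Normal(4035., 400.)
MARGINALS = [uniform_ppf(15.0, 60.0), NormalDist(mu=4035.0, sigma=400.0).inv_cdf]
```

A unit point at the center, [0.5, 0.5], is mapped to [37.5, 4035.0], the middle of the uniform range and the mean of the normal.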

All resampling methods need a good initial sample, meaning a model quality of about \(Q_2\sim0.5\). loo_sigma and loo_sobol work better than sigma in high-dimensional cases (more than 2 parameters).
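
A common definition of the predictivity coefficient is \(Q_2 = 1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2\), where the predictions \(\hat{y}_i\) come from points not used for training (for instance leave-one-out predictions). The sketch below implements that formula; it is not necessarily how BATMAN computes \(Q_2\) internally. A value of 1 means perfect prediction, while 0 means the model does no better than the mean.

```python
def q2(observed, predicted):
    """Predictivity coefficient Q2 = 1 - sum((y - y_hat)^2) / sum((y - y_mean)^2).

    `predicted` should come from points not used to fit the model (e.g.
    leave-one-out predictions); otherwise this is only the training R^2.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equally sized sequences of length >= 2")
    mean = sum(observed) / len(observed)
    press = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    total = sum((y - mean) ** 2 for y in observed)
    if total == 0:
        raise ValueError("observed values are constant")
    return 1.0 - press / total
```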

Warning

If a PC surrogate model is used, the only available resampling methods are discrepancy and extrema. Furthermore, the sampling must be set as a list of distributions.

Block 2 - Snapshot provider

A snapshot defines a simulation.

"snapshot": {
    "max_workers": 10,
    "plabels": ["x1", "x2"],
    "flabels": ["F"],
    "provider": {
        "type": "job",
        "command": "python function.py",
        "context_directory": "data",
        "coupling": {
            "coupling_directory": "batman-coupling",
            "input_fname": "sample-space.npy",
            "input_format": "npy",
            "output_fname": "sample-data.npy",
            "output_format": "npy"
        },
        "clean": false
    },
    "io": {
        "space_fname": "sample-space.npy",
        "space_format": "npy",
        "data_fname": "sample-data.npy",
        "data_format": "npy"
    }
}
  • max_workers: Required integer, maximum number of snapshots running simultaneously

  • plabels: Required array, input parameter names (for space)

  • flabels: Required array, output feature names (for data)

  • psizes: Optional array, number of components of parameters

  • fsizes: Optional array, number of components of output features

  • provider: The provider defines what a simulation is:
    • type: Required string, type of provider, one of function, job or file

    If type is function:
    • module: Required string, python module to load

    • function: Required string, function in module to execute for generating data

    • discover: Optional string, UNIX-style patterns for directories with pairs of sample files to import

    If type is job:
    • command: Required string, command to use to launch the script

    • context_directory: Required string, directory storing every resource required to execute a job

    • coupling: Optional, definition of the snapshots IO files:
      • coupling_directory: Optional string, sub-directory in context_directory that will contain input parameters and output file

      • input_fname: Optional string, basename of the files storing the point coordinates (plabels)

      • input_format: Optional string, json, csv, npy, npz or any Antares format if installed; npy is preferred for speed

      • output_fname: Optional string, basename of the files storing the values associated with flabels

      • output_format: Optional string, json, csv, npy, npz or any Antares format if installed; npy is preferred for speed

    • hosts: Optional, definition of the remote hosts, if any:
      • hostname: Required string, remote host to connect to

      • remote_root: Required string, remote folder to create and store data

      • username: Optional string, username

      • password: Optional string, password

    • clean: Optional boolean, delete working directory after run

    • discover: Optional string, UNIX-style patterns for directories with pairs of sample files to import

    If type is file:
    • file_pairs: Required array, list of pairs (space_file, data_file)

    • discover: Optional string, UNIX-style patterns for directories with pairs of sample files to import

  • io: Optional, input/output information:
    • space_fname: Required string, file name for the space

    • space_format: Optional string, json, csv, npy, npz or any Antares format if installed; npy is preferred for speed

    • data_fname: Required string, file name for the data

    • data_format: Optional string, json, csv, npy, npz or any Antares format if installed; npy is preferred for speed
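
For a provider of type function, a settings fragment such as "provider": {"type": "function", "module": "my_model", "function": "f"} would point to a Python module like the one sketched below. The module name my_model, the function f and the toy formula are hypothetical, and the calling convention (one call per point, coordinates ordered as in plabels, one returned value per entry of flabels) is an assumption to check against the provider documentation.

```python
"""Hypothetical content of my_model.py for a provider of type "function".

Assumption: BATMAN imports the module named in "module", looks up the
callable named in "function" and calls it once per point, with the
coordinates ordered as in "plabels"; the callable returns one value per
entry of "flabels".
"""
import math


def f(point):
    """Toy model for plabels ["x1", "x2"] and flabels ["F"]."""
    x1, x2 = point
    # Illustrative computation only; replace it with the real model.
    return [x1 * math.log(x2)]
```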

Optional Block 3 - Surrogate

Set up the surrogate model strategy to use. See Surrogate.

"prediction": {
    "method": "kriging",
    "predictions": [[30, 4000], [35, 3550]]
}
  • predictions: set of points to predict.

  • n_jobs: Optional integer, number of jobs to run in parallel. If not set, n_jobs defaults to psutil.cpu_count(), which can cause problems.

  • method: surrogate method used to predict snapshots, one of rbf (Radial Basis Function), kriging, pc (polynomial chaos expansion), evofusion, mixture, LinearRegression, LogisticRegression, LogisticRegressionCV, PassiveAggressiveRegressor, SGDRegressor, TheilSenRegressor, DecisionTreeRegressor, GradientBoostingRegressor, AdaBoostRegressor, RandomForestRegressor or ExtraTreesRegressor.

For kriging:
  • kernel: Optional string, kernel to use. Ex: "ConstantKernel() + Matern(length_scale=1., nu=1.5)"

  • noise: Optional number or boolean, noise level as boolean or as a float

  • global_optimizer: Optional boolean, whether to do global optimization, or gradient based optimization to estimate hyperparameters

For pc:
  • strategy: Required string, Quad (quadrature) or LS (least squares)

  • degree: Required integer, the polynomial degree

  • sparse_param: Optional object, Parameters for the Sparse Cleaning Truncation Strategy and/or hyperbolic truncation of the initial basis.

    • max_considered_terms: Optional integer, maximum considered terms

    • most_significant: Optional integer, number of most significant terms to retain

    • significance_factor: Optional number, significance factor

    • hyper_factor: Optional number, factor for hyperbolic truncation strategy

Note

When using pc, the sampling must be set to a list of distributions.

For evofusion:
  • cost_ratio: Required number, cost ratio in terms of function evaluation between high and low fidelity models

  • grand_cost: Required integer, total cost of the study in terms of number of function evaluation of the high fidelity model

For mixture:
  • local_method: Optional list of dict, list of local surrogate models for the clusters, or None for Kriging local surrogate models.

  • classifier: Optional string, classifier from sklearn (supervised machine learning)

  • clusterer: Optional string, clusterer from sklearn (unsupervised machine learning)

  • pca_percentage: Optional number, percentage of information kept for PCA (minimum 0, maximum 1)

For LinearRegression, LogisticRegression, LogisticRegressionCV, PassiveAggressiveRegressor, SGDRegressor, TheilSenRegressor, DecisionTreeRegressor, GradientBoostingRegressor, AdaBoostRegressor, RandomForestRegressor or ExtraTreesRegressor:
  • regressor_options: Optional string, parameters of the associated scikit-learn regressor

Note

The points can be filled in directly between the brackets, or generated with the script prediction.py located in test_cases/Snippets.
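
As an alternative to typing the points by hand, a regular grid of prediction points can be generated and dumped as JSON. The sketch below is not the prediction.py script mentioned above, whose behavior is not described here; grid_predictions is a hypothetical helper that spans the corners of the space with a given number of points per axis.

```python
import json
from itertools import product


def grid_predictions(corners, n_per_axis):
    """Regular grid of points inside corners [[min, ...], [max, ...]].

    n_per_axis: number of points along each axis (at least 2), e.g. [3, 3].
    """
    lower, upper = corners
    axes = []
    for lo, hi, n in zip(lower, upper, n_per_axis):
        if n < 2:
            raise ValueError("need at least 2 points per axis")
        step = (hi - lo) / (n - 1)
        axes.append([lo + i * step for i in range(n)])
    # Cartesian product of the axes, first axis varying slowest.
    return [list(point) for point in product(*axes)]


if __name__ == "__main__":
    points = grid_predictions([[15.0, 2500.0], [60.0, 6000.0]], [3, 3])
    print(json.dumps({"predictions": points}))
```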

Optional Block 4 - UQ

Uncertainty Quantification (UQ), see UQ.

"uq": {
    "test": "Channel_Flow",
    "sample": 1000,
    "method": "sobol",
    "pdf": ["Uniform(15., 60.)", "Normal(4035., 400.)"],
    "type": "aggregated"
}
  • test: Optional string, test function used for indices comparison and quality calculation, one of: Rosenbrock, Michalewicz, Ishigami, G_Function, Channel_Flow

  • sample: Required integer, number of points per sample to use for the sensitivity analysis (SA)

  • method: Required string, type of Sobol analysis: sobol, FAST (Fourier Amplitude Sensitivity Testing). If FAST, no second-order indices are computed and defining a surrogate model is mandatory

  • type: Required string, type of indices: aggregated or block

  • pdf: Required array, probability density functions used for uncertainty propagation, given as a list of openturns distributions, one per input. For example, x1 ~ Uniform(inf, sup) and x2 ~ Normal(mu, sigma) give ["Uniform(15., 60.)", "Normal(4035., 400.)"]
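
To illustrate what sample and method sobol mean, the sketch below estimates first-order Sobol’ indices of the Ishigami function (one of the available test functions, with inputs uniform on [-π, π]) using plain Monte Carlo and a Saltelli-type pick-freeze estimator. BATMAN's own estimators are not reproduced here; this standalone version evaluates the function directly instead of a surrogate, and the names first_order_sobol and draw_point are hypothetical. The analytical first-order indices for a = 7 and b = 0.1 are about 0.314, 0.442 and 0.

```python
import math
import random


def ishigami(x, a=7.0, b=0.1):
    """Ishigami test function of three inputs."""
    x1, x2, x3 = x
    return math.sin(x1) + a * math.sin(x2) ** 2 + b * x3 ** 4 * math.sin(x1)


def first_order_sobol(func, draw_point, n_sample, dim, seed=0):
    """First-order Sobol' indices by Monte Carlo pick-freeze.

    draw_point(rng) returns one input point as a list; n_sample plays the
    role of "sample" (points per sample). V_i is estimated as
    mean(f(B) * (f(A_B^i) - f(A))), where A_B^i is A with column i from B.
    """
    rng = random.Random(seed)
    sample_a = [draw_point(rng) for _ in range(n_sample)]
    sample_b = [draw_point(rng) for _ in range(n_sample)]
    f_a = [func(x) for x in sample_a]
    f_b = [func(x) for x in sample_b]
    both = f_a + f_b
    mean = sum(both) / len(both)
    variance = sum((y - mean) ** 2 for y in both) / (len(both) - 1)
    indices = []
    for i in range(dim):
        # Rows of A whose i-th coordinate is replaced by the one from B.
        f_ab = [func(xa[:i] + [xb[i]] + xa[i + 1:])
                for xa, xb in zip(sample_a, sample_b)]
        v_i = sum(yb * (yab - ya) for ya, yb, yab in zip(f_a, f_b, f_ab)) / n_sample
        indices.append(v_i / variance)
    return indices
```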

Optional Block 5 - POD

POD (Proper Orthogonal Decomposition) is an approach that helps reduce the amount of data.

"pod": {
   "dim_max": 100,
   "tolerance": 0.99,
   "type": "static"
}
  • tolerance: Required number, tolerance on the modes to be kept, expressed as a fraction of the sum of the singular values; modes that account for less than this tolerance are ignored,

  • dim_max: Required integer, maximum number of modes to be kept,

  • type: Required string, type of POD to perform: static or dynamic.

The dynamic POD can be updated each time a snapshot becomes available. Hence, a POD can be restarted when doing resampling, for example.
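
One way to read tolerance and dim_max together is sketched below: keep the smallest number of leading modes whose singular values reach the requested fraction of their sum, then cap that number at dim_max. This interpretation and the helper name n_modes are assumptions, not BATMAN's code.

```python
def n_modes(singular_values, tolerance, dim_max):
    """Number of POD modes kept for a given tolerance and dim_max.

    Assumption: keep the smallest number of leading modes whose singular
    values sum to at least `tolerance` times the total, capped at dim_max.
    """
    values = sorted(singular_values, reverse=True)
    total = sum(values)
    if total <= 0:
        return 0
    cumulative = 0.0
    for count, value in enumerate(values, start=1):
        cumulative += value
        if cumulative >= tolerance * total:
            return min(count, dim_max)
    return min(len(values), dim_max)
```

For singular values [10.0, 5.0, 1.0, 0.1] and tolerance 0.99, the first three modes are kept, since 16.0 reaches 99% of 16.1 while 15.0 does not.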

Optional Block 6 - Visualization

Set up the visualization options. BATMAN creates a response function (1 input parameter), response surfaces (2 to 4 input parameters) or a Kiviat graph (more than 4 input parameters). All settings presented here are optional. See Visualization.

"visualization": {
   "doe": true,
   "resampling": true,
   "axis_disc": [20, 20],
   "flabel": "Cost function",
   "plabels": ["X", "Y"],
   "feat_order": [1, 2],
   "ticks_nbr": 14,
   "range_cbar": [0.0, 2.3],
   "contours": [0.5, 1.0, 1.5]
}
  • bounds: Array, sample boundaries

  • doe: Boolean, if true, the Design of Experiment is represented on the response surface by black dots. Default is false,

  • resampling: Boolean, if true, the Design of Experiment points added by resampling are displayed in a different color, as red triangles. Only active if doe is true,

  • xdata: Array, 1D discretization of the function (n_features,)

  • axis_disc: Integers, discretization of each axis. The values for the x and y axes set the surface resolution, while the values for the 3rd and 4th parameters set the number of frames per movie and the number of movies,

  • flabel: String, name of the cost function,

  • xlabels: Strings,

  • plabels: Strings, name of the input parameters to be plotted on each axis,

  • feat_order: Integers, associate each input parameter to an axis, the first indicated number corresponding to the parameter to be plotted on the x-axis, etc… A size equal to the input parameter number is expected, all integers from 1 to the parameter number should be used. Default is [1, 2, 3, 4],

  • ticks_nbr: Integer, number of ticks on the colorbar (Display n-1 colors). Default is 10,

  • range_cbar: Floats, minimum and maximum values on the colorbar,

  • contours: Floats, values of the iso-contours to be plotted on the response surface,

  • kiviat_fill: Boolean, whether to plot the Kiviat chart or not

  • 2D_mesh: Visualization of a specific variable on a user-provided 2D mesh:
    • fname: String, name of mesh file

    • format: String, format of the mesh file

    • xlabel: String, name of the x-axis

    • ylabel: String, name of the y-axis

    • flabels: String, names of the variables

    • vmins: String, value of the minimal output for data filtering
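
Two of the rules stated above can be expressed as small checks: the figure type depends on the number of input parameters, and feat_order must use every integer from 1 to the number of parameters exactly once. The helpers plot_kind and check_feat_order are hypothetical and only mirror the text; they are not part of BATMAN.

```python
def plot_kind(n_params):
    """Figure type chosen from the number of input parameters."""
    if n_params < 1:
        raise ValueError("at least one input parameter is required")
    if n_params == 1:
        return "response function"
    if n_params <= 4:
        return "response surface"
    return "Kiviat"


def check_feat_order(feat_order, n_params):
    """True if feat_order is a permutation of 1..n_params."""
    return sorted(feat_order) == list(range(1, n_params + 1))
```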

Driver module