Welcome to BATMAN's documentation
Contents:
Batman
Batman stands for Bayesian Analysis Tool for Modelling and uncertAinty quaNtification. It is a Python module distributed under the open-source CECILL-B license (MIT/BSD compatible).
batman seamlessly allows you to do statistical analysis (sensitivity analysis, Uncertainty Quantification, moments) based on non-intrusive ensemble experiments using any computer solver. It relies on open-source Python packages dedicated to statistics (OpenTURNS and scikit-learn).
Main features are:
Design of Experiments (LHS, low-discrepancy sequences, MC),
Resample the parameter space based on the physics and the sample,
Surrogate Models (Gaussian process, Polynomial Chaos, RBF, scikit-learn's regressors),
Optimization (Expected Improvement),
Sensitivity/Uncertainty Analysis (SA, UA) and Uncertainty Quantification (UQ),
Visualization in n-dimensions (HDR, Kiviat, PDF),
POD for database optimization or data reduction,
Automatically manage code computations in parallel.
Full documentation is available at:
Getting started
A detailed example can be found in the tutorial. The folder test_cases contains examples that you can adapt to your needs. You can find more information about the cases within the respective README.rst files.
Should you be interested in batman's implementation, consider reading the technical documentation.
If you encounter a bug (or have a feature request), please report it via GitLab. Or it might be you falling, but "Why do we fall sir? So we can learn to pick ourselves up".
Last but not least, if you consider contributing, check out contributing.
Happy batman.
How to install BATMAN?
The sources are located on GitLab:
Latest release
Install and update using pip:
pip install -U ot-batman
batman is also distributed through conda, on the conda-forge channel.
To install conda:
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
To create a new environment and install batman through conda:
conda create -n bat_env -c conda-forge batman
All the above dependencies are automatically handled by conda, except Antares. For more information, refer to its documentation.
From sources
Using the latest Python version is preferred! Then to install:
git clone git@gitlab.com:cerfacs/batman.git
cd batman
python setup.py install
python setup.py test
python setup.py build_sphinx
The latter is optional as it builds the documentation. The testing part is also optional but recommended (<30 min depending on your configuration).
Note
If you don't have install privileges, add the --user option after install.
But the simplest way might be to use pip or a conda environment.
If batman has been correctly installed, you should be able to call it simply:
batman -h
Warning
Depending on your configuration, you might have to export your local path:
export PATH=$PATH:~/.local/bin
Care is to be taken with both your PATH and PYTHONPATH environment variables. Make sure you do not point to different installation folders. It is recommended that you leave your PYTHONPATH empty.
Dependencies
The required dependencies are:
Python >= 2.7 or >= 3.4
OpenTURNS >= 1.10
scikit-learn >= 0.18
numpy >= 1.13
scipy >= 0.15
pathos >= 0.2
matplotlib >= 2.1
Paramiko >= 2.4
Apart from OpenTURNS, required dependencies are satisfied by the installer. Optional dependencies are:
Antares for extra IO options
sphinx >= 1.4 for documentation
ffmpeg for movie visualizations (n_features > 2)
Testing dependencies are:
Extra testing flavours:
Note
OpenTURNS and ffmpeg are available on conda through the conda-forge channel.
Help and Support
Community
If you use batman, come and say hi at https://batman-cerfacs.zulipchat.com. Or send us an email. We would really appreciate it, as we keep a record of the users!
Release history
Version 1.9 - Pennyworth
New features
Sobol' indices on a 2D map with visualization.mesh_2D(), by Robin Campet,
Stacking and coloring option for visualization.Kiviat with keywords stack_order and cbar_order, by Pamphile Roy,
MST criteria with space.Space.mst(), by Pamphile Roy,
visualization.density for moment-independent sensitivity measures, by Pamphile Roy,
COSI indices uq.cosi(), by Pamphile Roy,
space.gp_sampler.GpSampler, by Matthias De Lozzo,
surrogate.Mixture to construct mixture of expert models, by Remi Macadre,
functions.DbGeneric to sample from a generic dataset.
Enhancements
Ensure sample/data pairs on reading, by Pamphile Roy,
surrogate.SurrogateModel I/O consistency, by Pamphile Roy,
Different options for discrepancy computation (MD, WD), by Pamphile Roy,
Sobol' indices polar visualization with keyword polar, by Pamphile Roy,
Refactor pod.Pod, by Pamphile Roy.
Bug fixes
Driver with multifidelity, by Pamphile Roy,
Hypercube for refinement, by Pamphile Roy,
POD averaging, by Pamphile Roy.
Version 1.8 - Falcone
New features
Quantile dot plot visualization option for the PDF, by Pamphile Roy,
Sparse methods and hyperbolic truncation for PC with new keyword SparseLS, by Andrea Trucchia,
New space.Sample container. Samples are stored collectively, by Cyril Fournier,
New provider with tasks.remote_executor.RemoteExecutor, which enables snapshots to be remotely computed. To be set with new keyword hosts, by Cyril Fournier,
And tasks.remote_executor.MasterRemoteExecutor, which allows multiple remote hosts. It handles load balancing with keyword weight, by Pamphile Roy.
Enhancements
Write DoE only once in the driver, by Robin Campet,
Replace OpenTURNS' POD with up-to-date methods, by Julien Schueller.
Bug fixes
Set seed for OpenTURNS, by Pamphile Roy,
Pickling issues with LS, by Pamphile Roy,
GP pickling in functional cases, by Pamphile Roy,
Restart from file using discover, by Robin Campet.
Version 1.7.3 - Lucius
Note
This version includes all comments from JOSS' reviewers. See the review.
New features
Refactor input_output. Remove Fortran and greatly simplify IO handling, by Cyril Fournier,
Add extremum option in settings for resampling, by Pamphile Roy,
Add surrogate.SklearnRegressor as an interface to all scikit-learn regressors. Available through method in surrogate's settings, by Pamphile Roy.
Enhancements
Do not compute quality for optimization and discrepancy, by Pamphile Roy,
Reduce bounds amplitude and add warning for convergence, by Pamphile Roy.
Bug fixes
Remove documentation from GitLab pages, by Pamphile Roy.
Version 1.7.2 - Lucius
New features
Refactor tasks.snapshot. Settings have been simplified and code maintenance has been eased, by Cyril Fournier,
Add new visualization visualization.Tree for 2D, by Pamphile Roy,
Add global_optimizer option in settings for Kriging, by Pamphile Roy,
Move documentation to Read the Docs, by Pamphile Roy,
Add documentation for MASCARET and PCE, by Matthias De Lozzo.
Enhancements
Export visualization.Kiviat as a mesh, by Pamphile Roy,
Some visualization for MASCARET, by Sophie Ricci.
Bug fixes
Point mixing when snapshots already exist, by Pamphile Roy,
Outliers are computed only once in f-HOPs, by Pamphile Roy,
PDF scaling, by Pamphile Roy,
Legend list for forward compatibility with MPL, by Pamphile Roy,
Range color bar, visualization ticks default, by Pamphile Roy,
Driver exceptions, by Pamphile Roy,
Encoding errors in schema with Python 3.5, by Pamphile Roy,
Settings checking was not effective, by Pamphile Roy.
Version 1.7.1 - Lucius
New features
Add a fill option in visualization.Kiviat, by Pamphile Roy,
Add bounds option in visualization settings, by Pamphile Roy,
visualization.Kiviat automatically used by driver if dim > 4, by Robin Campet,
Allow duplicate points in space.Space, by Pamphile Roy.
Enhancements
Visualization settings taken into account for PDF, legend outside, by Pamphile Roy,
Refactor space.Space error handling, by Pamphile Roy,
Documentation reorganization, by Jean-Christophe Jouhaud.
Bug fixes
visualization.Kiviat filling and ordering, by Pamphile Roy,
Maths in documentation as PNG, by Pamphile Roy,
Projection strategy in surrogate.PC, by Pamphile Roy,
Circular imports from functions.utils.multi_eval(), by Sophie Ricci,
Variance in LOO Q2, by Romain Dupuis,
surrogate.PC restart with LS strategy, by Andrea Trucchia.
Version 1.7 - Lucius
New features
Add space.Space.discrepancy() function in Space, by Pamphile Roy,
Refactor space.Space, uq.UQ, by Pamphile Roy,
space.Refiner initiates without a dictionary, by Pamphile Roy,
Refactor surrogate.PC and add options in settings: degree, strategy (Quad or LS), by Pamphile Roy,
Add space.Refiner.discrepancy() and space.Refiner.sigma_discrepancy(), by Pamphile Roy,
Add quality for every surrogate model, by Pamphile Roy,
Be able to bypass POD and surrogate in settings, by Pamphile Roy,
Surrogate facultative for UQ, by Pamphile Roy,
Add visualization with: Kiviat, DoE, HDR, by Pamphile Roy, and response_surface with block visualization in settings, by Robin Campet,
Add distributions in settings to set a distribution per parameter, by Pamphile Roy,
Add discrete in settings to give the index of the discrete parameter, by Pamphile Roy,
Add functions.Data for datasets with some new ones,
Add optimized LHS, by Vincent Baudoui,
Add noise and kernel for Kriging in settings, by Andrea Trucchia,
Header is now a JSON file, by Cyril Fournier,
Concurrent CI, by Cyril Fournier,
pylint/pycodestyle for CI and Python 2 on develop and master branches, by Pamphile Roy,
Add about section in doc, by Pamphile Roy.
Enhancements
Remove loops in predictors, zip, by Pamphile Roy,
Backend overwrite for matplotlib removed, by Pamphile Roy,
Remove otwrapy, by Pamphile Roy,
JSON schema constrained for surrogate and sampling, by Pamphile Roy,
Refactor pod.Pod, by Pamphile Roy,
Sobol' indices with ensemble, by Pamphile Roy,
Remove support for OpenTURNS < 1.8, by Pamphile Roy,
Add some options for functions.MascaretApi, by Pamphile Roy,
Coverage and tests raised to 90%, by Pamphile Roy.
Bug fixes
Quality with multimodes with POD, by Pamphile Roy,
List in sampling settings, by Pamphile Roy,
Restart and restart from files, by Pamphile Roy,
Other file read with restart, by Cyril Fournier,
Variance and FAST, by Pamphile Roy,
Double prompt in Python 2.7, by Vincent Baudoui,
DoE as list, by Vincent Baudoui,
Inputs mocking in tests, by Pamphile Roy,
DoE diagonal scaling, by Pamphile Roy,
functions.MascaretApi multi_eval, by Pamphile Roy,
Block indices, by Pamphile Roy,
Installation without folder being a git repository, by Cyril Fournier,
Fortran compilation, by Cyril Fournier,
Normalize output in surrogate.Kriging, by Pamphile Roy.
Version 1.6 - Selina
New features
Add functions.MascaretApi, by Pamphile Roy,
Add Evofusion with surrogate.Evofusion, by Pamphile Roy,
Add Expected Improvement with space.Refiner.optimization(), by Pamphile Roy,
Be able to have a discrete parameter, by Pamphile Roy.
Enhancements
Allow *args and **kwargs in @multi_eval, by Pamphile Roy,
Add some analytical functions for optimization and multifidelity tests, by Pamphile Roy,
Do not use .size anymore for space sizing, by Pamphile Roy,
Add test for DoE, by Pamphile Roy,
Add PDFs of references to doc, by Pamphile Roy,
Refinement methods work with discrete values using an optimizer decorator, by Pamphile Roy,
Changed some loops in favor of list comprehensions, by Pamphile Roy,
Clean UI by removing prediction option, by Pamphile Roy,
Remove MPI dependency, by Pamphile Roy.
Bug fixes
Sensitivity indices with n-dimensional output changing Martinez, by Pamphile Roy,
A copy of the space is done for scaled points for surrogate fitting, by Pamphile Roy,
Uniform sampling was not set properly, by Pamphile Roy,
Backend for matplotlib is now properly switched, by Pamphile Roy,
POD quality was not computed in case of varying number of modes, by Pamphile Roy.
Version 1.5 - Oswald
New features
Python 3 support, by Pamphile Roy,
Add surrogate.surrogate_model.SurrogateModel, by Pamphile Roy,
Add progress bar during quality computation, by Pamphile Roy,
Use pathos for multiprocessing during LOO and Kriging. New misc.nested_pool allows nested pools, by Pamphile Roy,
Unit tests and functional tests using pytest, by Pamphile Roy,
Antares wrapper used for IO, by Pamphile Roy,
OT 1.8 support and use of new SA classes, by Pamphile Roy,
Add plot of aggregated indices, by Pamphile Roy,
Add snippets, by Pamphile Roy,
Add correlation and covariance matrices, by Pamphile Roy,
Add DoE visualization in n-dimensions, by Pamphile Roy,
Hypercube for refinement created using discrete and global optimization, by Pamphile Roy,
Merge some PyUQ functions and add surrogate.PC, by Pamphile Roy.
Enhancements
Rewrite settings.json, by Pamphile Roy,
POD is now optional, by Pamphile Roy,
Use a wrapper for OT evaluations with otwrapy, by Pamphile Roy,
Comment capability in settings.json, by Pamphile Roy,
Doc cleaning, by Pamphile Roy,
Use functions to test model error, by Pamphile Roy,
Remove some MPI functions, by Pamphile Roy,
Simplify hybrid navigator using a generator, by Pamphile Roy.
Bug fixes
Use of timeout option, by Pamphile Roy,
Remove snapshots.tar, by Pamphile Roy,
FAST indices for aggregated indices, by Pamphile Roy,
Update keyword for POD, by Pamphile Roy,
Verbosity with quality, by Pamphile Roy,
Setup dependencies, by Pamphile Roy,
Some RBF cleaning, by Pamphile Roy,
Term MSE changed to sigma, by Pamphile Roy,
Snapshot repr, by Pamphile Roy,
Add .so when packaging, by Pamphile Roy.
Version 1.4
New features
Enhance surrogate.kriging: adimensionalize input parameters, use an anisotropic kernel and use a genetic algorithm for parameter optimization, by Pamphile Roy,
Settings are now written in JSON and checked using a schema, by Pamphile Roy,
Ask for confirmation of output if it exists: if no, ask for restarting from files, by Pamphile Roy,
Add post-treatment example scripts in test_cases/Post-treatment, by Pamphile Roy.
Enhancements
Save points of the DoE as a human-readable file, by Pamphile Roy,
Add branch and commit information to log, by Pamphile Roy,
Add doc for tutorial, space, surrogate and pod, by Pamphile Roy,
Change scikit-learn to stable 0.18, by Pamphile Roy,
Restart option -r now working properly, by Pamphile Roy,
Create a misc module which contains logging and the JSON schema, by Pamphile Roy.
Bug fixes
Refiner navigator loops correctly, by Pamphile Roy,
LOOCV working for multimodes, by Pamphile Roy,
Revert Q2 variance to use eval_ref, by Pamphile Roy,
Avoid extra POD quality when using LOOCV strategies, by Pamphile Roy,
Popping space was not working properly, by Pamphile Roy.
Version 1.3
New features
Add resampling strategies with space.refiner. Possibilities are: None, MSE, loo_mse, loo_sobol, hybrid, by Pamphile Roy,
Computation of the error of the POD Q2 with option q2. Uses Kriging, by Pamphile Roy,
Aggregated and block Sobol' indices are computed using a set of keywords: aggregated and block, by Pamphile Roy,
Add the possibility to choose the PDF for propagation (settings), by Pamphile Roy,
Sobol' maps are computed using the keyword aggregated, by Pamphile Roy,
A Sphinx documentation is available in /doc, by Pamphile Roy.
Enhancements
Change command line interface parsing with argparse. Also remove plot option and add output default repository, by Pamphile Roy,
Installation is more Pythonic as it now uses a setup.py script, by Pamphile Roy,
The project can be imported: import jpod, by Pamphile Roy,
Settings are defined once as an attribute of driver, by Pamphile Roy,
Logger is now simpler and configuration can be changed prior to installation in /misc/logging.json, by Pamphile Roy,
When defining a sample size for UQ, the value is used for indices and propagation, by Pamphile Roy,
The keyword pod['quality'] now corresponds to the targeted Q2, by Pamphile Roy,
Add Python 3 compatibility, by Pamphile Roy.
Bug fixes
Kriging was not working with several modes, by Pamphile Roy,
Output folder for uq was not working, by Pamphile Roy,
NaN for uncertainty propagation, by Pamphile Roy,
Remove auto keyword from pod['type'], by Pamphile Roy.
Version 1.2
New features
Add uncertainty quantification capability with uq and the option -u. sobol or FAST indices are computed on a defined sample size. Configuration is done within the settings dictionary file. Test functions are available. An output folder uq is created and contains indices and propagation data, by Pamphile Roy,
New test case Function_3D used to demonstrate UQ capabilities of the tool, by Pamphile Roy,
Sampling is now done using the package OpenTURNS, by Pamphile Roy,
New test case Channel_Flow used to demonstrate 1D vector output capabilities, by Pamphile Roy.
Enhancements
Kriging is now done using the module sklearn.gaussian_process from the package scikit-learn, by Pamphile Roy.
Contributing
If you are reading this, first of all, thank you and welcome to this community. For everyone to have fun, every good Python project requires some guidelines to be observed.
Every code base seeks to be performant, usable, stable and maintainable. This can only be achieved through high test coverage, good documentation and coding consistency. Isn't it frustrating when you cannot understand some code just because there is no documentation, no test to assess that the function is working, nor any comments in the code itself? How are you supposed to code in these conditions?
If you wish to contribute, you must comply with the following rules for your merge request to be considered.
Install
The procedure is similar to the end-user one, but if you plan to modify the sources, you need to install with:
python setup.py develop
This creates a symlink to your Python install folder, so you won't have to reinstall the package after modifying it.
Make sure you have installed the testing dependencies as detailed in the README. If using conda, you can install all dependencies with:
conda create -n bat_ci -c conda-forge python=3 openturns matplotlib numpy pandas scipy scikit-learn pathos jsonschema paramiko sphinx sphinx_rtd_theme pytest pytest-runner mock ffmpeg pycodestyle pylint coverage
Python
This is a Python project, not some C or Fortran code. You have to adapt your thinking to the Python style; otherwise, this can lead to performance issues. For example, an if is expensive: you would be better off using a try/except construction, as it is better to ask forgiveness than permission. Also, when performing computations, care is to be taken with for loops. If you can, use numpy operations for a huge performance gain (sometimes x1000!).
Thus developers must follow the guidelines from the Python Software Foundation. As a quick reference:
And for more Pythonic code: PEP 20. Last but not least, avoid common pitfalls: Anti-patterns.
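The two performance points above can be sketched in a short standalone example (illustrative only, not BATMAN code): EAFP with a try/except instead of a guarding if, and a vectorized numpy operation instead of a Python-level loop.

```python
import numpy as np

# EAFP: ask forgiveness, not permission -- a single try/except is
# usually cheaper and clearer than guarding every access with "if".
def lookup(table, key, default=None):
    try:
        return table[key]
    except KeyError:
        return default

# Explicit Python loop: slow for large inputs.
def squared_norms_loop(points):
    out = []
    for p in points:
        out.append(sum(x * x for x in p))
    return out

# Same computation as a single vectorized numpy call.
def squared_norms_numpy(points):
    return (np.asarray(points) ** 2).sum(axis=1)
```

On large arrays the vectorized version is typically orders of magnitude faster, since the loop runs in compiled code instead of the interpreter.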
Linter
Apart from normal unit and integration tests, you can perform a static analysis of the code using pylint:
pylint batman --rcfile=setup.cfg --ignore-patterns='gp_1d_sampler.py','RBFnet.py','TreeCut.py','resampling.py'
This allows you to spot naming errors, for example, as well as style errors.
Testing
Testing your code is paramount. Without continuous integration, you cannot guarantee the quality of the code. Some minor modification of a module can have unexpected implications. With a single commit, everything can go south!
The master branch, and normally the develop branch, are always in a passing state. This means that you should be able to checkout from them and use BATMAN without any errors.
The library pytest is used. It is simple and powerful. Check out their doc and replicate constructs from existing tests. If you are not already in love with it, you soon will be. All tests can be launched using:
coverage run -m pytest --basetemp=./TMP_CI batman/tests test_cases
This command fires coverage at the same time. The output consists of test results and a coverage report.
Note
Tests are automatically launched when you push your branch to the server, so you only have to run locally your new tests or the ones you think are impacted.
GIT
You can find the development model at http://nvie.com/posts/a-successful-git-branching-model/
Please read this page and stick to it.
The master and develop branches are protected and dedicated to the manager only.
Release and hotfix branches are mandatory.
If you want to add a modification, create a new branch branching off develop.
Then you can create a merge request on GitLab. From here, the fun begins.
You can commit any change you feel, start discussions about it, etc.
Clone this copy to your local disk:
$ git clone git@gitlab.com:cerfacs/batman.git
Create a branch to hold your changes:
$ git checkout -b myfeature
and start making changes. Never work in the master branch!
Work on this copy, on your computer, using Git to do the version control. When you're done editing, do:
$ git add modified_files
$ git commit
to record your changes in Git, then push them to GitLab with:
$ git push -u origin myfeature
Finally, follow these instructions to create a merge request from your fork. This will send an email to the committers.
Note
For every commit you push, the linter is launched. After that, if you want to launch all tests, you have to run them manually using the interface button.
Your request will only be considered for integration if in a finished state:
Respect Python coding rules,
Maintain the linting score (>9.5/10),
The branch passes all tests,
Have tests covering the changes,
Maintain test coverage,
Have the respective documentation.
Introduction
A surrogate tool
The use of Computational Fluid Dynamics (CFD) has proven to be reliable, faster and cheaper than experimental campaigns in an industrial context. However, sensitivity analysis needs a large number of simulations, which is not feasible when using complex codes that are time and resource consuming. This is even more true in an LES context, as we are trying to have a representative simulation. The only solution to overcome this issue is to construct a model that estimates a given QoI over a given range. This model requires a realistic number of evaluations of the detailed code. The general procedure to construct it consists of:
 Generate a sample space:
Generate a set of data from which to run the code. A solution is called a snapshot.
 Learn the link between the input and the output data:
From the previously generated set of data, we can compute a model, also called a response surface. A model is built using Gaussian process [Rasmussen2006] or polynomial chaos expansion [Najm2009].
 Predict a solution from a new set of input data:
The model can finally be used to interpolate a new snapshot from a new set of input data.
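The three steps above can be sketched with a standalone surrogate built on scipy's RBF interpolator (one of the surrogate families mentioned in this documentation); the solver function here is a stand-in for an expensive simulation code, not BATMAN's own API.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def solver(x):
    # Stand-in for an expensive simulation: each call is a "snapshot".
    return np.sin(x[:, 0]) * x[:, 0]

# 1. Generate a sample space (a crude uniform design here).
x_train = np.linspace(0.0, 5.0, 9).reshape(-1, 1)
y_train = solver(x_train)

# 2. Learn the input/output link: build the response surface.
surrogate = RBFInterpolator(x_train, y_train)

# 3. Predict a solution at a new point inside the sampled range.
y_new = surrogate(np.array([[2.5]]))
```

Inside the sampled range, the surrogate reproduces the snapshots and interpolates between them at negligible cost compared to the solver.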
Warning
The model cannot be used for extrapolation. Indeed, it has been constructed using a sampling of the space of parameters. If we want to predict a point which is not contained within this space, the error is not controlled, as the point is not balanced by points surrounding it. As a famous catastrophe, an extrapolation of the physical properties of an O-ring of the Challenger space shuttle led to an explosion during lift-off [Draper1995].
Once this model has been constructed, we can compute Sobol' indices, etc., using Monte Carlo sampling. Indeed, this model is said to be costless to evaluate, which is why the use of Monte Carlo sampling is feasible. To increase convergence, we can still use the same methods as for the DoE.
Both Proper Orthogonal Decomposition (POD) and Kriging (PC, RBF, etc.) are techniques that can interpolate data using snapshots. The main difference is that POD compresses the data so as to use only the relevant modes, whereas the Kriging method doesn't reduce the size of the snapshots used. On the other hand, POD cannot reconstruct data from a domain with missing data [Gunes2006]. Thus, the strategy used by BATMAN consists in:
Create a Design of Experiments,
Optionally use POD reconstruction in order to compress data,
Construct a surrogate model [on POD's coefficients],
Interpolate new data.
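A minimal sketch of the POD compression step, using numpy's SVD (a standalone illustration of the idea, not BATMAN's pod module): snapshots that truly live in a low-dimensional subspace are reduced to a few coefficients each, and the surrogate is then built on those coefficients instead of the full fields.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)

# Build an ensemble of 6 snapshots (200 DoF each) that lives in a
# 2-mode subspace, so 2 POD modes capture it exactly.
basis = np.vstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
snapshots = rng.standard_normal((6, 2)) @ basis

mean = snapshots.mean(axis=0)
U, s, Vt = np.linalg.svd(snapshots - mean, full_matrices=False)

k = 2                                  # number of retained modes
modes = Vt[:k]                         # POD basis
coeffs = (snapshots - mean) @ modes.T  # compressed representation

# Decompression: new data would be interpolated in coefficient space,
# then reconstructed against the modes.
reconstructed = mean + coeffs @ modes
```

Here each 200-component snapshot is represented by only 2 coefficients, which is what makes building the surrogate on POD coefficients cheap.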
Content of the package
The BATMAN package includes:
doc contains the documentation,
batman contains the module implementation,
test_cases contains some examples.
General functioning
The package is composed of several Python modules which are self-contained within the directory batman.
The following is a quick reference:
ui: command line interface,
space: defines the (re)sampling space,
surrogate: constructs the surrogate model,
uq: uncertainty quantification,
visualization: uncertainty visualization,
pod: constructs the POD,
driver: contains the main functions,
tasks: defines the context to compute each snapshot from,
functions: defines useful test functions,
misc: defines the logging configuration and the settings schema.
Using it
After BATMAN has been installed, batman is available as a command line tool or it can be imported in Python. The CLI is defined in ui. The module imports the package and uses the functions defined in driver.
Thus BATMAN is launched using:
batman settings.json
See also
The definition of the case is to be filled in settings.json. Refer to CLI.
An output directory is created; it contains the results of the computation split across the following folders:
snapshots,
surrogate,
[predictions],
[uq].
Content of test_cases
This folder contains ready-to-launch examples:
Basic_function is a simple 1-input-parameter function,
Michalewicz is a 2-input-parameter non-linear function,
Ishigami is a 3-input-parameter function,
G_Function is a 4-input-parameter function,
Channel_Flow is a 2-input-parameter case with a functional output,
RAE2822 is a 2-input-parameter case that launches an elsA case,
Flamme_1D is a 2-input-parameter case that launches an AVBP case.
In every case folder, there is a README.rst file that summarizes and explains it.
References
 Rasmussen2006
C.E. Rasmussen and C. Williams: Gaussian Processes for Machine Learning. MIT Press, 2006. ISBN: 026218253X
 Najm2009
H. Najm: Uncertainty Quantification and Polynomial Chaos Techniques in Computational Fluid Dynamics. Annual Review of Fluid Mechanics 41 (1) (2009) 35-52. DOI:10.1146/annurev.fluid.010908.165248
 Gunes2006
H. Gunes, S. Sirisup and G.E. Karniadakis: "Gappy data: To Krig or not to Krig?". Journal of Computational Physics, 2006. DOI:10.1016/j.jcp.2005.06.023
 Draper1995
D. Draper: "Assessment and Propagation of Model Uncertainty". Journal of the Royal Statistical Society, 1995.
Command Line Interface
Introduction
The file settings.json contains the configuration of BATMAN. It can be divided into 2 mandatory blocks and 3 optional blocks. There is no specific order to respect.
Note
A prefilled example is shown in settings.json, located in test_cases/Snippets.
Help of the CLI can be triggered with:
batman -h
usage: BATMAN [-h] [--version] [-v] [-c] [-s] [-o OUTPUT] [-r] [-n] [-u] [-q]
              settings

BATMAN creates a surrogate model and performs UQ.

positional arguments:
  settings              path to settings file

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -v, --verbose         set verbosity from WARNING to DEBUG, [default: False]
  -c, --check           check settings, [default: False]
  -s, --save-snapshots  save the snapshots to disk when using a function,
                        [default: False]
  -o OUTPUT, --output OUTPUT
                        path to output directory, [default: ./output]
  -r, --restart         restart pod, [default: False]
  -n, --no-surrogate    do not compute surrogate but read it from disk,
                        [default: False]
  -u, --uq              Uncertainty Quantification study, [default: False]
  -q, --q2              estimate Q2 and find the point with max MSE, [default:
                        False]
Note
Fields in square brackets are optional.
Block 1 - Space of Parameters
First of all, we define the parameter space using a hypercube. Taking the minimal and the maximal values along all coordinates allows describing it.
"space": {
"corners": [
[15.0, 2500.0],
[60.0, 6000.0]
],
"sampling": {
"init_size": 4,
"method": "halton"
},
"resampling":{
"delta_space": 0.08,
"resamp_size": 0,
"method": "sigma",
"hybrid": [["sigma", 4], ["loo_sobol", 2]],
"q2_criteria": 0.9
}
}
corners: Required array, defines the space using the two corners of the hypercube [[min], [max]],
sampling: Defines the configuration of the sample. This can either be a list of samples as an array_like of shape (n_samples, n_features), or a dictionary with the following:
init_size: Required integer, defines the initial number of snapshots,
method: Required string, method to create the DoE; can be halton, sobol, sobolscramble, lhs (Latin Hypercube Sampling), lhsc (Latin Hypercube Sampling Centered), olhs (optimized LHS), faure, uniform or saltelli,
distributions: Optional array, a list of distributions. Ex for two input variables: ["Uniform(15., 60.)", "Normal(4035., 400.)"].
resampling: Optional, to do resampling, fill this dictionary:
delta_space: Optional number, the percentage of space to shrink so as not to resample close to boundaries. For 0.08, the available space for resampling will be shrunk by 8%,
resamp_size: Required integer, number of points to add in the parameter space,
method: Required string, to be chosen from discrepancy, ego_discrepancy, sigma_discrepancy, sigma_distance, sigma, loo_sigma, loo_sobol, extrema, hybrid or optimization (resampling methods are only compatible with specific surrogate prediction methods, see Space),
hybrid: if method is hybrid, you have to define a generator which is a list [["method", n_snapshot]],
q2_criteria: Optional number, stopping criterion based on the quality estimation of the model,
extremum: Optional string, minimization or maximization objective: 'min', 'max',
weights: Optional array, when the optimization problem is composite (ex: sigma_distance), a weight factor is used to balance the influence of each function.
The method used to create the DoE is paramount. It ensures that the physics will be captured correctly all over the domain of interest, see Space. The faure, halton and sobol methods are all low-discrepancy sequences with good filling properties. saltelli is particular as it will create a DoE for the computation of Sobol' indices using Saltelli's formulation.
When distributions is set, a joint distribution is built and is used to perform an inverse transformation (inverse CDF) on the sample. This allows having a low-discrepancy sample while still following the given distributions.
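This inverse-transform step can be illustrated with scipy's quasi-Monte Carlo tools (a standalone sketch of what BATMAN does via OpenTURNS; the two distributions match the sampling example above):

```python
import numpy as np
from scipy.stats import norm, qmc, uniform

# Low-discrepancy Halton sample in the unit hypercube [0, 1]^2 ...
sampler = qmc.Halton(d=2, seed=0)
u = sampler.random(128)

# ... mapped through each marginal's inverse CDF: the sample keeps its
# space-filling properties while following the target distributions.
x1 = uniform(loc=15.0, scale=45.0).ppf(u[:, 0])  # Uniform(15, 60)
x2 = norm(loc=4035.0, scale=400.0).ppf(u[:, 1])  # Normal(4035, 400)
sample = np.column_stack([x1, x2])
```

The resulting sample can be used directly as a DoE for propagation or sensitivity analysis.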
Regarding the resampling, all methods need a good initial sample, meaning that the quality is about \(Q_2\sim0.5\). loo_sigma and loo_sobol work better than sigma in high-dimensional cases (>2).
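For reference, the leave-one-out quality \(Q_2\) used by these criteria can be sketched as follows (a standalone illustration with an RBF surrogate; BATMAN computes this internally):

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def loo_q2(x, y):
    """Leave-one-out Q2: refit the surrogate n times, each time
    predicting the sample that was left out.
    Q2 = 1 - sum((y_i - yhat_{-i})^2) / sum((y_i - ybar)^2)."""
    n = len(x)
    err = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        model = RBFInterpolator(x[keep], y[keep])
        err[i] = y[i] - model(x[i:i + 1])[0]
    return 1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2)
```

A \(Q_2\) close to 1 means the surrogate predicts unseen points well; around 0.5, the sample is barely sufficient and resampling methods behave differently, as noted above.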
Warning
If using a PC surrogate model, the only possibilities are discrepancy and extrema. Furthermore, the sampling method must be set as a list of distributions.
Block 2 - Snapshot provider
A snapshot defines a simulation.
"snapshot": {
    "max_workers": 10,
    "plabels": ["x1", "x2"],
    "flabels": ["F"],
    "provider": {
        "type": "job",
        "command": "python function.py",
        "context_directory": "data",
        "coupling": {
            "coupling_directory": "batman-coupling",
            "input_fname": "sample-space.npy",
            "input_format": "npy",
            "output_fname": "sample-data.npy",
            "output_format": "npy"
        },
        "clean": false
    },
    "io": {
        "space_fname": "sample-space.npy",
        "space_format": "npy",
        "data_fname": "sample-data.npy",
        "data_format": "npy"
    }
}
max_workers: Required integer, maximum number of simultaneously running snapshots
plabels: Required array, input parameter names (for space)
flabels: Required array, output feature names (for data)
psizes: Optional array, number of components of parameters
fsizes: Optional array, number of components of output features
provider: The provider defines what a simulation is
type: Required string, defines the type of provider; one of function, job or file
 If type is function:
module: Required string, python module to load
function: Required string, function in module to execute for generating data
discover: Optional string, UNIX-style patterns for directories with pairs of sample files to import
 If type is job:
command: Required string, command to use to launch the script
context_directory: Required string, stores every resource required for executing a job
coupling_directory: Optional string, sub-directory in context_directory that will contain input parameters and output file
coupling: Optional, definition of the snapshot's I/O files:
    coupling_directory: Optional string, sub-directory in context_directory that will contain input parameters and output file
    input_fname: Optional string, basename for files storing the point coordinates plabels
    input_format: Optional string, json, csv, npy, npz or any Antares format if installed; for speed reasons, prefer npy
    output_fname: Optional string, basename for files storing the values associated to flabels
    output_format: Optional string, json, csv, npy, npz or any Antares format if installed; for speed reasons, prefer npy
hosts: Optional, definition of the remote HOSTS if any:
    hostname: Required string, remote host to connect to
    remote_root: Required string, remote folder to create and store data
    username: Optional string, username
    password: Optional string, password
clean: Optional boolean, delete working directory after run
discover: Optional string, UNIX-style patterns for directories with pairs of sample files to import
 If type is file:
file_pairs: Required array, list of pairs (space_file, data_file)
discover: Optional string, UNIX-style patterns for directories with pairs of sample files to import
io: Optional, input/output information:
space_fname: Required string, file name for space
space_format: Optional string, json, csv, npy, npz or any Antares format if installed; for speed reasons, prefer npy
data_fname: Required string, file name for data
data_format: Optional string, json, csv, npy, npz or any Antares format if installed; for speed reasons, prefer npy
Optional Block 3 - Surrogate
Set up the surrogate model strategy to use. See Surrogate.
"prediction": {
"method": "kriging",
"predictions": [[30, 4000], [35, 3550]]
}
predictions: set of points to predict
n_jobs: Optional int, number of jobs to run in parallel. If not given, n_jobs defaults to psutil.cpu_count(), which can cause problems
method: method used to generate a snapshot; one of rbf (Radial Basis Function), kriging, pc (polynomial chaos expansion), evofusion, mixture, LinearRegression, LogisticRegression, LogisticRegressionCV, PassiveAggressiveRegressor, SGDRegressor, TheilSenRegressor, DecisionTreeRegressor, GradientBoostingRegressor, AdaBoostRegressor, RandomForestRegressor or ExtraTreesRegressor
 For kriging:
kernel: Optional string, kernel to use. Ex: "ConstantKernel() + Matern(length_scale=1., nu=1.5)"
noise: Optional number or boolean, noise level as a boolean or as a float
global_optimizer: Optional boolean, whether to do global optimization or gradient-based optimization to estimate hyperparameters
 For pc:
strategy: Required string, either quadrature or least squares; one of Quad or LS
degree: Required integer, the polynomial degree
sparse_param: Optional object, parameters for the Sparse Cleaning Truncation Strategy and/or hyperbolic truncation of the initial basis:
    max_considered_terms: Optional integer, maximum considered terms
    most_significant: Optional integer, most significant number to retain
    significance_factor: Optional number, significance factor
    hyper_factor: Optional number, factor for hyperbolic truncation strategy
Note
When using pc, the sampling
must be set to a list of distributions.
 For evofusion:
cost_ratio: Required number, cost ratio in terms of function evaluations between high and low fidelity models
grand_cost: Required integer, total cost of the study in terms of number of function evaluations of the high fidelity model
 For mixture:
local_method: Optional list of dict, list of local surrogate models for clusters, or None for Kriging local surrogate models
classifier: Optional string, classifier from sklearn (supervised machine learning)
clusterer: Optional string, clusterer from sklearn (unsupervised machine learning)
pca_percentage: Optional number, percentage of information kept for PCA (minimum 0, maximum 1)
 For LinearRegression, LogisticRegression, LogisticRegressionCV, PassiveAggressiveRegressor, SGDRegressor, TheilSenRegressor, DecisionTreeRegressor, GradientBoostingRegressor, AdaBoostRegressor, RandomForestRegressor or ExtraTreesRegressor:
regressor_options: Optional string, parameters of the associated scikit-learn regressor
Note
The prediction points can be entered directly between the brackets, or generated using the script prediction.py
located in test_cases/Snippets
.
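As a hypothetical stand-in for such a script, a regular grid of prediction points inside the corners hypercube can be generated in a few lines (the grid_predictions helper below is illustrative, not part of batman):

```python
import itertools

def grid_predictions(corners, n_per_axis=5):
    """Build a regular grid of prediction points.

    corners: [[min_1, ..., min_d], [max_1, ..., max_d]]
    Returns a list of points suitable for the "predictions" key.
    """
    axes = []
    for lo, hi in zip(*corners):
        step = (hi - lo) / (n_per_axis - 1)
        axes.append([lo + i * step for i in range(n_per_axis)])
    return [list(point) for point in itertools.product(*axes)]

# 3x3 grid over the hypercube [30, 35] x [3550, 4000]
points = grid_predictions([[30, 3550], [35, 4000]], n_per_axis=3)
```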
Optional Block 4 - UQ
Uncertainty Quantification (UQ), see UQ.
"uq": {
"test": "Channel_Flow"
"sample": 1000,
"method": "sobol"
"pdf": ["Normal(4035., 400)", "Uniform(15, 60)"],
"type": "aggregated",
}
test: Optional string, use a test method for indices comparison and quality calculation. One of: Rosenbrock, Michalewicz, Ishigami, G_Function, Channel_Flow
sample: Required integer, number of points per sample to use for SA
method: Required string, type of Sobol analysis: sobol or FAST (Fourier Amplitude Sensitivity Testing). If FAST, no second-order indices are computed and defining a surrogate model is mandatory
type: Required string, type of indices: aggregated or block
pdf: Required array, probability density functions for uncertainty propagation. Enter the PDFs of the inputs as a list of OpenTURNS distributions. Ex: x1-Normal(mu, sigma), x2-Uniform(inf, sup) => ["Normal(4035., 400.)", "Uniform(15., 60.)"]
Optional Block 5 - POD
POD (or Proper Orthogonal Decomposition) is an approach to help reduce the amount of data.
"pod": {
"dim_max": 100,
"tolerance": 0.99,
"type": "static"
}
tolerance: Required number, tolerance of the modes to be kept. A percentage of the sum of the singular values; values that account for less than this tolerance are ignored
dim_max: Required integer, maximum number of modes to be kept
type: Required string, type of POD to perform: static or dynamic
The dynamic POD allows updating the POD once a snapshot is available. Hence a POD can be restarted when doing resampling, for example.
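The interplay between tolerance and dim_max can be sketched as follows, assuming the criterion is the cumulative share of the sum of the singular values (a simplified reading of the rule above, not batman's exact code):

```python
def n_kept_modes(singular_values, tolerance=0.99, dim_max=100):
    """Number of POD modes kept.

    Modes are accumulated (largest first) until they account for at
    least `tolerance` of the sum of the singular values, capped at
    `dim_max`.
    """
    values = sorted(singular_values, reverse=True)
    total = sum(values)
    acc = 0.0
    for n, s in enumerate(values, start=1):
        acc += s
        if acc >= tolerance * total or n == dim_max:
            return n
    return len(values)

# One dominant mode: almost all the energy is in the first singular value.
n_kept_modes([10.0, 0.05, 0.02], tolerance=0.99)
```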
Optional Block 6 - Visualization
Set up for the visualization options. Batman creates a response function (1 input parameter), response surfaces (2 to 4 input parameters) or a Kiviat graph (more than 4 input parameters). All settings presented here are optional. See Visualization.
"visualization": {
"doe": true,
"resampling": true,
"axis_disc": [20, 20],
"flabel": "Cost function",
"plabels": ["X", "Y"],
"feat_order": [1, 2],
"ticks_nbr": 14,
"range_cbar": [0.0, 2.3],
"contours": [0.5, 1.0, 1.5],
}
bounds: Array, sample boundaries
doe: Boolean, if true, the Design of Experiments is represented on the response surface by black dots. Default is false
resampling: Boolean, if true, the Design of Experiments points corresponding to resampling are displayed in a different color, as red triangles. Only active if doe is true
xdata: Array, 1D discretization of the function (n_features,)
axis_disc: Integers, discretization of each axis. The values for the x and y axes modify the surface resolution, while the values corresponding to the 3rd and 4th parameters impact the number of frames per movie and the number of movies
flabel: String, name of the cost function
xlabels: Strings,
plabels: Strings, names of the input parameters to be plotted on each axis
feat_order: Integers, associate each input parameter to an axis, the first number corresponding to the parameter to be plotted on the x-axis, etc. A size equal to the number of input parameters is expected, and all integers from 1 to the parameter number should be used. Default is [1, 2, 3, 4]
ticks_nbr: Integer, number of ticks on the colorbar (displays n-1 colors). Default is 10
range_cbar: Floats, minimum and maximum values on the colorbar
contours: Floats, values of the iso-contours to be plotted on the response surface
kiviat_fill: Boolean, whether to fill the Kiviat chart or not
2D_mesh: Visualization of a specific variable on a user-provided 2D mesh:
    fname: String, name of the mesh file
    format: String, format of the mesh file
    xlabel: String, name of the x-axis
    ylabel: String, name of the y-axis
    flabels: Strings, names of the variables
    vmins: Floats, values of the minimal output for data filtering
Driver module
Driver Class
Defines all methods used to interact with other classes.
 Example
>>> from batman import Driver
>>> driver = Driver(settings, script_path, output_path)
>>> driver.sampling_pod(update=False)
>>> driver.write_pod()
>>> driver.prediction(write=True)
>>> driver.write_model()
>>> driver.uq()
>>> driver.visualization()

class batman.driver.Driver(settings, fname)[source]
Driver class.

__init__(settings, fname)[source]
Initialize Driver. From settings, init snapshot, space [POD, surrogate].

fname_tree = {'data': 'data.dat', 'pod': 'surrogate/pod', 'predictions': 'predictions', 'snapshots': 'snapshots', 'space': 'space', 'surrogate': 'surrogate', 'uq': 'uq', 'visualization': 'visualization'}

func(x_n, *args, **kwargs)
Get evaluation from space or point. If the function is a Kriging instance, get and return the variance.
 Returns
function evaluation(s) [sigma(s)]
 Return type
np.array([n_eval], n_feature)

logger = <Logger batman.driver (WARNING)>

prediction(points, write=False)[source]
Perform a prediction.
 Parameters
points (space.point.Point or array_like (n_samples, n_features)) - point(s) to predict.
write (bool) - whether to write snapshots.
 Returns
Result.
 Return type
array_like (n_samples, n_features)
 Returns
Standard deviation.
 Return type
array_like (n_samples, n_features)

resampling()[source]
Resampling of the parameter space. Generate new samples if quality and number of samples are not satisfied. From a new sample, it regenerates the POD.
Tutorial
Introduction
Examples can be found in BATMAN's installer sub-repository test_cases
. To create a new study, use the same structure as this example on the Michalewicz function:
Michalewicz
|__ data
|   |__ script.sh
|   |__ function.py
|__ settings.json
The working directory consists of two parts:
data: contains all the simulation files necessary to perform a new simulation. It can be anything from a simple python script to a complex AVBP case. The content of this directory will be copied for each snapshot. In all cases, script.sh
launches the simulation.
settings.json: contains the case setup.
See also
Find more details on every keyword in the CLI section.
Finally, the folder Posttreatment
contains example scripts that perform some post-treatment.
Note
The following section is a step-by-step tutorial that can be applied to any case.
BATMAN step-by-step
Step 1: Simulation directory
Michalewicz function
For this tutorial, the Michalewicz function was chosen. It is a multimodal d-dimensional function which has \(d!\) local minima; for this test-case:
where m defines the steepness of the valleys and ridges.
Note
It is too difficult to search for a global minimum when \(m\) reaches large values. Therefore, it is recommended to have \(m < 10\).
In this case we used the two-dimensional form, i.e. \(d = 2\).
To summarize, we have the Michalewicz 2D function as follows:
See also
For other optimization functions, read more at this website.
Create the case for BATMAN
For each snapshot, BATMAN will copy the content of data
and add a new folder batmandata
which contains a single file point.json
. The content of this file is updated for each snapshot and it only contains the input parameters to change for the current simulation. Hence, to use Michalewicz's function with BATMAN, this file must be read to gather the input parameters.
Aside from the simulation code and this header, there is data/script.sh
. It is this script that is launched by BATMAN. Once it completes, the computation is considered finished. Thus, this script can manage an AVBP launch, call a python script, etc.
In the end, the quantity of interest has to be written in tecplot format within the directory cfdoutputdata
.
Note
These directories' names and paths are fully configurable.
Note
For a simple function script, you can pass it directly in the settings file.
Step 2: Setting up the case
BATMAN's settings are managed via a python file located in scripts
. An example template can be found within every example directory. This file consists of five blocks with different functions:
Block 1 - Space of Parameters
The space of parameters is created using the two extreme points of the domain; here we have \(x_1, x_2 \in [1, \pi]^2\). Also, we want to make 50 snapshots using a halton sequence.
"space": {
"corners": [
[1.0, 1.0],
[3.1415, 3.1415]
],
"sampling": {
"init_size": 50,
"method": "halton"
}
},
Block 2 - Snapshot provider
Then, we configure the snapshot itself. We define the names of the header and output files as well as the dimension of the output. Here BATMAN will look at the variable F
, which is a scalar value.
"snapshot": {
"max_workers": 10,
"plabels": ["x1", "x2"],
"flabels": ["F"],
"provider": {
"type": "job",
"command": "python function.py",
"context_directory": "data",
"coupling": {
"coupling_directory": "batmancoupling",
"input_fname": "samplespace.npy",
"input_format": "npy",
"output_fname": "sampledata.npy",
"output_format": "npy"
},
"clean": false
},
"io": {
"space_fname": "samplespace.npy",
"space_format": "npy",
"data_fname": "sampledata.npy",
"data_format": "npz"
}
},
Note
For a simple function script, you can pass it directly in the settings file:
"provider": "function"
with function
the name of the file containing the function. For an example, see test_cases/Ishigami
.
Block 3 - POD
In this example, a POD is not necessary as it will result in only one mode. However, its use is presented. We can control the quality of the POD, choose a resampling strategy, etc.
"pod": {
"dim_max": 100,
"tolerance": 0.99,
"type": "static"
},
Block 4 - Surrogate
A model is built on the snapshot matrix to approximate a new snapshot. The Kriging method is selected. To construct a response surface, we need to make predictions.
surrogate = {
'method' : 'kriging',
'predictions' : [[1., 2.], [2., 2.]],
},
To fill in predictions
easily, use the script prediction.py
.
Block 5 - UQ
Once the model has been created, it can be used to perform a statistical analysis. Here, Sobol' indices are computed with Sobol's method using 50000 samples. Because Michalewicz is referenced in Batman, we can use the option "test".
"uq": {
"sample": 50000,
"test": "Michalewicz"
"pdf": ["Uniform(1., 3.1415)", "Uniform(1., 3.1415)"],
"type": "aggregated",
"method": "sobol"
}
Step 3: Running BATMAN
To launch BATMAN, simply call it with:
batman settings.json -qsu
BATMAN's logs are found within BATMAN.log
. Here is an extract:
BATMAN main ::
POD summary:
modes filtering tolerance : 0.99
dimension of parameter space : 2
number of snapshots : 50
number of data per snapshot : 1
maximum number of modes : 100
number of modes : 1
modes : [ 2.69091785]
batman.pod.pod ::
pod quality = 0.45977, max error location = (3.0263943749999997, 1.5448927777777777)
 Sobol' indices 
batman.uq ::
Second order: [array([[ 0. , 0.06490131],
[ 0.06490131, 0. ]])]
batman.uq ::
First order: [array([ 0.43424729, 0.49512012])]
batman.uq ::
Total: [array([ 0.51371718, 0.56966205])]
In this example, the quality of the model is estimated at \(Q_2\sim 0.46\), which means that the model is able to represent around 46% of the variability of the quantity of interest. Also, from Sobol' indices, both parameters appear to be equally important.
Post-treatment
Result files are separated into 4 directories under output
:
Case
|__ data
|__ settings.json
|__ output
    |__ surrogate
    |__ predictions
    |__ snapshots
    |__ uq
snapshots
contains all snapshot computations, predictions
contains all predictions, surrogate
contains the model and uq
contains the statistical analysis. Using the predictions, we can plot the response surface of the function as calculated using the model:
It can be noted that using 50 snapshots on this case is not enough to capture all the non-linearities of the function.
Note
Usually, physical phenomena are smoother. Thus, fewer points are needed for a 2-parameter problem when dealing with real physics.
Refinement strategies
In this case, the error was fairly high using 50 snapshots. A computation with 50 snapshots plus 20 refinement points has been tried. To use this functionality, the resampling dictionary has to be added:
"resampling":{
"delta_space": 0.08,
"resamp_size": 20,
"method": "loo_sigma",
"q2_criteria": 0.8
}
This block tells BATMAN to compute a maximum of 20 resampling snapshots in case the quality has not reached 0.8. The loo_sigma
strategy uses the information on the model error provided by the Gaussian process regression. This leads to an improvement in the error, with \(Q_2 \sim 0.71\).
Using a basic sigma
technique, again with 20 new snapshots, the error is \(Q_2 \sim 0.60\).
In this case, the loo_sigma
method performed better, but this is highly case dependent.
Sampling the Space of Parameters
Design of Experiments
Whatever method is used, the first step consists in defining how we are going to modify the input variables to retrieve the evolution of the response surface. This is called a Design of Experiments (DoE), as defined by [Sacks1989]. The parameter space is called a Space
:
space = batman.space.Space([[1, 1], [3, 3]])
space.sampling(10, 'halton')
space.write('.')
The quality of the DoE is paramount as it determines the physics that will be observed. If the space is not filled properly and homogeneously, we can bias our analysis and retrieve only a particular behaviour of the physics. This concept has been used extensively in experiments, especially the one-at-a-time design, which consists of changing only one parameter at a time. Doing so, the space is not filled properly and only simple behaviours can be recovered. In order to assess the quality of the sampling, the discrepancy is usually used. It is an indicator of the distance between the points within the parameter space. The lower the discrepancy, the better the design. This information can be used to optimize a DoE. Among all formulations of this criterion, the centered discrepancy is the most robust one [Damblin2013]. This information can be computed from the space:
space.discrepancy()
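For reference, the centered \(L^2\)-discrepancy has a closed-form expression; the pure-python sketch below assumes the sample is already scaled to the unit hypercube (batman's own implementation may differ):

```python
def centered_discrepancy(sample):
    """Centered L2-discrepancy of a sample in [0, 1]^d (lower is better)."""
    n = len(sample)
    d = len(sample[0])
    term1 = (13.0 / 12.0) ** d
    # Single sum over the sample points.
    term2 = 0.0
    for x in sample:
        prod = 1.0
        for xj in x:
            prod *= 1.0 + 0.5 * abs(xj - 0.5) - 0.5 * abs(xj - 0.5) ** 2
        term2 += prod
    term2 *= 2.0 / n
    # Double sum over all pairs of sample points.
    term3 = 0.0
    for x in sample:
        for y in sample:
            prod = 1.0
            for xj, yj in zip(x, y):
                prod *= (1.0 + 0.5 * abs(xj - 0.5) + 0.5 * abs(yj - 0.5)
                         - 0.5 * abs(xj - yj))
            term3 += prod
    term3 /= n * n
    return (term1 - term2 + term3) ** 0.5
```

A spread-out sample scores lower (better) than a clustered one of the same size.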
As stated before, the gold standard would be to perform a Monte Carlo sampling, but it would require a huge sample, which is unfeasible with costly numerical simulations. This is why random (or quasi-random) sampling methods are used. Low discrepancy sequences have been designed to overcome this issue. These designs are built upon a pattern, a sequence, depending on factors such as prime numbers. This allows a fast generation of sampling spaces with good properties. A well-known method is the Latin Hypercube Sampling (LHS). The idea behind it is to discretize the space into a regular grid and randomly sample one point per zone.
In Damblin et al. [Damblin2013], a comprehensive analysis of the most common DoEs can be found. In the end, the Sobol' or Halton DoE are sufficient when dealing with a small number of parameters (<5). With an increasing number of parameters, patterns start to appear and optimized LHS are required.
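A Halton sequence is cheap to generate: each dimension uses the radical-inverse (van der Corput) sequence in a distinct prime base. A minimal sketch (batman relies on dedicated libraries for this; the snippet is illustrative):

```python
def van_der_corput(index, base):
    """Radical inverse of `index` in the given base, in [0, 1)."""
    result, denom = 0.0, 1.0
    while index:
        denom *= base
        index, remainder = divmod(index, base)
        result += remainder / denom
    return result

def halton(n_points, bases=(2, 3)):
    """First `n_points` of the Halton sequence, one prime base per dimension."""
    return [[van_der_corput(i, b) for b in bases]
            for i in range(1, n_points + 1)]

doe = halton(5)  # 5 points in [0, 1)^2
```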
Resampling the parameter space
There are several methods for refining (resampling) the parameter space. In [Scheidt], the classical methods are reviewed and a framework combining several methods is proposed. In [Roy2017], we added some methods that perform better in high dimensional cases. In BATMAN, the following methods are available:
Variance (\(\sigma\)). As stated in Surrogate, one of the main advantages of Gaussian processes over other surrogates is to provide an insight into the variance of the solution. The first method consists in using this data and weighting it with the eigenvalues of the POD:
\[\sum_{i=1}^k \sigma_i^2 \times \mathbb{V}[f(\mathbf{x}_*)]_{i}.\]Global optimization on this indicator gives the new point to simulate.
Variance and distance (sigma_distance). The sigma resampling method can create resampling points very close, if not coincident, to the existing sampling points. A solution is to add to the objective function the inverse of the distance to the closest sampling point. A weight factor can be used to weigh the influence of the distance over \(\sigma\).
Variance and discrepancy (sigma_discrepancy). The sigma resampling method can create resampling points very close, if not coincident, to the existing sampling points. A solution is to add to the objective function the inverse of the space discrepancy. A weight factor can be used to weigh the influence of the discrepancy over \(\sigma\).
Leave-One-Out (LOO) and \(\sigma\). A LOO is performed on the POD and highlights the point where the model is the most sensitive. The strategy here is to add a new point around it. Within this hypercube, a global optimization over \(\sigma\) is conducted, giving the new point.
LOO-Sobol'. Using the same steps as the LOO-\(\sigma\) method, the hypercube around the point is here truncated using prior information about Sobol' indices, see UQ. It requires the indices to be close to convergence so as not to bias the result. Or the bias can be intentional, depending on the insight we have about the case.
Extrema. This method adds 4 points. First, it looks for the point in the sample which has the minimum value of the QoI. Within a hypercube, it adds the minimal and maximal predicted values. Then it does the same for the point of the sample which has the maximum value of the QoI. This method allows capturing the gradient around extreme values.
Hybrid. This last method consists of a navigator composed of any combination of the previous methods.
Discrepancy. Simply add a point that minimizes the discrepancy.
It is fairly easy to resample the parameter space. From a space and a surrogate:
new_point = space.refine(surrogate)
Hypercube
The hypercube is defined by the cartesian product of the intervals of the \(n\) parameters, i.e. \([a_i, b_i]^n\). The constrained optimization problem can hence be written as:
Moreover, a maximum cube-volume aspect ratio is defined in order to preserve locality. This gives the new constraint
with \(\epsilon = 1.5\) set arbitrarily to prevent too elongated hypercubes. The global optimum is found using a two-step strategy: first, a discrete optimization using \(\mathcal{P}\) gives an initial solution; second, a basin-hopping algorithm finds the optimum coordinates of the hypercube. In the case of the LOO-Sobol' method, the hypercube is truncated using the total order Sobol' indices.
Efficient Global Optimization (EGO)
In the case of a surrogate model based on a Gaussian process, the Efficient Global Optimization (EGO) [Jones1998] algorithm can be used to resample the parameter space so as to drive an optimization. It consists of a trade-off between the actual minimal value \(f_{min}\) and an expected value given by the standard error \(s\) for a given prediction \(\hat{y}\). The expected improvement is defined as:
\[\mathbb{E}[I] = (f_{min} - \hat{y})\,\Phi\left(\frac{f_{min} - \hat{y}}{s}\right) + s\,\phi\left(\frac{f_{min} - \hat{y}}{s}\right),\]
with \(\phi(.), \Phi(.)\) the standard normal density and distribution functions. Using the fact that this quantity is monotonic in \(\hat{y}\) and \(s\), it reduces to the probability of improvement:
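The closed form \(EI = (f_{min} - \hat{y})\Phi(z) + s\phi(z)\) with \(z = (f_{min} - \hat{y})/s\) only needs the standard normal density and CDF, so it fits in a stdlib-only sketch (illustrative, not batman's implementation):

```python
import math

def expected_improvement(y_hat, s, f_min):
    """Expected improvement of a prediction y_hat with standard error s,
    given the current best (minimal) observed value f_min."""
    if s <= 0.0:
        # No uncertainty left: improvement is deterministic.
        return max(f_min - y_hat, 0.0)
    z = (f_min - y_hat) / s
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # phi(z)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # Phi(z)
    return (f_min - y_hat) * cdf + s * pdf
```

Note how EI grows with the standard error: an uncertain region can beat a slightly better but well-known one.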
References
 Damblin2013
Damblin, M. Couplet, B. Iooss: Numerical studies of space filling designs: optimization of Latin Hypercube Samples and subprojection properties. Journal of Simulation. 2013
 Sacks1989
Sacks et al.: Design and Analysis of Computer Experiments. Statistical Science 4.4. 1989. DOI: 10.1214/ss/1177012413
 Scheidt
Scheidt: Analyse statistique dāexpĆ©riences simulĆ©es : ModĆ©lisation adaptative de rĆ©ponses non rĆ©guliĆØres par Krigeage et plans dāexpĆ©riences, Application Ć la quantification des incertitudes en ingĆ©nierie des rĆ©servoirs pĆ©troliers. UniversitĆ© Louis Pasteur. 2006
 Roy2017
P.T. Roy et al.: Resampling Strategies to Improve Surrogate Modelbased Uncertainty Quantification  Application to LES of LS89. IJNMF. 2017
 Jones1998
Jones et al.: Efficient Global Optimization of Expensive BlackBox Functions. Journal of Global Optimization 1998. DOI: 10.1023/a:1008306431147
Surrogate model
Generalities
A common class is used to manage surrogate models. Hence, several kinds of surrogate model strategies can be used:
predictor = batman.surrogate.SurrogateModel('kriging', corners)
predictor.fit(space, target_space)
predictor.save('.')
points = [(12.5, 56.8), (2.2, 5.3)]
predictions = predictor(points)
From Kriging to Gaussian Process
Kriging, a geostatistical method
Kriging is a geostatistical interpolation method that uses not only the distance between neighbouring points but also the relationships among these points, the autocorrelation. The method was created by D.G. Krige [Krige1989] and formalized by G. Matheron [Matheron1963].
In order to predict an unmeasured location \(\hat{Y}\), interpolation methods use the surrounding measured values \(Y_i\) and weight them:
The advantage of this method is that the interpolation is exact at the sampled points and that it gives an estimation of the prediction error. Ordinary Kriging consists in the Best Linear Unbiased Predictor (BLUP) [Robinson1991]:
 Best
It minimizes the variance of the predicted error \(Var(\hat{Y}  Y)\),
 Linear
A linear combination of the data,
 Unbiased
It minimizes the mean square error \(E[\hat{Y}  Y]^2\) thus \(\sum_{i=1}^{N} \lambda_i(x)=1\),
 Predictor
It is an estimator of random effects.
\(\lambda_i\) are calculated using the spatial autocorrelation of the data; this is a variography analysis. Plots can be constructed using semi-variance, covariance or correlation. An empirical variogram plot allows one to see which values should be alike because they are close to each other [Bohling2005]. The empirical semivariogram is given by:
A fitting model is then applied to this semivariogram. Hence, the variability of the model is inferior to the data's. Kriging smooths the gradients. The exponential model is written as:
with \(C\) the correlation matrix; the parameter \(r\) is optimized using the sample points.
A model is described using:
 Sill
It corresponds to the maximum of \(\gamma\). It defines the end of the range.
 Range
It is the zone of correlation. If the distance is superior to the range, there is no correlation, whereas if the distance is inferior to it, the sample locations are autocorrelated.
 Nugget
If the distance between the points is null, \(\gamma\) should be null. However, measurement errors are inherent and cause a nugget effect. It is the yintercept of the model.
Once the model is computed, the weights are determined using the MSE condition, which gives:
\(K\) being the covariance matrix \(K_{i,j} = C(Y_iY_j)\) and \(k\) being the covariance vector \(k_i = C(Y_iY)\) with the covariance \(C(h) = C(0)  \gamma(h) = Sill\gamma(h)\).
Furthermore, we can express the field \(Y\) as \(\hat{Y} = R(S) + m(S)\), with \(R(S)\) and \(m(S)\) the residual and the trend components [Bohling2005]. Depending on the treatment of the trend, there are several Kriging techniques (ordinary Kriging being the most used):
 Simple
The variable is stationary, the mean is known,
 Ordinary
The variable is stationary, the mean is unknown,
 Universal
The variable is nonstationary, there is a tendency.
Ordinary Kriging is the most used method. In this case, the covariance matrix is augmented:
Once the weights are computed, its dot product with the residual \(R_i=Y_im\) at the known points gives the residual \(R(S)\). Thus we have an estimation of \(\hat{Y}\). Finally, the error is estimated by the second order moment:
Some care has to be taken with this estimation of the variance. While it is a good indicator of the correctness of the estimation, it is only an estimate of the error based upon all surrounding points.
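The whole ordinary Kriging pipeline above fits in a few dozen lines of stdlib python. The sketch below uses an exponential covariance \(C(h) = e^{-h/r}\) on a 1D toy sample and solves the augmented system with naive Gaussian elimination; it is didactic only, not batman's implementation (which relies on scikit-learn):

```python
import math

def solve(matrix, rhs):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x

def kriging_predict(xs, ys, x0, r=1.0):
    """Ordinary Kriging prediction at x0 from 1D samples (xs, ys)."""
    cov = lambda u, v: math.exp(-abs(u - v) / r)
    n = len(xs)
    # Augmented covariance matrix enforcing sum(weights) == 1 (unbiasedness).
    mat = [[cov(xs[i], xs[j]) for j in range(n)] + [1.0] for i in range(n)]
    mat.append([1.0] * n + [0.0])
    rhs = [cov(x, x0) for x in xs] + [1.0]
    weights = solve(mat, rhs)[:n]
    return sum(w * y for w, y in zip(weights, ys))
```

A quick check of the exactness property mentioned above: predicting at a sampled location returns the sampled value.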
Gaussian Process
There are two approaches when dealing with regression problems. In simple cases, we can use simple functions in order to approximate the output set of data. On the other hand, when dealing with complex multidimensional problems with strong nonlinearity, there are infinite possibilities of functions to consider. This is where the Gaussian process comes in.
As stated by Rasmussen et al. in [Rasmussen2006], a process is a generalization of a probability distribution of functions. When dealing with Gaussian processes, they can simply be fully defined using the mean and covariance of the functions:
Starting from a prior distribution of functions, it represents the belief we have about the problem. Without any assumption, the mean would be null. If we are now given a dataset \(D = \{(x_1, y_1), (x_2, y_2)\}\), we only consider the functions that actually pass through or close to these points, as in the previous figure. This is the learning phase. The more points are added, the more the model will fit the function. Indeed, as we add observations, the error is reduced at these points.
The nature of the covariance matrix is of great importance as it fixes the properties of the functions to consider for inference. This matrix is also called kernel. Many covariance functions exist and they can be combined to fit specific needs. A common choice is the squared exponential covariance kernel:
with \(l\) the length scale, a hyperparameter, which depends on the magnitudes of the parameters. When dealing with a multidimensional case and non-homogeneous parameters, it is of prime importance to non-dimensionalize everything, as one input could bias the optimization of the hyperparameters.
Then the Gaussian process regression is written as a linear regression
One of the main benefits of this method is that it provides information about the variance
The Kriging method is one of the most employed as of today. We can even enhance the result of the regression if we have access to the derivative (or even the hessian) of the function [Forrester2009]. This could be even more challenging if we don't have an adjoint solver to compute it. Another method is to use a multi-fidelity metamodel in order to obtain an even better solution. This can be done if we have two codes that compute the same thing, or if we have two grids to run from.
Multifidelity
It is possible to combine several levels of fidelity in order to lower the computational cost of the surrogate building process. The fidelity can be expressed as a mesh difference, a convergence difference, or even a different set of solvers. [Forrester2006] proposed a way of combining these fidelities by building a low fidelity model and correcting it using a model of the error:
with \(\hat{f}_{\epsilon}\) the surrogate model representing the error between the two fidelity levels. This method needs nested design of experiments for the error model to be computed.
Consider two levels of fidelity \(f_e\) and \(f_c\), respectively an expensive and a cheap function, expressed in terms of computational cost. A cost ratio \(\alpha\) between the two can be defined as:
Using this cost relationship and setting a computational budget \(C\), it is possible to get a relation between the number of cheap and expensive realizations:
As the designs are nested, the number of cheap experiments must be strictly superior to the number of expensive ones. Indeed, the opposite would result in no additional information for the system.
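With the budget \(C\) expressed in units of one expensive evaluation, the relation above amounts to \(C = N_e + N_c/\alpha\), so \(N_c = \alpha(C - N_e)\). A toy sketch of that bookkeeping (the helper and its conventions are assumptions for illustration, not batman's exact accounting):

```python
def cheap_runs(budget, n_expensive, cost_ratio):
    """Number of cheap runs affordable once `n_expensive` runs are planned.

    budget: total cost in units of one expensive evaluation.
    cost_ratio: alpha = cost(expensive) / cost(cheap).
    """
    n_cheap = int(cost_ratio * (budget - n_expensive))
    # Nested designs require strictly more cheap than expensive runs.
    if n_cheap <= n_expensive:
        raise ValueError("budget leaves no information for the error model")
    return n_cheap

# A budget worth 10 expensive runs, 5 of them actually expensive, alpha = 4:
cheap_runs(10, 5, 4)  # 20 cheap runs
```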
References
 Krige1989
D.G. Krige, et al. "Early South African geostatistical techniques in today's perspective". Geostatistics 1. 1989.
 Matheron1963
Matheron. "Principles of Geostatistics". Economic Geology 58. 1963.
 Robinson1991
G.K. Robinson. "That BLUP is a good thing: the estimation of random effects". Statistical Science 6.1. 1991. DOI: 10.1214/ss/1177011926.
 Bohling2005
Bohling. "Kriging". Tech. rep. 2005.
 Forrester2006
Forrester, Alexander I.J., et al. "Optimization using surrogate models and partially converged computational fluid dynamics simulations". Proceedings of the Royal Society A: Mathematical, Physical and Engineering Science. 2006. DOI: 10.1098/rspa.2006.1679
 Forrester2009
Forrester and A.J. Keane. "Recent advances in surrogate-based optimization". Progress in Aerospace Sciences. 2009. DOI: 10.1016/j.paerosci.2008.11.001
Uncertainty Quantification¶
What is Uncertainty¶
As can be inferred from the name, Uncertainty Quantification (UQ) aims at understanding the impact of the uncertainties of a system. Uncertainties can be decomposed in two parts:
Aleatoric: intrinsic variability of a system,
Epistemic: lack of knowledge, model errors.
The aleatoric part is the one we seek to measure. For example, looking at an airfoil, if we change the angle of attack, some change is expected on the lift and drag. The epistemic part, on the other hand, represents our bias. Using RANS models, the turbulence is entirely modeled (as opposed to LES, where most of it is computed), so we might miss some phenomena.
Then, there are three kinds of uncertainty study:
Uncertainty Propagation: observe the response of the system to perturbed inputs (PDF, response surface),
Sensitivity Analysis: measure the respective importance of the input parameters,
Risk Assessment: get the probability of exceeding a threshold.
In any case, from perturbed inputs we look at the output response of a quantity of interest.
See also
The Visualization module is used to output UQ.
Sobol' indices¶
There are several methods to estimate the contribution of different parameters to quantities of interest [iooss2015]. Among them, sensitivity methods based on the analysis of variance provide the contribution of the parameters to the variance of the QoI [ferretti2016]. Here, the classical Sobol' method [Sobol1993] is used, which not only gives a ranking but also quantifies the importance factor using the variance. This method only assumes the independence of the input variables. It uses a functional decomposition of the variance of the function to explore, with \(p\) the number of input parameters constituting \(\mathbf{x}\). The Sobol' indices are then expressed as

\(S_i = \frac{\mathbb{V}(\mathbb{E}[Y|x_i])}{\mathbb{V}(Y)}, \qquad S_{ij} = \frac{\mathbb{V}(\mathbb{E}[Y|x_i x_j])}{\mathbb{V}(Y)} - S_i - S_j.\)

\(S_{i}\) corresponds to the first-order term, which apprises the contribution of the i-th parameter, while \(S_{ij}\) corresponds to the second-order term, which informs about the correlations between the i-th and the j-th parameters. These equations can be generalized to compute higher-order terms. However, the computational effort to converge them is most often out of reach [iooss2010], and their analysis and interpretation are not simple.
Total indices represent the global contribution of the parameters to the QoI, summing a parameter's first-order index with all its higher-order interaction terms.
For a functional output, Sobol' indices can be computed all along the output to retrieve a map, or to create composite indices. As described by Marrel [marrel2015], aggregated indices can also be computed as the mean of the indices weighted by the variance at each point or temporal step.
The indices are estimated using Martinez' formulation. In [baudin2016], this estimator was shown to be stable and to provide asymptotic confidence intervals (approximated with Fisher's transformation) for first-order and total-order indices.
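The Martinez estimators themselves are compact: the first-order and total indices are correlation coefficients between outputs evaluated on two independent sample matrices and on hybrid matrices. A minimal NumPy sketch (the additive test function and sample size are arbitrary choices for illustration, not batman's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def model(x):
    # additive test function: exact first-order indices are a_i^2 / sum(a^2)
    a = np.array([4.0, 2.0, 1.0])
    return x @ a

n, dim = 2**13, 3
A = rng.uniform(-1, 1, (n, dim))   # two independent input samples
B = rng.uniform(-1, 1, (n, dim))
y_a, y_b = model(A), model(B)

s_first, s_total = [], []
for i in range(dim):
    ABi = A.copy()
    ABi[:, i] = B[:, i]            # A with its i-th column taken from B
    y_abi = model(ABi)
    # Martinez estimators: plain correlation coefficients
    s_first.append(np.corrcoef(y_b, y_abi)[0, 1])
    s_total.append(1 - np.corrcoef(y_a, y_abi)[0, 1])
```

For this additive function the exact indices are \(a_i^2 / \sum_j a_j^2\), and first-order and total indices coincide since there are no interactions.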
Uncertainty propagationĀ¶
Instead of looking at the individual contributions of the input parameters, the easiest way to assess uncertainties is to perform simulations with perturbed inputs drawn from particular distributions. The quantity of interest can then be visualized; this is called a response surface. A complementary analysis can be drawn from here, as one can compute the Probability Density Function (PDF) of the output. For this statistical information to be relevant, a large number of simulations is required, hence the need for a surrogate model (see Surrogate).
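A minimal Monte Carlo propagation can be sketched as follows; the input distributions echo the Uniform/Normal examples used elsewhere in this documentation, but the closed-form model standing in for the solver is purely hypothetical:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)

# Perturbed inputs, e.g. a friction coefficient and a discharge
# (hypothetical distributions, in the spirit of the examples in this doc).
ks = rng.uniform(15, 60, 10_000)
q = rng.normal(4035, 400, 10_000)

def quantity_of_interest(ks, q):
    # hypothetical closed-form stand-in for the real solver
    return (q / (ks * 300 * np.sqrt(5e-4))) ** (3 / 5)

h = quantity_of_interest(ks, q)         # one output value per simulation

# Kernel-smoothed PDF of the output
pdf = gaussian_kde(h)
grid = np.linspace(h.min(), h.max(), 200)
density = pdf(grid)
```

In practice each call to the quantity of interest would be a full solver run, which is exactly why the surrogate model replaces it for the many evaluations needed here.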
References¶
iooss2015
Iooss B. and Saltelli A.: Introduction to Sensitivity Analysis. Handbook of Uncertainty Quantification. 2015. DOI: 10.1007/978-3-319-11259-6_31-1
ferretti2016
Ferretti F., Saltelli A. et al.: Trends in sensitivity analysis practice in the last decade. Science of the Total Environment. 2016. DOI: 10.1016/j.scitotenv.2016.02.133
Sobol1993
Sobol' I.M.: Sensitivity analysis for non-linear mathematical models. Mathematical Modeling and Computational Experiment. 1993.
iooss2010
Iooss B. et al.: Numerical studies of the metamodel fitting and validation processes. International Journal on Advances in Systems and Measurements. 2010.
marrel2015
Marrel A. et al.: Sensitivity Analysis of Spatial and/or Temporal Phenomena. Handbook of Uncertainty Quantification. 2015. DOI: 10.1007/978-3-319-11259-6_39-1
baudin2016
Baudin M. et al.: Numerical stability of Sobol' indices estimation formula. 8th International Conference on Sensitivity Analysis of Model Output. 2016.
Uncertainty Visualization¶
Being able to visualize uncertainty is often neglected, yet it is a challenging topic. Depending on the number of input parameters and the dimension of the quantity of interest, several options are implemented in the package.
Function or class | Input dimensionality | Output dimensionality | Description
doe | n scalars | scalar, vector | Design of Experiment
response_surface | <5 scalars | scalar, vector | Response surface (figures or movies)
HdrBoxplot | vector | vector | Median realization with PCA
Kiviat3D | >3 scalars | scalar, vector | 3D version of the radar/spider plot
pdf | any | scalar, vector | Output PDF
corr_cov | scalar | vector | Correlation of the inputs and outputs
sobol | scalar | scalar, vector | Sensitivity indices
All options return a figure object that can be re-displayed using reshow(). This enables some modification of the graph. In most cases, the first parameter data is of shape (n_samples, n_features).
Response surface¶
What is it?¶
A response surface can be created to visualize the surrogate model as a function of two input parameters, the surface itself being colored by the value of the function. The response surface is automatically plotted when requesting uncertainty quantification if the number of input parameters is less than 5. For a larger number of input parameters, a Kiviat3D graph is plotted instead (see Kiviat 3D section).
If only 1 input parameter is involved, the response surface reduces to a response function. The default display is the following:
If exactly 2 input parameters are involved, it is possible to generate the response surface, the surface itself being colored by the value of the function. The corresponding values of the 2 input parameters are displayed on the x and y axis, with the following default display:
Because the response surface is a 2D picture, a set of response surfaces is generated when dealing with 3 input parameters. The value of the 3rd input parameter is fixed to a different value on each plot. The obtained set of pictures is concatenated to one single movie file in mp4 format:
Finally, response surfaces can also be plotted for 4 input parameters. A set of several movies is created, the value of the 4th parameter being fixed to a different value on each movie.
Options¶
Several display options can be set by the user to modify the created response surface. All the available options are listed below:

doe (array-like; default: None)
    Display the Design of Experiment on the graph, represented by black dots.
resampling (integer; default: None)
    Display the n last DoE points in red to easily identify the resampling.
xdata (list of real numbers, size = length of the output vector; default: None if the output is a scalar, a regular discretisation between 0 and 1 if it is a vector)
    Only used if the output is a vector. Specifies the discretisation of the output vector for the 1D response function and for the integration of the output before plotting the 2D response surface.
axis_disc (list of integers, one value per parameter; default: 50 in 1D; 25, 25 in 2D; 20, 20, 20 in 3D; 15, 15, 15, 15 in 4D)
    Discretisation of the response surface on each axis. The values for the 1st and 2nd parameters set the resolution, while the values for the 3rd and 4th parameters set the number of frames per movie and the number of movies respectively.
flabel (string; default: 'F')
    Name of the output function.
plabels (list of strings, one per parameter; default: 'x0' for the 1st dimension, 'x1' for the 2nd, 'x2' for the 3rd, 'x3' for the 4th)
    Names of the input parameters displayed on each axis.
feat_order (list of integers, one value per parameter; default: 1 in 1D; 1, 2 in 2D; 1, 2, 3 in 3D; 1, 2, 3, 4 in 4D)
    Axis on which each parameter is plotted. The parameter in 1st position is plotted on the x-axis, and so on. All integer values from 1 to the total dimension number must be specified.
ticks_nbr (integer; default: 10)
    Number of ticks in the colorbar.
range_cbar (list of two real numbers; default: minimal and maximal values of the output data)
    Minimal and maximal values in the colorbar. Output values out of this scope are plotted in white.
contours (list of real numbers; default: None)
    Values of the isocontours to plot.
fname (string; default: 'Response_surface.pdf')
    Name of the response surface file(s). Can be followed by an additional integer.
Example¶
As an example, the previous response surface for 2 input parameters is now plotted with its design of experiment, 4 of the points being marked as a later resampling (4 red triangles among the black dots). Additional isocontours are added to the graph, and the axes corresponding to the input parameters are swapped. Note also the new minimal and maximal values in the colorbar and the increased number of colors. Finally, the names of the input parameters and of the cost function are replaced by more explicit ones.
HDR-Boxplot¶
What is it?¶
This implements an extension of the highest density region boxplot technique [Hyndman2009]. When you have functional data, which is to say a curve, you will want to answer questions such as:
What is the median curve?
Can I draw a confidence interval?
Are there any outliers?
This module allows you to do exactly this:
data = np.loadtxt('data/elnino.dat')
print('Data shape: ', data.shape)
hdr = batman.visualization.HdrBoxplot(data)
hdr.plot()
The output is the following figure:
How does it work?¶
Behind the scenes, the dataset is represented as a matrix, each row corresponding to a 1D curve. This matrix is then decomposed using Principal Components Analysis (PCA). This allows the data to be represented using a finite number of modes, or components. This compression turns the functional representation into a scalar representation of the matrix. In other words, you can visualize each curve from its components. With 2 components, this is called a bivariate plot:
This visualization exhibits a cluster of points, indicating that many curves lead to common components. The center of the cluster is the median curve, and the further you get from the cluster, the less likely a curve is to be similar to the others.
Using a kernel smoothing technique (see PDF), the probability density function (PDF) of the multivariate space can be recovered. From this PDF, it is possible to compute the probability density linked to the cluster and plot its contours.
Finally, using these contours, the different quantiles are extracted along with the median curve and the outliers.
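The steps above (PCA compression, kernel density estimation in the reduced space, density-ranked curves) can be sketched with NumPy and SciPy. This illustrates the principle on a made-up dataset; it is not batman's HdrBoxplot implementation:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(2)

# Toy functional dataset: one curve per row (n_samples, n_points)
t = np.linspace(0, 1, 50)
data = (np.sin(2 * np.pi * (t + rng.normal(0, 0.05, (200, 1))))
        + rng.normal(0, 0.1, (200, 1)))

# 1. Compress each curve to 2 principal components (PCA via SVD)
centered = data - data.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)
scores = centered @ vt[:2].T        # bivariate representation of the curves

# 2. Kernel density estimate in the reduced space
kde = gaussian_kde(scores.T)
density = kde(scores.T)

# 3. Highest-density sample ~ median curve; lowest-density ones ~ outliers
median_curve = data[np.argmax(density)]
outlier_candidates = data[np.argsort(density)[:5]]
```

The quantile bands of the real HDR boxplot come from contouring this same density and mapping the contours back to curves.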
Uncertainty visualization¶
Apart from these plots, it implements a technique called Hypothetical Outcome Plots (HOPs) [Hullman2015] and extends this concept to functional data. Using the HDR boxplot, each single realisation is superimposed. All these frames are then assembled into a movie. The net benefit is the ability to observe the spatial/temporal correlations. Indeed, having the median curve and some intervals does not indicate how each realisation is drawn, nor whether there are particular patterns. This animated representation helps such analysis:
hdr.f_hops()
Another possibility is to visualize the outcomes with sound. Each curve is mapped to a series of tones to create a song. Combined with the previous f-HOPs, this opens a new way of looking at data:
hdr.sound()
Note
The hdr.sound() output is an audio wav file. A combined video can be obtained with ffmpeg:
ffmpeg -i fHOPs.mp4 -i songfHOPs.wav mux_fHOPs.mp4
The gif is obtained using:
ffmpeg -i fHOPs.mp4 -pix_fmt rgb8 -r 1 data/fHOPs.gif
Kiviat 3D¶
The HDR technique is useful for visualizing functional output, but it does not give any information on the input parameters used. A radar plot, or Kiviat plot, can be used for this purpose. A single realisation can be seen as a 2D Kiviat plot whose axes each represent a given parameter, the surface itself being colored by the value of the function.
To visualize a whole set of samples, a 3D version of the Kiviat plot is used [Hackstadt1994]. Thus, each sample corresponds to a 2D Kiviat plot:
kiviat = batman.visualization.Kiviat3D(space, bounds, feval, param_names)
kiviat.plot()
When dealing with functional output, the color of the surface does not convey all the information on a sample, as it can only display a single piece of information: the median value in this case. Hence, the proposed approach is to combine a functional-HOPs-Kiviat with sound:
batman.visualization.kiviat.f_hops(fname=os.path.join(tmp, 'kiviat.mp4'))
hdr = batman.visualization.HdrBoxplot(feval)
hdr.sound()
Probability Density Function¶
A multivariate kernel density estimation technique [Wand1995] is used to find the probability density function (PDF) \(\hat{f}(\mathbf{x_r})\) of the multivariate space. This density estimator, in its product-kernel form, is given by

\(\hat{f}(\mathbf{x_r}) = \frac{1}{N} \sum_{j=1}^{N} \prod_{i=1}^{d} K_{h_i}(x_{r_i} - x_{j_i}),\)

with \(h_{i}\) the bandwidth for the i-th component and \(K_{h_i}(.) = K(./h_i)/h_i\) the kernel, which is chosen as a modal probability density function that is symmetric about zero. Here, \(K\) is the Gaussian kernel and the \(h_{i}\) are optimized on the data.
So, taking a case with a functional output [Roy2017], we can recover its PDF with:
fig_pdf = batman.visualization.pdf(data)
Correlation matrix¶
The correlation and covariance matrices are also available:
batman.visualization.corr_cov(data, sample, func.x, plabels=['Ks', 'Q'])
Sobol'¶
Once Sobol' indices are computed, it is easy to plot them with:
indices = [s_first, s_total]
batman.visualization.sobol(indices, p_lst=['Tu', r'$\alpha$'])
In case of functional data [Roy2017b], both aggregated and map indices can be passed to the function and both plots are made:
indices = [s_first, s_total, s_first_full, s_total_full]
batman.visualization.sobol(indices, p_lst=['Tu', r'$\alpha$'], xdata=x)
References¶
Hyndman2009
Rob J. Hyndman and Han Lin Shang. Rainbow plots, bagplots and boxplots for functional data. Journal of Computational and Graphical Statistics, 19:29-45, 2009.
Hullman2015
Jessica Hullman, Paul Resnick and Eytan Adar. Hypothetical Outcome Plots Outperform Error Bars and Violin Plots for Inferences About Reliability of Variable Ordering. PLoS ONE 10(11): e0142444. 2015. DOI: 10.1371/journal.pone.0142444
Hackstadt1994
Steven T. Hackstadt, Allen D. Malony and Bernd Mohr. Scalable Performance Visualization for Data-Parallel Programs. IEEE. 1994. DOI: 10.1109/SHPCC.1994.296663
Wand1995
M.P. Wand and M.C. Jones. Kernel Smoothing. 1995. DOI: 10.1007/978-1-4899-4493-1
Roy2017b
P.T. Roy et al.: Comparison of Polynomial Chaos and Gaussian Process surrogates for uncertainty quantification and correlation estimation of spatially distributed open-channel steady flows. SERRA. 2017. DOI: 10.1007/s00477-017-1470-4
Acknowledgement¶
We are grateful for the help and support on OpenTURNS that Michaël Baudin has provided.
POD for Proper Orthogonal Decomposition¶
What is it?¶
The Proper Orthogonal Decomposition (POD) is a technique used to decompose a matrix and characterize it by its principal components, which are called modes [AnindyaChatterjee2000]. To approximate a function \(z(x,t)\), only a finite sum of terms is required:

\(z(x,t) \simeq \sum_{k=1}^{K} a_k(t) \phi_k(x).\)

The functions \(\phi_{k}(x)\) admit infinitely many choices: they can be Fourier series, Chebyshev polynomials, etc. For a chosen basis of functions, a unique set of time functions \(a_k(t)\) arises. In the case of the POD, the basis functions are orthonormal, meaning that

\(\int_x \phi_{k_1}(x)\,\phi_{k_2}(x)\,\mathrm{d}x = \delta_{k_1 k_2}.\)

The principle of the POD is to choose \(\phi_k(x)\) such that the approximation of \(z(x,t)\) is the best in a least squares sense. These orthonormal functions are called the proper orthogonal modes of the function.
When dealing with CFD simulations, the size of the domain \(m\) is usually larger than the number of measures, or snapshots, \(n\). Hence, among the existing decomposition methods, the Singular Value Decomposition (SVD) is used: this is the snapshot method [Cordier2006].
The Singular Value Decomposition (SVD) is a factorization of a matrix expressed as \(A = U \Sigma V^T\), where \(V\) diagonalizes \(A^TA\), \(U\) diagonalizes \(AA^T\), and \(\Sigma\) is the singular value matrix whose diagonal is composed of the singular values of \(A\) (a singular value being the square root of an eigenvalue). The columns \(u_i\) and \(v_i\) of \(U\) and \(V\) are eigenvectors and form orthonormal bases. Thus, the initial matrix can be rewritten as \(A = \sum_{i=1}^{r} \sigma_i u_i v_i^T\),
\(r\) being the rank of the matrix. If \(k < r\) is taken, an approximation of the initial matrix can be constructed. This allows the data to be compressed, as only an extract of \(u\) and \(v\) needs to be stored.
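A short NumPy illustration of this rank-\(k\) compression on a synthetic snapshot matrix (the matrix is a made-up sum of three modes, so its rank is 3 by construction):

```python
import numpy as np

# Synthetic snapshot matrix (m spatial points x n snapshots), rank 3 by design
x = np.linspace(0, 1, 300)[:, None]
t = np.linspace(0, 1, 40)[None, :]
A = (np.sin(np.pi * x) * np.cos(2 * np.pi * t)
     + 0.5 * np.sin(3 * np.pi * x) * np.sin(4 * np.pi * t)
     + 0.1 * np.sin(5 * np.pi * x) * np.cos(6 * np.pi * t))

U, sv, Vt = np.linalg.svd(A, full_matrices=False)

k = 3                                   # keep k modes
A_k = (U[:, :k] * sv[:k]) @ Vt[:k]      # rank-k approximation
rel_err = np.linalg.norm(A - A_k) / np.linalg.norm(A)

# Compression: only k columns of U, k singular values and k rows of Vt
storage = U[:, :k].size + k + Vt[:k].size
```

Here the truncated storage is roughly a tenth of the full matrix while the reconstruction error stays at machine precision, since no modes beyond the third carry energy.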
References¶
AnindyaChatterjee2000
Anindya Chatterjee. "An introduction to the proper orthogonal decomposition". Current Science 78.7. 2000.
Cordier2006
Cordier and M. Bergmann. "Réduction de dynamique par décomposition orthogonale aux valeurs propres (POD)". École de printemps OCET. 2006.
API Reference¶
This is the class and function reference of batman. Please refer to the previous sections for further details, as the raw class and function specifications may not be enough to give full guidelines on their use.
batman.space: Parameter space¶
Space – Manages the space of parameters.
Doe – DOE class.
Refiner – Resampling the space of parameters.
Space module¶

class batman.space.Doe(n_samples, bounds, kind, dists=None, discrete=None)[source]¶
DOE class.

__init__(n_samples, bounds, kind, dists=None, discrete=None)[source]¶
Initialize the DOE generation.
In case kind is uniform, n_samples is decimated in order to have the same number of points in all dimensions.
If kind is discrete, a joint distribution is made between a discrete uniform distribution and the continuous distributions.
Another possibility is to set a list of PDFs to sample from. Thus one can do: dists=['Uniform(15., 60.)', 'Normal(4035., 400.)']. If not set, uniform distributions are used.
Parameters
n_samples (int) – number of samples.
bounds (array_like) – Space's corners [[min, n dim], [max, n dim]]
kind (str) – Sampling method; if a string, it can be one of ['halton', 'sobol', 'faure', '[o]lhs[c]', 'sobolscramble', 'uniform', 'discrete'], otherwise it can be a list of openturns distributions.
dists (lst(str)) – List of valid openturns distributions as string.
discrete (int) – Position of the discrete variable.

generate
()[source]Ā¶ Generate the DOE.
 Returns
Sampling.
 Return type
array_like (n_samples, n_features)

logger
= <Logger batman.space.sampling (WARNING)>Ā¶


class batman.space.Refiner(data, corners, delta_space=0.08, discrete=None, pod=None)[source]¶
Resampling the space of parameters.

__init__(data, corners, delta_space=0.08, discrete=None, pod=None)[source]¶
Initialize the refiner with the Surrogate and space corners.
Point data are scaled between [0, 1] based on the size of the corners, taking into account a delta_space factor.
Parameters
data (batman.surrogate.SurrogateModel or batman.space.Space) – Surrogate or space.
corners (array_like) – hypercube ([min, n_features], [max, n_features]).
delta_space (float) – Shrinking factor for the parameter space.
discrete (int) – index of the discrete variable.
pod (batman.pod.Pod) – POD instance.

discrepancy
()[source]Ā¶ Find the point that minimize the discrepancy.
 Returns
The coordinate of the point to add.
 Return type
lst(float)

distance_min
(point)[source]Ā¶ Get the distance of influence.
Compute the Chebyshev distance (max L-inf norm) between the anchor point and every sampling point. The L-inf norm allows this length to be added to all coordinates and ensures that no point will be within this hypercube. It returns the minimal distance. point will be scaled by self.corners so the returned distance is scaled.
Parameters
point (array_like) ā Anchor point.
 Returns
The distance to the nearest point.
 Return type
float.

extrema
(refined_points)[source]Ā¶ Find the min or max point.
Using an anchor point based on the extremum value at sample points, search the hypercube around it. If a new extremum is found, the Nelder-Mead method is used to add a new point. The point is then bounded back into the hypercube.
 Returns
The coordinate of the point to add
 Return type
lst(float)

func
(coords, sign=1)[source]Ā¶ Get the prediction for a given point.
Retrieve the Gaussian Process estimation. The function returns plus or minus the prediction depending on the sign: -1 if we want to find the max and 1 if we want the min.

hybrid
(refined_points, point_loo, method, dists)[source]Ā¶ Composite resampling strategy.
Uses all methods one after another to add new points. It uses the navigator defined within settings file.
 Parameters
 Returns
The coordinate of the point to add
 Return type
lst(float)

hypercube_distance
(point, distance)[source]Ā¶ Get the hypercube to add a point in.
Propagate the distance around the anchor. point will be scaled by self.corners and the input distance has to be already scaled. Ensure that new values are bounded by corners.
Parameters
point (array_like) ā Anchor point.
distance (float) ā The distance of influence.
 Returns
The hypercube around the point.
 Return type
array_like.

hypercube_optim
(point)[source]Ā¶ Get the hypercube to add a point in.
Compute the largest hypercube around the point based on the L2norm. Ensure that only the leaveoneout point lies within it. Ensure that new values are bounded by corners.
 Parameters
point (np.array) ā Anchor point.
 Returns
The hypercube around the point (a point per column).
 Return type
array_like.

leave_one_out_sigma
(point_loo)[source]Ā¶ Mixture of Leaveoneout and Sigma.
Estimate the quality of the POD by leave-one-out cross validation (LOO-CV), and add a point around the max error point. The point is added within a hypercube around the max error point. The size of the hypercube is equal to the distance to the nearest point.

leave_one_out_sobol
(point_loo, dists)[source]Ā¶ Mixture of Leaveoneout and Sobolā indices.
Same as leave_one_out_sigma() but changes the shape of the hypercube. Using Sobol' indices, the corners are shrunk by the corresponding percentage of the total indices.

logger
= <Logger batman.space.refiner (WARNING)>Ā¶

optimization
(method='EI', extremum='min')[source]Ā¶ Maximization of the Probability/Expected Improvement.

pred_sigma
(coords)[source]Ā¶ Prediction and sigma.
Same as Refiner.func() and Refiner.func_sigma(). Function prediction and sigma are weighted using POD modes.
Parameters
coords (lst(float)) ā coordinate of the point
 Returns
sum_f and sum_sigma
 Return type
floats

sigma
(hypercube=None)[source]Ā¶ Find the point at max Sigma.
It returns the point where the variance (sigma) is maximum. To do so, it uses Gaussian Process information. A genetic algorithm gets the global maximum of the function.
 Parameters
hypercube (array_like) ā Corners of the hypercube.
 Returns
The coordinate of the point to add.
 Return type
lst(float)

sigma_discrepancy
(weights=None)[source]Ā¶ Maximization of the composite indicator: sigma  discrepancy.

sigma_distance
(hypercube=None, weights=[0.5, 0.5])[source]Ā¶ Find the point at max Sigma.
It returns the point where the variance (sigma) is maximum. To do so, it uses Gaussian Process information. A genetic algorithm gets the global maximum of the function.
Parameters
hypercube (array_like) – Corners of the hypercube.
weights (array_like) – Weights for sigma and distance.
 Returns
The coordinate of the point to add.
 Return type
lst(float)


class batman.space.Sample(type=None, space=None, data=None, plabels=None, flabels=None, psizes=None, fsizes=None, pformat='npy', fformat='npy')[source]¶
Container class for samples.

__init__(type=None, space=None, data=None, plabels=None, flabels=None, psizes=None, fsizes=None, pformat='npy', fformat='npy')[source]¶
Initialize the container and build the column index.
This index carries feature names. Features can be scalars or vectors. Vector features do not need to be of the same size. Samples are stored as a 2D row-major array: 1 sample per row.

append
(other, axis=0)[source]Ā¶ Append samples to the container.
 Parameters
other (arraylike or
pandas.DataFrame
orSample
) ā samples to append (1 sample per row)axis (0 or 1) ā how to append (add new samples or new features).

property
data
Ā¶ Core of the data
numpy.ndarray
.

property
dataframe
Ā¶ Underlying dataframe.

property
flabels
Ā¶ List of data feature labels.

property
fsizes
Ā¶ Sizes of data features.

logger
= <Logger batman.space.sample (WARNING)>Ā¶

property
plabels
Ā¶ List of space feature labels.

property
psizes
Ā¶ Sizes of space features.

read
(space_fname='samplespace.npy', data_fname='sampledata.npy', plabels=None, flabels=None)[source]Ā¶ Read and append samples from files.
Samples are stored in 2 files: space and data.

property
shape
Ā¶ Shape of the internal array.

property
space
Ā¶ Space
numpy.ndarray
(point coordinates).

property
values
Ā¶ Underlying
numpy.ndarray
.Shape is (n_sample, n_columns). There may be multiple columns per feature. See Sample.psizes and Sample.fsizes.


class batman.space.Space(corners, sample=inf, nrefine=0, plabels=None, psizes=None, multifidelity=None, duplicate=False, threshold=0.0, gp_samplers=None)[source]¶
Manages the space of parameters.

__init__(corners, sample=inf, nrefine=0, plabels=None, psizes=None, multifidelity=None, duplicate=False, threshold=0.0, gp_samplers=None)[source]¶
Generate a Space.
Parameters
corners (array_like) – hypercube ([min, n_features], [max, n_features]).
sample (int/array_like) – number of samples or list of samples of shape (n_samples, n_features).
nrefine (int) – number of points to use for refinement.
psizes (list(int)) – number of components of each parameter.
multifidelity (list(float)) – Whether to consider the first parameter as the fidelity level. It is a list of ['cost_ratio', 'grand_cost'].
duplicate (bool) – Whether to allow duplicate points in space.
threshold (float) – minimal distance between 2 distinct points.
gp_samplers (dict) – Gaussian process samplers for functional inputs.

append
(points)[source]Ā¶ Add points to the space.
Ignore any point that already exists or that would exceed space capacity.
 Parameters
points (array_like) ā Point(s) to add to space (n_samples, n_features)
 Returns
Added points.
 Return type
numpy.ndarray

static
discrepancy
(sample, bounds=None, method='CD')[source]Ā¶ Compute the discrepancy.
Centered, wrap-around or mixture discrepancy measures the uniformity of the parameter space. The lower the value, the more uniform the design.
 Parameters
sample (array_like) ā The sample to compute the discrepancy from (n_samples, k_vars).
bounds (array_like) ā Desired range of transformed data. The transformation apply the bounds on the sample and not the theoretical space, unit cube. Thus min and max values of the sample will coincide with the bounds. ([min, k_vars], [max, k_vars]).
method (str) ā Type of discrepancy. [āCDā, āWDā, āMDā].
 Returns
Discrepancy.
 Return type
float.

logger
= <Logger batman.space.space (WARNING)>Ā¶

static
mst
(sample, fname=None, plot=True)[source]Ā¶ Minimum Spanning Tree.
MST is used here as a discrepancy criterion. Comparing two different designs: the higher the mean, the better the design is in terms of space filling.
 Parameters
sample (array_like) ā The sample to compute the discrepancy from (n_samples, k_vars).
fname (str) ā whether to export to filename or display the figures.
 Returns
Mean, standard deviation and edges of the MST.
 Rtypes
float, float, array_like (n_edges, 2 nodes indices).

optimization_results
(extremum)[source]Ā¶ Compute the optimal value.
 Parameters
extremum (str) ā minimization or maximization objective [āminā, āmaxā].

refine
(surrogate, method, point_loo=None, delta_space=0.08, dists=None, hybrid=None, discrete=None, extremum='min', weights=[0.5, 0.5])[source]Ā¶ Refine the sample, update space points and return the new point(s).
Parameters
surrogate (batman.surrogate.SurrogateModel) – Surrogate.
method (str) – Refinement method.
point_loo (array_like) – Leave-one-out worst point (n_features,).
delta_space (float) – Shrinking factor for the parameter space.
dists (lst(str)) – List of valid openturns distributions as string.
hybrid (lst(lst(str, int))) – Navigator as list of [Method, n].
discrete (int) – Index of the discrete variable.
extremum (str) – Minimization or maximization objective ['min', 'max'].
weights (array_like) – Weights used in the composed optimisation function.
 Returns
List of points to add.
 Return type
numpy.ndarray

sampling
(n_samples=None, kind='halton', dists=None, discrete=None)[source]Ā¶ Create point samples in the parameter space.
Minimum number of samples for halton and sobol: 4. For uniform sampling, the number of points is per dimension. The points are registered into the space and replace existing ones.
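For reference, a Halton design is built from one van der Corput radical-inverse sequence per dimension, using one prime base per dimension. A minimal sketch of the construction (an illustration, not batman's implementation):

```python
import numpy as np

def van_der_corput(n, base):
    """First n points of the radical-inverse sequence in the given base."""
    seq = np.zeros(n)
    for i in range(n):
        f, k = 1.0, i + 1
        while k > 0:
            f /= base
            seq[i] += f * (k % base)
            k //= base
    return seq

def halton(n, dim):
    # one van der Corput sequence per dimension, using the first primes
    primes = [2, 3, 5, 7, 11, 13]
    return np.stack([van_der_corput(n, b) for b in primes[:dim]], axis=1)

sample = halton(8, 2)   # 8 points in [0, 1]^2
```

The first coordinate follows the base-2 sequence 1/2, 1/4, 3/4, 1/8, ..., which is what gives the design its low discrepancy.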


batman.space.
dists_to_ot
(dists)[source]Ā¶ Convert distributions to openTURNS.
The list of distribution is converted to openTURNS objects.
Example
>>> from batman.space import dists_to_ot
>>> dists = dists_to_ot(['Uniform(12, 15)', 'Normal(400, 10)'])
 Parameters
 Returns
List of openTURNS distributions.
 Return type
list(
openturns.Distribution
)

batman.space.
kernel_to_ot
(kernel)[source]Ā¶ Convert kernel to openTURNS.
The kernel is converted to openTURNS objects.
Example
>>> from batman.space import kernel_to_ot
>>> kernel = kernel_to_ot("AbsoluteExponential([0.5], 1.0)")
 Parameters
kernel (str) ā Kernel available in openTURNS.
 Returns
openTURNS kernel.
 Return type
list(
openturns.Kernel
)
batman.surrogate: Surrogate Modelling¶
SurrogateModel – Surrogate model.
Kriging – Kriging based on Gaussian Process.
PC – Polynomial Chaos class.
RBFnet – RBF class.
Surrogate model module¶

class batman.surrogate.Evofusion(sample, data)[source]¶
Multifidelity algorithm using Evofusion.

__init__(sample, data)[source]¶
Create the predictor.
Data are arranged by decreasing fidelity. Hence, sample[0] corresponds to the highest fidelity.
Parameters
sample (array_like) – The sample used to generate the data. (fidelity, n_samples, n_features)
data (array_like) – The observed data. (fidelity, n_samples, [n_features])

evaluate
(x_n, *args, **kwargs)Ā¶ Get evaluation from space or point.
If the function is a Kriging instance, get and returns the variance.
 Returns
function evaluation(s) [sigma(s)]
 Return type
np.array([n_eval], n_feature)

logger
= <Logger batman.surrogate.multifidelity (WARNING)>Ā¶


class
batman.surrogate.
Kriging
(sample, data, kernel=None, n_jobs=None, noise=False, global_optimizer=True)[source]Ā¶ Kriging based on Gaussian Process.

__init__
(sample, data, kernel=None, n_jobs=None, noise=False, global_optimizer=True)[source]Ā¶ Create the predictor.
Uses sample and data to construct a predictor using a Gaussian Process. Input is to be normalized beforehand; depending on the number of parameters, the kernel is adapted to be anisotropic.
self.data contains the predictors as a list(array) of the size of the output. A predictor per line of data is created. This leads to a line of predictors that predicts a new column of data.
If noise is a float, it will be used as noise_level by sklearn.gaussian_process.kernels.WhiteKernel. Otherwise, if noise is True, default values are used for the WhiteKernel. If noise is False, no noise is added.
A multiprocessing strategy is used:
Create a process per mode; do not create one if there is only one mode,
Create n_restart (3 by default) processes per process.
In the end, there are \(N = n_{restart} \times n_{modes}\) processes. If there are not enough CPUs, \(N = \frac{n_{cpu}}{n_{restart}}\).
 Parameters
sample (array_like) – Sample used to generate the data (n_samples, n_features).
data (array_like) – Observed data (n_samples, n_features).
kernel (sklearn.gaussian_process.kernels.*) – Kernel from scikit-learn.
noise (float/bool) – Noise used in kriging.
global_optimizer (bool) – Whether to do global optimization or gradient-based optimization to estimate hyperparameters.
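The Gaussian-process machinery behind this class can be sketched in plain numpy: build the kernel matrix on the training sample, add a nugget (the role played by WhiteKernel's noise_level), and solve for the posterior mean and variance. This is a minimal illustrative sketch, not batman's implementation, which delegates to scikit-learn:

```python
import numpy as np

def rbf_kernel(a, b, length=0.3):
    """Squared-exponential (RBF) kernel matrix between two 1D sample sets."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

x_train = np.linspace(0, 1, 8)
y_train = np.sin(2 * np.pi * x_train)

nugget = 1e-10  # plays the role of WhiteKernel's noise_level
K = rbf_kernel(x_train, x_train) + nugget * np.eye(x_train.size)

x_new = np.array([0.25, 0.5])
k_star = rbf_kernel(x_new, x_train)

# Gaussian-process posterior mean and variance at the new points
mean = k_star @ np.linalg.solve(K, y_train)
var = 1.0 - np.sum(k_star * np.linalg.solve(K, k_star.T).T, axis=1)
```

The returned variance is what evaluate() exposes as sigma for Kriging surrogates.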

evaluate
(x_n, *args, **kwargs)Ā¶ Get evaluation from space or point.
If the function is a Kriging instance, also get and return the variance.
 Returns
function evaluation(s) [sigma(s)]
 Return type
np.array([n_eval], n_feature)

logger
= <Logger batman.surrogate.kriging (WARNING)>Ā¶


class
batman.surrogate.
Mixture
(samples, data, corners, fsizes=None, pod=None, standard=True, local_method=None, pca_percentage=0.8, clusterer='cluster.KMeans(n_clusters=2)', classifier='gaussian_process.GaussianProcessClassifier()')[source]Ā¶ Mixture class.
Unsupervised machine learning separates the DoE into clusters, supervised machine learning classifies new samples into a cluster, and local models predict the response for each new sample.

__init__
(samples, data, corners, fsizes=None, pod=None, standard=True, local_method=None, pca_percentage=0.8, clusterer='cluster.KMeans(n_clusters=2)', classifier='gaussian_process.GaussianProcessClassifier()')[source]Ā¶ Cluster data and fit local models.
If data is not scalar, compute PCA on data.
Cluster data.
Each sample is affiliated to a cluster.
Fit a classifier to handle new samples.
A local model is built for each cluster.
If local_method is not None, set as a list of dict with options, e.g. [{'kriging': {**args}}].
 Parameters
sample (array_like) – Sample of parameters of shape (n_samples, n_params).
data (array_like) – Sample of realizations which corresponds to the sample of parameters sample (n_samples, n_features).
corners (array_like) – Hypercube ([min, n_features], [max, n_features]).
fsizes (int) – Number of components of output features.
pod (dict) – Whether to compute POD or not in local models.
tolerance (float) – Basis modes filtering criterion.
dim_max (int) – Number of basis modes to keep.
standard (bool) – Whether to standardize data before clustering.
local_method (lst(dict)) – List of local surrogate models for clusters, or None for Kriging local surrogate models.
pca_percentage (float) – Percentage of information kept for PCA.
clusterer (str) – Clusterer from sklearn (unsupervised machine learning). http://scikit-learn.org/stable/modules/clustering.html#clustering
classifier (str) – Classifier from sklearn (supervised machine learning). http://scikit-learn.org/stable/supervised_learning.html
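The cluster / classify / local-model pipeline can be sketched with numpy alone. Everything here is an invented stand-in (a threshold instead of KMeans, nearest-centroid instead of a Gaussian-process classifier, linear fits instead of Kriging), chosen only to make the three steps visible:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 200)
# Two well-separated regimes in the response (hypothetical bifurcating QoI)
y = np.where(x < 0.5, x, 10.0 + x)

# 1. "Cluster" the data (one-dimensional stand-in for KMeans on the response)
labels = (y > 5.0).astype(int)

# 2. Fit one local model per cluster (linear fits stand in for Kriging)
models = [np.polyfit(x[labels == k], y[labels == k], deg=1) for k in (0, 1)]

# 3. "Classify" new samples by the nearest cluster centroid in parameter space
centroids = np.array([x[labels == k].mean() for k in (0, 1)])

def predict(x_new):
    k = int(np.argmin(np.abs(x_new - centroids)))
    return np.polyval(models[k], x_new)

prediction = predict(0.8)
```

A single global surrogate would smooth over the discontinuity at x = 0.5; the local models do not.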

boundaries
(samples, plabels=None, fname=None)[source]Ā¶ Boundaries of clusters in the parameter space.
Plot the boundaries for 2D and 3D hypercube or parallel coordinates plot for more than 3D to see the influence of sample variables on cluster affiliation.
 Parameters
 Returns
figure.
 Return type
Matplotlib figure instances, Matplotlib AxesSubplot instances.

estimate_quality
(multi_q2=False)[source]Ā¶ Estimate quality of the local models.
Compute the Q2 for each cluster and return either the Q2 for each cluster or the lowest one with its cluster affiliation.
 Parameters
multi_q2 (float/bool) ā Whether to return the minimal q2 or the q2 of each cluster.
 Returns
q2: Q2 quality for each cluster or the minimal value
 Return type
array_like(n_cluster)/float.
 Returns
point: Max MSE point for each cluster or the one corresponding to minimal Q2 value.
 Return type
array_like(n_cluster)/float.

evaluate
(points, classification=False)[source]Ā¶ Predict new samples.
Classify new samples then predict using the corresponding local model.
 Parameters
points (array_like) ā Samples to predict (n_samples, n_features).
classification (bool) ā Whether to output classification info.
 Returns
predict, sigma: Prediction and sigma of new samples.
 Return type
array_like (n_samples, n_features), array_like (n_samples, n_features)

logger
= <Logger batman.surrogate.mixture (WARNING)>Ā¶


class
batman.surrogate.
PC
(strategy, degree, distributions, N_quad=None, sample=None, stieltjes=True, sparse_param={})[source]Ā¶ Polynomial Chaos class.

__init__
(strategy, degree, distributions, N_quad=None, sample=None, stieltjes=True, sparse_param={})[source]¶ Generate truncation and projection strategies.
Along with the strategies, the sample is stored as the attribute sample, as well as the corresponding weights: weights.
 Parameters
strategy (str) – Least square or Quadrature ['LS', 'Quad', 'SparseLS'].
degree (int) – Polynomial degree.
distributions (lst(openturns.Distribution)) – Distributions of each input parameter.
sample (array_like) – Samples for least square (n_samples, n_features).
stieltjes (bool) – Whether to use the Stieltjes algorithm for the basis.
sparse_param (dict) –
Parameters for the Sparse Cleaning Truncation Strategy and/or hyperbolic truncation of the initial basis.
max_considered_terms (int) – Maximum Considered Terms,
most_significant (int) – Most Significant number to retain,
significance_factor (float) – Significance Factor,
hyper_factor (float) – factor for the hyperbolic truncation strategy.
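The least-squares ('LS') strategy amounts to projecting the model response onto an orthogonal polynomial basis. A minimal numpy sketch with a Legendre basis (the natural basis for uniform inputs; batman delegates the real work to OpenTURNS):

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 50)       # sample on the support of a uniform input
y = 3.0 + 2.0 * x + x ** 2       # model response to project

degree = 3
V = legendre.legvander(x, degree)              # design matrix of Legendre polynomials
coeffs, *_ = np.linalg.lstsq(V, y, rcond=None)

# Evaluate the polynomial surrogate at a new point
y_hat = legendre.legval(0.5, coeffs)
```

With an orthonormal basis, the squared coefficients directly yield the variance decomposition, which is how Sobol' indices (s_first, s_total below) come for free with polynomial chaos.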

evaluate
(x_n, *args, **kwargs)Ā¶ Get evaluation from space or point.
If the function is a Kriging instance, also get and return the variance.
 Returns
function evaluation(s) [sigma(s)]
 Return type
np.array([n_eval], n_feature)

fit
(sample, data)[source]¶ Create the predictor.
The result of the Polynomial Chaos is stored as pc_result and the surrogate is stored as pc. It exposes self.weights, self.coefficients and the Sobol' indices self.s_first and self.s_total.
 Parameters
sample (array_like) – The sample used to generate the data (n_samples, n_features).
data (array_like) – The observed data (n_samples, [n_features]).

logger
= <Logger batman.surrogate.polynomial_chaos (WARNING)>Ā¶


class
batman.surrogate.
RBFnet
(trainIn, trainOut, regparam=0.0, radius=1.5, regtree=0, function='default', Pmin=2, Radscale=1.0)[source]Ā¶ RBF class.

__init__
(trainIn, trainOut, regparam=0.0, radius=1.5, regtree=0, function='default', Pmin=2, Radscale=1.0)[source]Ā¶ Initialization.
Initializes the main network from an array of training points of size Setsize*(Ninputs) for the inputs and Setsize*(Noutputs) for the outputs in trainOut. A regression tree can be used on the data (regtree=1). The network is then trained on this set with the parameter Regparam.
trainIn is copied during initialization so that the original is not affected in the calling program.
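An RBF network of this kind interpolates the training data by solving a linear system for the weights of Gaussian basis functions centered on the training points. A minimal 1D sketch, assuming invented helper names (batman additionally supports a regression tree, a mean plane and per-cell radii):

```python
import numpy as np

def rbf_fit(centers, values, radius=1.0):
    """Solve the interpolation system for Gaussian RBF weights."""
    phi = np.exp(-((centers[:, None] - centers[None, :]) / radius) ** 2)
    return np.linalg.solve(phi, values)

def rbf_predict(x, centers, weights, radius=1.0):
    """Evaluate the RBF network at new points."""
    phi = np.exp(-((np.asarray(x)[:, None] - centers[None, :]) / radius) ** 2)
    return phi @ weights

centers = np.linspace(0, 5, 10)   # training inputs act as the RBF centers
values = np.cos(centers)
weights = rbf_fit(centers, values)
recovered = rbf_predict(centers, centers, weights)
```

With regparam = 0 the network interpolates exactly; a positive regularization parameter would trade exactness for smoothness on noisy data.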

coefs_mean
()[source]Ā¶ Mean coefficients.
Compute, by multiple linear regression, the mean plane from which the Gaussian functions start.

compute_radius
(cel)[source]Ā¶ Radius.
Compute the radius for cell i when the regression tree is not used. This radius is computed as half the distance to the nearest cell.

evaluate
(x_n, *args, **kwargs)Ā¶ Get evaluation from space or point.
If the function is a Kriging instance, also get and return the variance.
 Returns
function evaluation(s) [sigma(s)]
 Return type
np.array([n_eval], n_feature)


class
batman.surrogate.
SklearnRegressor
(sample, data, regressor)[source]Ā¶ Interface to Scikitlearn regressors.

__init__
(sample, data, regressor)[source]Ā¶ Create the predictor.
Uses sample and data to construct a predictor using scikit-learn. Input is to be normalized beforehand.
 Parameters
sample (array_like) – Sample used to generate the data (n_samples, n_features).
data (array_like) – Observed data (n_samples, n_features).
regressor (regressor object or str(sklearn.ensemble.Regressor)) – Scikit-learn regressor.
regressor_options – Associated regressor hyperparameters.

evaluate
(x_n, *args, **kwargs)Ā¶ Get evaluation from space or point.
If the function is a Kriging instance, also get and return the variance.
 Returns
function evaluation(s) [sigma(s)]
 Return type
np.array([n_eval], n_feature)

logger
= <Logger batman.surrogate.sk_interface (WARNING)>Ā¶


class
batman.surrogate.
SurrogateModel
(kind, corners, plabels, **kwargs)[source]Ā¶ Surrogate model.

__call__
(points)[source]Ā¶ Predict snapshots.
 Parameters
points (batman.space.Point or array_like (n_samples, n_features)) – point(s) to predict.
path (str) – if not set, will return a list of predicted snapshot instances; otherwise write them to disk.
 Returns
Result.
 Return type
array_like (n_samples, n_features)
 Returns
Standard deviation.
 Return type
array_like (n_samples, n_features)

__init__
(kind, corners, plabels, **kwargs)[source]Ā¶ Init Surrogate model.
 Parameters
kind (str) – name of prediction method, one of: ['rbf', 'kriging', 'pc', 'evofusion', 'mixture', 'LinearRegression', 'LogisticRegression', 'LogisticRegressionCV', 'PassiveAggressiveRegressor', 'SGDRegressor', 'TheilSenRegressor', 'DecisionTreeRegressor', 'GradientBoostingRegressor', 'AdaBoostRegressor', 'RandomForestRegressor', 'ExtraTreesRegressor'].
corners (array_like) ā hypercube ([min, n_features], [max, n_features]).
**kwargs ā See below
 Keyword Arguments
For Polynomial Chaos the following keywords are available:
strategy (str) – Least square or Quadrature ['LS', 'Quad', 'SparseLS'].
degree (int) – Polynomial degree.
distributions (lst(openturns.Distribution)) – Distributions of each input parameter.
sample (array_like) – Samples for least square (n_samples, n_features).
sparse_param (dict) – Parameters for the Sparse Cleaning Truncation Strategy and/or hyperbolic truncation of the initial basis.
max_considered_terms (int) – Maximum Considered Terms,
most_significant (int) – Most Significant number to retain,
significance_factor (float) – Significance Factor,
hyper_factor (float) – factor for the hyperbolic truncation strategy.
For Kriging the following keywords are available:
kernel (sklearn.gaussian_process.kernels.*) – Kernel.
noise (float/bool) – noise level.
global_optimizer (bool) – Whether to do global optimization or gradient-based optimization to estimate hyperparameters.
For Mixture the following keywords are available:
fsizes (int) – Number of components of output features.
pod (dict) – Whether to compute POD or not in local models.
tolerance (float) – Basis modes filtering criterion.
dim_max (int) – Number of basis modes to keep.
standard (bool) – Whether to standardize data before clustering.
local_method (lst(dict)) – List of local surrogate models for clusters, or None for Kriging local surrogate models.
pca_percentage (float) – Percentage of information kept for PCA.
clusterer (str) – Clusterer from sklearn (unsupervised machine learning). http://scikit-learn.org/stable/modules/clustering.html#clustering
classifier (str) – Classifier from sklearn (supervised machine learning). http://scikit-learn.org/stable/supervised_learning.html
For all the other regressors from scikit-learn the following keyword is available:
regressor_options (str) – Parameters of the associated scikit-learn regressor.

fit
(sample, data, pod=None)[source]Ā¶ Construct the surrogate.
 Parameters
sample (array_like) ā sample (n_samples, n_features).
data (array_like) ā function evaluations (n_samples, n_features).
pod (
batman.pod.Pod
.) ā POD instance.

logger
= <Logger batman.surrogate.surrogate_model (WARNING)>Ā¶

batman.uq
: Uncertainty QuantificationĀ¶

Uncertainty Quantification class. 
UQ moduleĀ¶

class
batman.uq.
UQ
(surrogate, dists=None, nsample=1000, method='sobol', indices='aggregated', space=None, data=None, plabels=None, xlabel=None, flabel=None, xdata=None, fname=None, test=None, mesh={})[source]Ā¶ Uncertainty Quantification class.

__init__
(surrogate, dists=None, nsample=1000, method='sobol', indices='aggregated', space=None, data=None, plabels=None, xlabel=None, flabel=None, xdata=None, fname=None, test=None, mesh={})[source]Ā¶ Init the UQ class.
From the settings file, it gets:
Method to use for the Sensitivity Analysis (SA),
Type of Sobol' indices to compute,
Number of points per sample to use for SA (\(N(2p+2)\) predictions); the resulting storage is \(6N(out+p) \times 8\) bytes, i.e. 184 MB if N=1e4,
Method to use to predict a new snapshot,
The list of input variables,
The length of the output function.
Also, it creates the model and int_model as openturns.PythonFunction.
 Parameters
surrogate (class:batman.surrogate.SurrogateModel.) ā Surrogate model.
space (class:batman.space.Space.) ā sample space (can be a list).
data (array_like) – Snapshot's data (n_samples, n_features).
xlabel (str) ā label of the discretization parameter.
flabel (str) ā name of the quantity of interest.
xdata (array_like) ā 1D discretization of the function (n_features,).
fname (str) ā folder output path.
test (str) ā Test function from class:batman.functions.
mesh (dict) ā
For 2D plots the following keywords are available
fname (str) ā name of mesh file.
fformat (str) ā format of the mesh file.
xlabel (str) ā name of the xaxis.
ylabel (str) ā name of the yaxis.
vmins (lst(double)) ā value of the minimal output for data filtering.

error_model
(indices, function)[source]Ā¶ Compute the error between the POD and the analytic function.
Warning
For test purposes only. Choices are the Ishigami, Rosenbrock, Michalewicz, G_Function and Channel_Flow test functions.
From the surrogate of the function, evaluate the error using the analytical evaluation of the function on the sample points.
\[Q^2 = 1 - \frac{err_{l2}}{var_{model}}\]Knowing that \(err_{l2} = \sum \frac{(prediction - reference)^2}{n}\), \(var_{model} = \sum \frac{(prediction - mean)^2}{n}\)
Also, it computes the mean square error on the Sobol' first and total order indices.
A summary is written within model_err.dat.
 Parameters
indices (array_like) ā Sobol first order indices computed using the POD.
function (str) ā name of the analytic function.
 Returns
err_q2, mse, s_l2_2nd, s_l2_1st, s_l2_total.
 Return type
array_like.
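The Q2 formula above translates directly into numpy (illustrative helper, not batman's API; a value of 1 means a perfect surrogate):

```python
import numpy as np

def q2(prediction, reference):
    """Predictive coefficient: 1 - normalized L2 error (1 means a perfect surrogate)."""
    err_l2 = np.mean((prediction - reference) ** 2)
    var_model = np.mean((prediction - prediction.mean()) ** 2)
    return 1.0 - err_l2 / var_model

reference = np.array([1.0, 2.0, 3.0, 4.0])
perfect = q2(reference, reference)        # exactly 1.0
degraded = q2(reference + 0.1, reference)
```

Unlike a plain mean squared error, Q2 is normalized by the model variance, so it is comparable across quantities of interest with different scales.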

error_propagation
()[source]Ā¶ Compute the moments.
1st and 2nd order moments are computed for every output of the function. Also compute the PDF for these outputs, and compute correlations (YY and XY) and covariance (YY). Both are exported as 2D cartesian plots. Files are respectively:
pdfmoment.dat – moments [discretized on curvilinear abscissa],
pdf.dat – the PDFs [discretized on curvilinear abscissa],
correlation_covariance.dat – correlation and covariance YY,
correlation_XY.dat – correlation XY,
pdf.pdf – plot of the PDF (with moments if dim > 1).

func
(x_n, *args, **kwargs)Ā¶ Get evaluation from space or point.
If the function is a Kriging instance, also get and return the variance.
 Returns
function evaluation(s) [sigma(s)]
 Return type
np.array([n_eval], n_feature)

int_func
(x_n, *args, **kwargs)Ā¶ Get evaluation from space or point.
If the function is a Kriging instance, also get and return the variance.
 Returns
function evaluation(s) [sigma(s)]
 Return type
np.array([n_eval], n_feature)

logger
= <Logger batman.uq.uq (WARNING)>Ā¶

sobol
()[source]¶ Compute Sobol' indices.
It returns the second, first and total order Sobol' indices. Two methods are possible for the indices:
sobol
FAST
Warning
The second order indices are only available with the sobol method. Also, when there is no surrogate (ensemble mode), FAST is not available and the DoE must have been generated with saltelli.
And two types of computation are available for the global indices:
block
aggregated
If aggregated, map indices are computed. In the case of a scalar value, all types return the same values. block indices are written to sensitivity.dat and aggregated indices to sensitivity_aggregated.dat.
Finally, it calls error_pod() in order to compare the indices with their analytical values.
 Returns
Sobol' indices.
 Return type
array_like.
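The sobol-method estimator behind this can be sketched with the classical pick-freeze scheme: evaluate the model on two independent sample matrices, then on hybrid matrices sharing one column. An illustrative numpy sketch on an invented additive toy model (batman uses OpenTURNS estimators):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

def model(x):
    """Additive toy model: analytical indices are S1 = 0.8, S2 = 0.2."""
    return 2.0 * x[:, 0] + x[:, 1]

a = rng.uniform(0, 1, (n, 2))
b = rng.uniform(0, 1, (n, 2))
ya, yb = model(a), model(b)
variance = np.var(np.concatenate([ya, yb]))

s_first = []
for i in range(2):
    b_ai = b.copy()
    b_ai[:, i] = a[:, i]               # "freeze" column i from the first matrix
    s_first.append(np.mean(ya * (model(b_ai) - yb)) / variance)
```

This is why the DoE must be a Saltelli design when no surrogate is available: the hybrid matrices must exist among the actual model evaluations.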


batman.uq.
cosi
(sample, data)[source]Ā¶ Cosine transformation sensitivity.
Use the Discrete Cosine Transform (DCT) to compute sensitivity indices.
 Parameters
sample (array_like) – Sample of parameters of shape (n_samples, n_params).
data (array_like) – Sample of realizations which corresponds to the sample of parameters sample (n_samples,).
 Returns
First order sensitivity indices.
 Return type
(Sobol, n_features)
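The CoSi idea is that, after reordering the output by one parameter, the low-frequency cosine coefficients capture the variance explained by that parameter. A hand-rolled numpy sketch with explicit cosine sums and an invented toy model (a real implementation would use an FFT-based DCT):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2048
sample = rng.uniform(0, 1, (n, 2))
data = 4.0 * sample[:, 0] + 0.5 * sample[:, 1]   # x0 dominates the variance

def cosi_index(xi, y, n_coeff=10):
    """First-order index from cosine coefficients of the reordered output."""
    y_ord = y[np.argsort(xi)]                # order the output by the parameter
    t = (np.arange(len(y)) + 0.5) / len(y)   # pseudo-quantiles of the parameter
    coeffs = np.array([np.mean(y_ord * np.cos(np.pi * k * t))
                       for k in range(1, n_coeff + 1)])
    return 2.0 * np.sum(coeffs ** 2) / np.var(y)

s_first = [cosi_index(sample[:, i], data) for i in range(2)]
```

Only first-order indices come out of this construction, which matches the return value documented above.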
batman.visualization
: Uncertainty VisualizationĀ¶

3D version of the Kiviat plot. 

High Density Region boxplot. 

Plot the space of parameters 2dby2d. 

Response surface visualization in 2d (image), 3d (movie) or 4d (movies). 

Correlation and covariance matrices. 

Plot PDF in 1D or 2D. 

Create gaussian kernel. 

Create a dummy figure and use its manager to display 
Visualization moduleĀ¶

class
batman.visualization.
HdrBoxplot
(data, variance=0.8, alpha=None, threshold=0.95, outliers_method='kde', optimize=False)[source]Ā¶ High Density Region boxplot.
From a given dataset, it computes the HDR boxplot. Results are accessible directly through class attributes:
median – median curve,
outliers – outliers regarding a given threshold,
hdr_90 – 90% quantile band,
extra_quantiles – other quantile bands,
hdr_50 – 50% quantile band.
The following methods are for convenience:
 Example
>>> hdr = HdrBoxplot(data)
>>> hdr.plot()
>>> hdr.f_hops(generate=10)
>>> hdr.sound()

__init__
(data, variance=0.8, alpha=None, threshold=0.95, outliers_method='kde', optimize=False)[source]¶ Compute the HDR Boxplot on data.
Compute a 2D kernel smoothing with a Gaussian kernel,
Compute contour lines for quantiles 90, 50 and alpha,
Compute the median curve along with quantile regions and outlier curves.
 Parameters
data (array_like) ā dataset (n_samples, n_features).
variance (float) ā percentage of total variance to conserve.
alpha (array_like) ā extra quantile values (n_alpha).
threshold (float) ā threshold for outliers.
outliers_method (str) – detection method ['kde', 'forest'].
optimize (bool) ā bandwidth global optimization or grid search.
n_contours (int) ā discretization to compute contour.

band_quantiles
(band)[source]Ā¶ Find extreme curves for a quantile band.
From the band of quantiles, the associated PDF extrema values are computed. If min_alpha is not provided (single quantile value), max_pdf is set to 1E6 in order not to constrain the problem on high values.
An optimization is performed per component in order to find the min and max curves. This is done by comparing the PDF value of a given curve with the band PDF.
 Parameters
band (array_like) ā alpha values [max_alpha, min_alpha] ex: [0.9, 0.5].
 Returns
[max_quantile, min_quantile] (2, n_features).
 Return type
list(array_like)

f_hops
(frame_rate=400, fname='fHOPs.mp4', samples=None, x_common=None, labels=None, xlabel='t', flabel='F', offset=0.05)[source]Ā¶ Functional Hypothetical Outcome Plots.
Each frame consists in a HDR boxplot and an additional outcome. If it is an outlier, it is rendered as red dashed line.
If samples is None it will use the dataset; if an int > 0 it will sample n new samples; and if array_like of shape (n_samples, n_features) it will use this data.
 Parameters
frame_rate (int) ā time between two outcomes (in milliseconds).
fname (str) ā export movie to filename.
samples (False, int, list) – Data selector.
x_common (array_like) ā abscissa.
xlabel (str) ā label for x axis.
flabel (str) ā label for y axis.
offset (float) ā Margin around the extreme values of the plot.

find_outliers
(data, data_r=None, method='kde', threshold=0.95)[source]Ā¶ Detect outliers.
The Isolation Forest method requires additional computations to find the centroid. This operation is only performed once and stored in self.detector. Thus, calling the method several times will not cause any overhead.
 Parameters
 Returns
Outliers.
 Return type
array_like (n_outliers, n_features)

logger
= <Logger batman.visualization.hdr (WARNING)>Ā¶

plot
(samples=None, fname=None, x_common=None, labels=None, xlabel='t', flabel='F', dataset=True)[source]Ā¶ Functional plot and nvariate space.
If self.n_components is 2, an additional contour plot is done. If samples is None, the dataset is used for all plots; otherwise the given sample is used.
 Parameters
 Returns
figures and all axis.
 Return type
Matplotlib figure instances, Matplotlib AxesSubplot instances.

sample
(samples)[source]Ā¶ Sample new curves from KDE.
If samples is an int > 0, n new curves are randomly sampled taking into account the joint PDF; and if array_like of shape (n_samples, n_components), curves are sampled from reduced coordinates of the n-variate space.
 Parameters
samples (int, array_like) – Data selector.
 Returns
new curves.
 Return type
array_like (n_samples, n_features)

sound
(frame_rate=400, tone_range=None, amplitude=1000.0, distance=True, samples=False, fname='songfHOPs.wav')[source]Ā¶ Make sound from curves.
Each curve is converted into a sum of tones. This sum is played for a given time before another series starts.
If samples is False it will use the dataset; if an int > 0 it will sample n new samples; and if array_like of shape (n_samples, n_features) it will use this data.
 Parameters
frame_rate (int) ā time between two outcomes (in milliseconds).
tone_range (list(int)) ā range of frequencies of a tone (in hertz).
amplitude (float) ā amplitude of the signal.
distance (bool) ā use distance from median for tone generation.
samples (False, int, list) – Data selector.
fname (str) ā export sound to filename.

class
batman.visualization.
Kiviat3D
(sample, data, idx=None, bounds=None, plabels=None, range_cbar=None, stack_order='qoi', cbar_order='qoi')[source]Ā¶ 3D version of the Kiviat plot.
Each realization is stacked on top of each other. The axis represent the parameters used to perform the realization.

__init__
(sample, data, idx=None, bounds=None, plabels=None, range_cbar=None, stack_order='qoi', cbar_order='qoi')[source]Ā¶ Prepare params for Kiviat plot.
 Parameters
sample (array_like) – Sample of parameters of shape (n_samples, n_params).
data (array_like) – Sample of realizations which corresponds to the sample of parameters sample (n_samples, n_features).
idx (int) – Index on the functional data to consider.
bounds (array_like) ā Boundaries to scale the colors shape ([min, n_features], [max, n_features]).
plabels (list(str)) ā Names of each parameters (n_features).
range_cbar (array_like) ā Minimum and maximum values for output function (2 values).
stack_order (str/int) – Set stacking order ['qoi', 'hdr']. If an integer, it represents the input variable to take into account.
cbar_order (str) – Set color mapping order ['qoi', 'hdr'].

f_hops
(frame_rate=400, fname='kiviatHOPs.mp4', flabel='F', ticks_nbr=10, fill=True)[source]Ā¶ Plot HOPs 3D kiviat.
Each frame consists in a 3D Kiviat with an additional outcome highlighted.

static
mesh_connectivity
(n_points, n_params)[source]Ā¶ Compute connectivity for Kiviat.
Using n_points and n_params, it creates the connectivity required by VTK's pixel elements. With two stacked planes of three points (0, 1, 2 on the lower plane; 3, 4, 5 on the upper plane), this will output:
4 0 1 3 4
4 1 2 4 5

static
mesh_vtk_ascii
(coords, data, connectivity, fname='mesh_kiviat.vtk')[source]Ā¶ Write mesh file in VTK ascii format.
Format is as following (example with 3 cells):
# vtk DataFile Version 2.0
Kiviat 3D
ASCII
DATASET UNSTRUCTURED_GRID
POINTS 6 float
0.40 0.73 0.00
0.00 0.03 0.00
0.50 0.00 0.00
0.40 0.85 0.04
0.00 0.12 0.04
0.50 0.00 0.04
CELLS 3 15
4 0 1 3 4
4 1 2 4 5
4 2 0 5 3
CELL_TYPES 3
8 8 8
POINT_DATA 6
SCALARS value double
LOOKUP_TABLE default
17.770e+0
17.770e+0
17.770e+0
17.774e+0
17.774e+0
17.774e+0
 Parameters
coordinates (array_like) ā Sample coordinates of shape (n_samples, n_features).
data (array_like) ā function evaluations of shape (n_samples, n_features).


class
batman.visualization.
Tree
(sample, data, bounds=None, plabels=None, range_cbar=None)[source]Ā¶ Tree.
Extend the principle of batman.visualization.Kiviat3D but for a 2D parameter space. Samples are represented by segments and an azimuthal component encodes the value from batman.visualization.HdrBoxplot.
Subclass batman.visualization.Kiviat3D by overwriting batman.visualization.Kiviat3D._axis() and batman.visualization.Kiviat3D.plane().
__init__
(sample, data, bounds=None, plabels=None, range_cbar=None)[source]Ā¶ Prepare params for Tree plot.
 Parameters
sample (array_like) – Sample of parameters of shape (n_samples, n_params).
data (array_like) – Sample of realizations which corresponds to the sample of parameters sample (n_samples, n_features).
bounds (array_like) – Boundaries to scale the colors, shape ([min, n_features], [max, n_features]).
plabels (list(str)) ā Names of each parameters (n_features).
range_cbar (array_like) ā Minimum and maximum values for output function (2 values).


batman.visualization.
corr_cov
(data, sample, xdata, xlabel='x', plabels=None, interpolation=None, fname=None)[source]Ā¶ Correlation and covariance matrices.
Compute the covariance regarding YY and XY as well as the correlation regarding YY.
 Parameters
data (array_like) ā function evaluations (n_samples, n_features).
sample (array_like) – sample (n_samples, n_features).
xdata (array_like) ā 1D discretization of the function (n_features,).
xlabel (str) ā label of the discretization parameter.
interpolation (str) – If None, does not interpolate correlation and covariance matrices (YY). Otherwise, use Matplotlib methods from imshow such as ['bilinear', 'lanczos', 'spline16', 'hermite', ...].
fname (str) ā whether to export to filename or display the figures.
 Returns
figure.
 Return type
Matplotlib figure instances, Matplotlib AxesSubplot instances.

batman.visualization.
cusunoro
(sample, data, plabels=None, fname=None)[source]Ā¶ Cumulative sums of normalised reordered output.
Data are normalized (mean=0, variance=1),
Choose a feature and order its values,
Order normalized data accordingly,
Compute the cumulative sum vector.
Plot and repeat for all features.
 Parameters
sample (array_like) ā Sample of parameters of Shape (n_samples, n_params).
data (array_like) – Sample of realizations which corresponds to the sample of parameters sample (n_samples,).
plabels (list(str)) – Names of each parameter (n_features).
fname (str) ā whether to export to filename or display the figures.
 Returns
figure, axis and sensitivity indices.
 Return type
Matplotlib figure instance, Matplotlib AxesSubplot instance, array_like.
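The steps above fit in a few lines of numpy. This is an illustrative sketch only: the toy model and the depth heuristic at the end are invented for the example, while batman derives proper sensitivity estimates from these curves:

```python
import numpy as np

rng = np.random.default_rng(7)
sample = rng.uniform(0, 1, (1000, 2))
data = 3.0 * sample[:, 0] + 0.1 * sample[:, 1]   # x0 is influential, x1 barely

y = (data - data.mean()) / data.std()            # normalize the output
curves = []
for i in range(sample.shape[1]):
    order = np.argsort(sample[:, i])             # order one feature's values
    curves.append(np.cumsum(y[order]) / len(y))  # cumulative sum of reordered output

# An influential parameter produces a deep curve; a negligible one stays flat
depths = [float(np.abs(curve).max()) for curve in curves]
```

Reading the plot is immediate: the deeper the curve dips, the more of the output variance the corresponding parameter explains.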

batman.visualization.
doe
(sample, plabels=None, resampling=0, multifidelity=False, fname=None)[source]Ā¶ Plot the space of parameters 2dby2d.
An n-variate plot is constructed with all pairs of variables. The distribution of each variable is shown on the diagonal.
 Parameters
 Returns
figure.
 Return type
Matplotlib figure instances, Matplotlib AxesSubplot instances.

batman.visualization.
doe_ascii
(sample, bounds=None, plabels=None, fname=None)[source]Ā¶ Plot the space of parameters 2dby2d in ASCII.
 Parameters
sample (array_like) – sample (n_samples, n_features).
bounds (array_like) ā Desired range of transformed data. The transformation apply the bounds on the sample and not the theoretical space, unit cube. Thus min and max values of the sample will coincide with the bounds. ([min, k_vars], [max, k_vars]).
fname (str) ā whether to export to filename or display on console.

batman.visualization.
kernel_smoothing
(data, optimize=False)[source]Ā¶ Create gaussian kernel.
The optimization option could lead to longer computation of the PDF.
 Parameters
data (array_like) ā output sample to draw a PDF from (n_samples, n_features).
optimize (bool) – use global optimization or grid search.
 Returns
gaussian kernel.
 Return type
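The underlying estimator is an ordinary Gaussian kernel density estimate, which can be sketched in numpy (illustrative helper with Scott's-rule bandwidth; batman relies on scikit-learn's KDE with optional bandwidth optimization):

```python
import numpy as np

def gaussian_kde(data, bandwidth=None):
    """Return a callable estimating the 1D PDF (Scott's-rule bandwidth by default)."""
    n = len(data)
    h = bandwidth if bandwidth is not None else n ** (-0.2) * data.std()
    def pdf(x):
        u = (np.atleast_1d(x)[:, None] - data) / h
        return np.exp(-0.5 * u ** 2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))
    return pdf

rng = np.random.default_rng(5)
sample = rng.normal(0, 1, 5000)
density_at_mode = float(gaussian_kde(sample)(np.array([0.0]))[0])
```

The "optimize" option amounts to choosing the bandwidth h by global optimization instead of a fixed rule, which is why it can lengthen the PDF computation.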

batman.visualization.
mesh_2D
(fname, var=None, flabels=None, fformat='csv', xlabel='X axis', ylabel='Y axis', vmins=None, output_path=None)[source]Ā¶ Visualization of specific variable on a user provided 2D mesh.
The provided mesh should contain two columns (x, y coordinates for each mesh point) and be one of batman.input_output.available_formats(). (x, y) must be respectively the first and second column. Any other column is treated as an extra variable and will be used to plot a figure. If var is not None, its content will be used as plotting variables.
 Parameters
fname (str) ā name of mesh file.
var (array_like) ā data to be plotted shape (n_coords, n_vars).
fformat (str) ā format of the mesh file.
xlabel (str) ā name of the xaxis.
ylabel (str) ā name of the yaxis.
vmins (lst(double)) ā value of the minimal output for data filtering.
output_path (str) ā name of the output path.
 Returns
figure.
 Return type
Matplotlib figure instances.

batman.visualization.
moment_independent
(sample, data, plabels=None, scale_plt=True, fname=None)[source]Ā¶ Moment independent measures.
Use both PDF and ECDF to compute moment independent measures. The following algorithm describes the PDF method (the ECDF method works the same):
Compute the unconditional PDF,
Choose a feature and order its values and order the data accordingly,
Create bins based on the feature ranges,
Compute the PDF of the ordered data on all successive bins,
Plot and repeat for all features.
 Parameters
sample (array_like) – Sample of parameters of shape (n_samples, n_params).
data (array_like) – Sample of realizations which corresponds to the sample of parameters sample (n_samples,).
plabels (list(str)) – Names of each parameter (n_features).
scale_plt (bool) ā Whether to scale yaxes between figures.
fname (str) ā Whether to export to filename or display the figures.
 Returns
Figure, axis and sensitivity indices.
 Return type
Matplotlib figure instance, Matplotlib AxesSubplot instances, dict([āKolmogorovā, āKuiperā, āDeltaā, āSobolā], n_features)

batman.visualization.
pairplot
(sample, data, plabels=None, flabel=None, fname=None)[source]Ā¶ Output function of the input parameter space.
An n-variate plot is constructed with all pairs of variables against the output.
 Parameters
 Returns
figure.
 Return type
Matplotlib figure instance, Matplotlib AxesSubplot instances.

batman.visualization.
pdf
(data, xdata=None, xlabel=None, flabel=None, moments=False, dotplot=False, ticks_nbr=10, range_cbar=None, fname=None)[source]Ā¶ Plot PDF in 1D or 2D.
 Parameters
data (nd_array/dict) ā
array of shape (n_samples, n_features) or a dictionary with the following:
bounds (array_like) ā first line is mins and second line is maxs (2, n_features).
model (batman.surrogate.SurrogateModel /str) – path to the surrogate data.
method (str) – surrogate model method.
dist (openturns.ComposedDistribution) – joint distribution.
xdata (array_like) ā 1D discretization of the function (n_features,).
xlabel (str) ā label of the discretization parameter.
flabel (str) ā name of the quantity of interest.
moments (bool) ā whether to plot moments along with PDF if dim > 1.
dotplot (bool) ā whether to plot quantile dotplot or histogram.
ticks_nbr (int) ā number of color isolines for response surfaces.
range_cbar (array_like) ā Minimum and maximum values for output function (2 values).
fname (str) ā whether to export to filename or display the figures.
 Returns
figure.
 Return type
Matplotlib figure instances, Matplotlib AxesSubplot instances.

batman.visualization.
reshow
(fig)[source]¶ Create a dummy figure and use its manager to display fig.
 Parameters
fig – Matplotlib figure instance.

batman.visualization.
response_surface
(bounds, sample=None, data=None, fun=None, doe=None, resampling=0, xdata=None, axis_disc=None, flabel='F', plabels=None, feat_order=None, ticks_nbr=10, range_cbar=None, contours=None, fname=None)[source]Ā¶ Response surface visualization in 2d (image), 3d (movie) or 4d (movies).
You have to set either (i) sample with data or (ii) fun depending on your data. If (i), the data are interpolated on a mesh in order to be plotted as a surface. Otherwise, fun is directly used to generate correct data.
The DoE can also be plotted by setting doe along with resampling.
 Parameters
bounds (array_like) ā sample boundaries ([min, n_features], [max, n_features]).
sample (array_like) ā sample (n_samples, n_features).
data (array_like) ā function evaluations(n_samples, [n_features]).
fun (callable) ā function to plot the response from.
doe (array_like) ā design of experiment (n_samples, n_features).
resampling (int) ā number of resampling points.
xdata (array_like) ā 1D discretization of the function (n_features,).
axis_disc (array_like) ā discretisation of the sample on each axis (n_features).
flabel (str) ā name of the quantity of interest.
feat_order (array_like) ā order of features for multidimensional plot (n_features).
ticks_nbr (int) ā number of color isolines for response surfaces.
range_cbar (array_like) ā min and max values for colorbar range (2).
contours (array_like) ā isocontour values to plot on response surface.
fname (str) ā whether to export to filename or display the figures.
 Returns
figure.
 Return type
Matplotlib figure instances, Matplotlib AxesSubplot instances.
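The interpolation step of case (i) can be sketched with scipy (a standalone illustration of the idea, not batman's internals); the sample, bounds and mesh size here are arbitrary:

```python
import numpy as np
from scipy.interpolate import griddata

# Case (i): a scattered sample with its responses
rng = np.random.default_rng(42)
sample = rng.uniform([-2, -2], [2, 2], size=(200, 2))
data = sample[:, 0] ** 2 + sample[:, 1] ** 2

# Regular mesh spanning the bounds
x, y = np.meshgrid(np.linspace(-2, 2, 50), np.linspace(-2, 2, 50))
mesh = np.c_[x.ravel(), y.ravel()]

# Interpolate the scattered data onto the mesh (NaN outside the convex hull)
surface = griddata(sample, data, mesh, method='linear').reshape(x.shape)
```

The resulting 2D array can then be rendered as an image or contour plot.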

batman.visualization.
save_show
(fname, figures, **kwargs)[source]Ā¶ Either show or save the figure[s].
If
fname
is None the figure will show.

batman.visualization.
sensitivity_indices
(indices, conf=None, plabels=None, polar=False, xdata=None, xlabel='x', fname=None)[source]Ā¶ Plot Sensitivity indices.
If len(indices) > 2, map indices are also plotted along with aggregated indices.
 Parameters
indices (array_like) ā [first (n_features), total (n_features), first (xdata, n_features), total (xdata, n_features)].
conf (float/array_like) ā relative error around indices. If float, same error is applied for all parameters. Otherwise shape (n_features, [first, total] orders).
polar (bool) ā Whether to use bar chart or polar bar chart.
xdata (array_like) ā 1D discretization of the function (n_features,).
xlabel (str) ā label of the discretization parameter.
fname (str) ā whether to export to filename or display the figures.
 Returns
figure.
 Return type
Matplotlib figure instances, Matplotlib AxesSubplot instances.
batman.pod
: Proper Orthogonal DecompositionĀ¶

POD class. 
Pod moduleĀ¶

class
batman.pod.
Pod
(corners, tolerance=0.99, dim_max=100)[source]Ā¶ POD class.

property
VS
Ā¶ Compute V*S matrix product.
S is diagonal and stored as a vector, thus (V*S).T = S V.T

__init__
(corners, tolerance=0.99, dim_max=100)[source]Ā¶ Initialize POD components.
The decomposition of the snapshot matrix is stored as attributes:
U: Singular vectors matrix, array_like (n_features, n_snapshots); after filtering, array_like (n_features, n_modes),
S: Singular values matrix, array_like (n_modes, n_snapshots); only the diagonal is stored, of length (n_modes),
V: array_like (n_snapshots, n_snapshots); after filtering, (n_snapshots, n_modes).
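These attributes correspond to a truncated SVD of the snapshot matrix. A minimal numpy sketch, assuming an energy-based filtering criterion (batman's actual tolerance criterion may differ):

```python
import numpy as np

# Snapshot matrix: one column per snapshot (n_features, n_snapshots)
rng = np.random.default_rng(0)
snapshots = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 20))  # rank-3 data

U, S, Vt = np.linalg.svd(snapshots, full_matrices=False)

# Filtering: keep the smallest number of modes capturing `tolerance`
# of the total energy (assumed criterion, for illustration)
tolerance = 0.99
energy = np.cumsum(S ** 2) / np.sum(S ** 2)
n_modes = int(np.searchsorted(energy, tolerance) + 1)

U, S, V = U[:, :n_modes], S[:n_modes], Vt[:n_modes].T
# V*S product with S stored as a vector: equals V @ np.diag(S)
VS = V * S
```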

directories
= {'mean_snapshot': 'Mean.txt', 'modes': 'Mods.npz'}Ā¶

static
downgrade
(S, Vt)[source]¶ Downgrade by removing the k-th row of V.
\[\begin{split}S^{k} &= U\Sigma R^T Q^T\\ S^{k} &= UU'\Sigma'V'^TQ^T \\ S^{k} &= U^{k}\Sigma'V^{(k)^T}\end{split}\] Parameters
S – Singular values, array_like (n_modes,).
Vt – V.T without one row, array_like (n_snapshots - 1, n_modes).
 Returns
U', S', V(k).T
 Return type
array_like.

estimate_quality
()[source]Ā¶ Quality estimator.
Estimate the quality of the POD by the leave-one-out method.
 Returns
Q2.
 Return type
float.
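The leave-one-out idea can be sketched in plain numpy: refit a surrogate without each point, predict that point, and form Q2 = 1 - PRESS / total variance. Here a 1D polynomial stands in for the POD-based surrogate (an assumption for illustration only):

```python
import numpy as np

# Nearly-linear data; a degree-1 polynomial plays the surrogate's role
x = np.linspace(0, 1, 20)
y = 3 * x + 1 + 0.01 * np.sin(40 * x)

press = 0.0
for i in range(x.size):
    mask = np.arange(x.size) != i          # leave point i out
    coeffs = np.polyfit(x[mask], y[mask], deg=1)
    press += (y[i] - np.polyval(coeffs, x[i])) ** 2

q2 = 1 - press / np.sum((y - y.mean()) ** 2)  # close to 1 for a good model
```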

static
filtering
(U, S, V, tolerance, dim_max)[source]Ā¶ Remove lowest modes in U, S and V.
 Parameters
 Returns
U (nb of data, nb of modes), S (nb of modes), V (nb of snapshots, nb of modes).
 Return type
array_like.

fit
(samples)[source]Ā¶ Create a POD from a set of samples.
 Parameters
samples (
batman.space.Sample
.) ā Samples.

inverse_transform
(samples)[source]Ā¶ Convert VS back into the original space.
 Parameters
samples ā Samples VS to convert (n_samples, n_components).
 Returns
Samples in the original space.
 Return type
array_like (n_samples, n_features)

logger
= <Logger batman.pod.pod (WARNING)>Ā¶

pod_file_name
= 'pod.npz'Ā¶

points_file_name
= 'points.dat'Ā¶

batman.functions
: FunctionsĀ¶
Data module 

SixHumpCamel class [Molga2005]. 

Branin class [Forrester2008]. 


Michalewicz class [Molga2005]. 

Ishigami class [Ishigami1990]. 
Rastrigin class [Molga2005]. 


G_Function class [Saltelli2000]. 

Forrester class [Forrester2007]. 

Environmental Model class [Bliznyuk2008]. 

Channel Flow class. 

Manning equation for rectangular channel class. 

Detect space or unique point. 

Convert float output to list. 
Data moduleĀ¶
Analytical moduleĀ¶
Defines analytical Uncertainty Quantification oriented functions for test and model evaluation purposes.
See also
It implements the following classes:
In most cases, Sobol' indices are precomputed and stored as attributes.
ReferencesĀ¶
 Molga2005(1,2,3,4,5,6)
Molga, M., & Smutnicki, C. Test functions for optimization needs (2005).
 Dixon1978
Dixon, L. C. W., & Szego, G. P. (1978). The global optimization problem: an introduction. Towards global optimization, 2, 1-15.
 Ishigami1990(1,2)
Ishigami, T., & Homma, T. (1990, December): An importance quantification technique in uncertainty analysis for computer models. In Uncertainty Modeling and Analysis, 1990. Proceedings., First International Symposium on (pp. 398-403). IEEE.
 Saltelli2000(1,2)
Saltelli, A., Chan, K., & Scott, E. M. (Eds.). (2000). Sensitivity analysis (Vol. 134). New York: Wiley.
 Forrester2007(1,2)
Forrester, Sobester. (2007). Multi-Fidelity Optimization via Surrogate Modelling. In Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences.
 Forrester2008(1,2)
Forrester, A., Sobester, A., & Keane, A. (2008). Engineering design via surrogate modelling: a practical guide. Wiley.
 Bliznyuk2008(1,2)
Bliznyuk, N., Ruppert, D., Shoemaker, C., Regis, R., Wild, S., & Mugunthan, P. (2008). Bayesian calibration and uncertainty analysis for computationally expensive models using optimization and radial basis function approximation. Journal of Computational and Graphical Statistics, 17(2).
 Surjanovic2017
Surjanovic, S. & Bingham, D. (2013). Virtual Library of Simulation Experiments: Test Functions and Datasets. Retrieved September 11, 2017, from http://www.sfu.ca/~ssurjano.

class
batman.functions.analytical.
Branin
[source]Ā¶ Branin class [Forrester2008].
\[f(x) = \left( x_2 - \frac{5.1}{4\pi^2}x_1^2 + \frac{5}{\pi}x_1 - 6 \right)^2 + 10 \left[ \left( 1 - \frac{1}{8\pi} \right) \cos(x_1) + 1 \right] + 5x_1.\]The function has two local minima and one global minimum. It is a modified version of the original Branin function that seeks to be representative of engineering functions.
\[f(x^*) = -15.310076, x^* = (-\pi, 12.275), x_1 \in [-5, 10], x_2 \in [0, 15]\]
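As a sanity check, a direct numpy transcription of the modified Branin formula (a sketch, not batman's implementation) recovers the documented optimum value at \((-\pi, 12.275)\):

```python
import numpy as np

def branin(x1, x2):
    """Modified Branin function (Forrester et al., 2008)."""
    return ((x2 - 5.1 / (4 * np.pi ** 2) * x1 ** 2 + 5 / np.pi * x1 - 6) ** 2
            + 10 * ((1 - 1 / (8 * np.pi)) * np.cos(x1) + 1) + 5 * x1)

print(branin(-np.pi, 12.275))  # close to -15.310076
```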
__call__
(x_n, *args, **kwargs)Ā¶ Get evaluation from space or point.
If the function is a Kriging instance, get and return the variance.
 Returns
function evaluation(s) [sigma(s)]
 Return type
np.array([n_eval], n_feature)

logger
= <Logger batman.functions.analytical (WARNING)>Ā¶


class
batman.functions.analytical.
Channel_Flow
(dx=8000.0, length=40000.0, width=500.0, slope=0.0005, hinit=10.0)[source]Ā¶ Channel Flow class.
\[\begin{split}\frac{dh}{ds}=\mathcal{F}(h)=I\frac{1-(h/h_n)^{-10/3}}{1-(h/h_c)^{-3}}\\ h_c=\left(\frac{q^2}{g}\right)^{1/3}, h_n=\left(\frac{q^2}{IK_s^2}\right)^{3/10}\end{split}\]
__call__
(x_n, *args, **kwargs)Ā¶ Get evaluation from space or point.
If the function is a Kriging instance, get and return the variance.
 Returns
function evaluation(s) [sigma(s)]
 Return type
np.array([n_eval], n_feature)

__init__
(dx=8000.0, length=40000.0, width=500.0, slope=0.0005, hinit=10.0)[source]Ā¶ Initialize the geometrical configuration.

logger
= <Logger batman.functions.analytical (WARNING)>Ā¶


class
batman.functions.analytical.
ChemicalSpill
(s=None, tstep=0.3)[source]Ā¶ Environmental Model class [Bliznyuk2008].
Model a pollutant spill caused by a chemical accident.
C(x) being the concentration of the pollutant at the space-time vector (s, t), with 0 < s < 3 and t > 0. A mass M of pollutant is spilled at each of two locations, denoted by the space-time vectors (0, 0) and \((L, \tau)\). Each element of the response is a scaled concentration of the pollutant at the space-time vector.\[\begin{split}f(X) = \sqrt{4\pi}C(X), x \in [[7, 13], [0.02, 0.12], [0.01, 3], [30.1, 30.295]]\\ C(X) = \frac{M}{\sqrt{4\pi Dt}}\exp \left(-\frac{s^2}{4Dt}\right) + \frac{M}{\sqrt{4\pi D(t - \tau)}} \exp \left(-\frac{(s-L)^2}{4D(t - \tau)}\right) I(\tau < t)\end{split}\]
__call__
(x_n, *args, **kwargs)Ā¶ Get evaluation from space or point.
If the function is a Kriging instance, get and return the variance.
 Returns
function evaluation(s) [sigma(s)]
 Return type
np.array([n_eval], n_feature)

logger
= <Logger batman.functions.analytical (WARNING)>Ā¶


class
batman.functions.analytical.
Forrester
(fidelity='e')[source]Ā¶ Forrester class [Forrester2007].
\[\begin{split}F_{e}(x) = (6x-2)^2\sin(12x-4), \\ F_{c}(x) = AF_e(x)+B(x-0.5)+C,\end{split}\]where \(x\in[0,1]\) and \(A=0.5, B=10, C=-5\).
This set of two functions is used to represent a high and a low fidelity.
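The fidelity pair can be transcribed directly in numpy (a sketch, not batman's implementation), with the standard constants A=0.5, B=10, C=-5 for the cheap version:

```python
import numpy as np

def f_e(x):
    """High-fidelity (expensive) Forrester function."""
    return (6 * x - 2) ** 2 * np.sin(12 * x - 4)

def f_c(x, A=0.5, B=10.0, C=-5.0):
    """Low-fidelity (cheap) approximation of f_e."""
    return A * f_e(x) + B * (x - 0.5) + C
```

The high-fidelity function has its global minimum of about -6.02 near x = 0.757.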

__call__
(x_n, *args, **kwargs)Ā¶ Get evaluation from space or point.
If the function is a Kriging instance, get and return the variance.
 Returns
function evaluation(s) [sigma(s)]
 Return type
np.array([n_eval], n_feature)

__init__
(fidelity='e')[source]¶ Forrester function definition.
e stands for expensive and c for cheap.
 Parameters
fidelity (str) – select the fidelity ['e'|'c']

logger
= <Logger batman.functions.analytical (WARNING)>Ā¶


class
batman.functions.analytical.
G_Function
(d=4, a=None)[source]Ā¶ G_Function class [Saltelli2000].
\[F = \Pi_{i=1}^d \frac{\lvert 4x_i - 2\rvert + a_i}{1 + a_i}\]The coefficient \(a_i\) sets the importance of each parameter: the larger \(a_i\) is, the less important the corresponding parameter is for the output.
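A direct transcription, together with the standard analytic first-order Sobol' indices for the G-function (the coefficient values below are arbitrary, for illustration):

```python
import numpy as np

def g_function(x, a):
    """Saltelli's G-function; x in [0, 1]^d."""
    x, a = np.asarray(x), np.asarray(a)
    return np.prod((np.abs(4 * x - 2) + a) / (1 + a))

# Standard analytic partial variances: V_i = 1 / (3 (1 + a_i)^2)
a = np.array([0.0, 1.0, 4.5, 9.0])  # illustrative coefficients
v_i = 1.0 / (3.0 * (1.0 + a) ** 2)
s_first = v_i / (np.prod(1.0 + v_i) - 1.0)  # x1 dominates, x4 barely matters
```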

__call__
(x_n, *args, **kwargs)Ā¶ Get evaluation from space or point.
If the function is a Kriging instance, get and return the variance.
 Returns
function evaluation(s) [sigma(s)]
 Return type
np.array([n_eval], n_feature)

__init__
(d=4, a=None)[source]¶ G-function definition.
 Parameters
d (int) ā input dimension
a (np.array) ā (1, d)

logger
= <Logger batman.functions.analytical (WARNING)>Ā¶


class
batman.functions.analytical.
Ishigami
(a=7.0, b=0.1)[source]Ā¶ Ishigami class [Ishigami1990].
\[F = \sin(x_1)+7\sin(x_2)^2+0.1x_3^4\sin(x_1), x\in [-\pi, \pi]^3\]It exhibits strong non-linearity and non-monotonicity, and the parameters a and b control how strong the non-linearities are. It also has a dependence on x3 due to second-order interactions (F13).
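The precomputed Sobol' indices mentioned above follow from the standard analytic variance decomposition of the Ishigami function; a numpy sketch for the default a=7, b=0.1:

```python
import numpy as np

def ishigami(x1, x2, x3, a=7.0, b=0.1):
    return np.sin(x1) + a * np.sin(x2) ** 2 + b * x3 ** 4 * np.sin(x1)

# Standard analytic variance decomposition
a, b = 7.0, 0.1
var = a ** 2 / 8 + b * np.pi ** 4 / 5 + b ** 2 * np.pi ** 8 / 18 + 0.5
s1 = 0.5 * (1 + b * np.pi ** 4 / 5) ** 2 / var
s2 = (a ** 2 / 8) / var
s3 = 0.0  # x3 acts only through its interaction with x1
```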

__call__
(x_n, *args, **kwargs)Ā¶ Get evaluation from space or point.
If the function is a Kriging instance, get and return the variance.
 Returns
function evaluation(s) [sigma(s)]
 Return type
np.array([n_eval], n_feature)

logger
= <Logger batman.functions.analytical (WARNING)>Ā¶


class
batman.functions.analytical.
Manning
(width=100.0, slope=0.0005, inflow=1000, d=1)[source]Ā¶ Manning equation for rectangular channel class.

__call__
(x_n, *args, **kwargs)Ā¶ Get evaluation from space or point.
If the function is a Kriging instance, get and return the variance.
 Returns
function evaluation(s) [sigma(s)]
 Return type
np.array([n_eval], n_feature)

__init__
(width=100.0, slope=0.0005, inflow=1000, d=1)[source]Ā¶ Initialize the geometrical configuration.

logger
= <Logger batman.functions.analytical (WARNING)>Ā¶


class
batman.functions.analytical.
Michalewicz
(d=2, m=10)[source]Ā¶ Michalewicz class [Molga2005].
It is a multimodal d-dimensional function which has \(d!\) local minima
\[f(x)=-\sum_{i=1}^d \sin(x_i)\sin^{2m}\left(\frac{ix_i^2}{\pi}\right),\]where m defines the steepness of the valleys and ridges.
Finding the global minimum becomes difficult when \(m\) reaches large values; therefore, it is recommended to keep \(m < 10\).
\[f(x^*) = -1.8013, x^* = (2.20, 1.57), x \in [0, \pi]^d\]
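A direct numpy transcription (a sketch, not batman's implementation) reproduces the documented minimum for d=2:

```python
import numpy as np

def michalewicz(x, m=10):
    x = np.asarray(x)
    i = np.arange(1, x.size + 1)
    return -np.sum(np.sin(x) * np.sin(i * x ** 2 / np.pi) ** (2 * m))

print(michalewicz([2.20, 1.57]))  # close to -1.8013
```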
__call__
(x_n, *args, **kwargs)Ā¶ Get evaluation from space or point.
If the function is a Kriging instance, get and return the variance.
 Returns
function evaluation(s) [sigma(s)]
 Return type
np.array([n_eval], n_feature)

logger
= <Logger batman.functions.analytical (WARNING)>Ā¶


class
batman.functions.analytical.
Rastrigin
(d=2)[source]Ā¶ Rastrigin class [Molga2005].
It is a multimodal d-dimensional function which has regularly distributed local minima.
\[f(x)=10d+\sum_{i=1}^d [x_i^2-10\cos(2\pi x_i)]\]\[f(x^*) = 0, x^* = (0, ..., 0), x \in [-5.12, 5.12]^d\]
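The formula above in plain numpy (a sketch, not batman's implementation); the global minimum at the origin is exactly zero:

```python
import numpy as np

def rastrigin(x):
    x = np.asarray(x)
    return 10 * x.size + np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x))
```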
__call__
(x_n, *args, **kwargs)Ā¶ Get evaluation from space or point.
If the function is a Kriging instance, get and return the variance.
 Returns
function evaluation(s) [sigma(s)]
 Return type
np.array([n_eval], n_feature)

logger
= <Logger batman.functions.analytical (WARNING)>Ā¶


class
batman.functions.analytical.
Rosenbrock
(d=2)[source]Ā¶ Rosenbrock class [Dixon1978].
\[f(x)=\sum_{i=1}^{d-1}[100(x_{i+1}-x_i^2)^2+(x_i-1)^2]\]The function is unimodal, and the global minimum lies in a narrow, parabolic valley.
\[f(x^*) = 0, x^* = (1, ..., 1), x \in [-2.048, 2.048]^d\]
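A vectorized numpy transcription of the sum above (a sketch, not batman's implementation):

```python
import numpy as np

def rosenbrock(x):
    x = np.asarray(x)
    return np.sum(100 * (x[1:] - x[:-1] ** 2) ** 2 + (x[:-1] - 1) ** 2)
```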
__call__
(x_n, *args, **kwargs)Ā¶ Get evaluation from space or point.
If the function is a Kriging instance, get and return the variance.
 Returns
function evaluation(s) [sigma(s)]
 Return type
np.array([n_eval], n_feature)

logger
= <Logger batman.functions.analytical (WARNING)>Ā¶


class
batman.functions.analytical.
SixHumpCamel
[source]Ā¶ SixHumpCamel class [Molga2005].
\[\left(4-2.1x_1^2+\frac{x_1^4}{3}\right)x_1^2+x_1x_2+ (-4+4x_2^2)x_2^2\]The function has six local minima, two of which are global.
\[f(x^*) = -1.0316, x^* = (0.0898, -0.7126), (-0.0898, 0.7126), x_1 \in [-3, 3], x_2 \in [-2, 2]\]
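Transcribed in numpy (a sketch, not batman's implementation); the two documented global minimizers give the same value by the function's point symmetry:

```python
import numpy as np

def six_hump_camel(x1, x2):
    return ((4 - 2.1 * x1 ** 2 + x1 ** 4 / 3) * x1 ** 2
            + x1 * x2 + (-4 + 4 * x2 ** 2) * x2 ** 2)

print(six_hump_camel(0.0898, -0.7126))  # close to -1.0316
```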
__call__
(x_n, *args, **kwargs)Ā¶ Get evaluation from space or point.
If the function is a Kriging instance, get and return the variance.
 Returns
function evaluation(s) [sigma(s)]
 Return type
np.array([n_eval], n_feature)

logger
= <Logger batman.functions.analytical (WARNING)>Ā¶

batman.tasks
: TasksĀ¶
Data Provider moduleĀ¶

class
batman.tasks.
ProviderFile
(plabels, flabels, file_pairs, psizes=None, fsizes=None, discover_pattern=None, save_dir=None, space_fname='samplespace.npy', space_format='npy', data_fname='sampledata.npy', data_format='npy')[source]Ā¶ Provides Snapshots loaded from a list of files.

__init__
(plabels, flabels, file_pairs, psizes=None, fsizes=None, discover_pattern=None, save_dir=None, space_fname='samplespace.npy', space_format='npy', data_fname='sampledata.npy', data_format='npy')[source]Ā¶ Initialize the provider.
Load known samples from a list of files. If
discover_pattern
is specified, it will also try to locate and import samples from there. Parameters
file_pairs (list(tuple(str))) ā list of pairs (space_file, data_file).
fsizes (list(int)) ā number of components of output features.
discover_pattern (str) ā UNIXstyle patterns for directories with pairs of sample files to import.
save_dir (str) ā path to a directory for saving known snapshots.
space_fname (str) ā name of space file to write.
data_fname (str) ā name of data file to write.
space_format (str) ā space file format.
data_format (str) ā data file format.

build_data
(points)[source]Ā¶ Compute data for requested points.
This provider cannot compute any data and will raise if called.
 Returns
NotImplemented

property
flabels
Ā¶ Names of data features.

property
fsizes
Ā¶ Shape of data features.

property
known_points
Ā¶ List of points whose associated data is already known.

logger
= <Logger batman.tasks.provider_file (WARNING)>Ā¶

property
plabels
Ā¶ Names of space parameters.

property
psizes
Ā¶ Shape of space parameters.


class
batman.tasks.
ProviderFunction
(plabels, flabels, module, function, psizes=None, fsizes=None, discover_pattern=None, save_dir=None, space_fname='samplespace.json', space_format='json', data_fname='sampledata.json', data_format='json')[source]Ā¶ Provides Snapshots built through an external python function.

__init__
(plabels, flabels, module, function, psizes=None, fsizes=None, discover_pattern=None, save_dir=None, space_fname='samplespace.json', space_format='json', data_fname='sampledata.json', data_format='json')[source]Ā¶ Initialize the provider.
Load a python function to be called for computing new snapshots.
 Parameters
module (str) ā python module to load.
function (str) ā function in module to execute for generating data.
fsizes (list(int)) ā number of components of output features.
discover_pattern (str) ā UNIXstyle patterns for directories with pairs of sample files to import.
save_dir (str) ā path to a directory for saving known snapshots.
space_fname (str) ā name of space file to write.
data_fname (str) ā name of data file to write.
space_format (str) ā space file format.
data_format (str) ā data file format.

build_data
(points)[source]Ā¶ Compute data for requested points.
 Parameters
points (array_like) ā points to build data from, (n_points, n_features).
 Returns
samples for requested points (carry both space and data).
 Return type
Sample

property
flabels
Ā¶ Names of data features.

property
fsizes
Ā¶ Shape of data features.

property
known_points
Ā¶ List of points whose associated data is already known.

logger
= <Logger batman.tasks.provider_function (WARNING)>Ā¶

property
plabels
Ā¶ Names of space parameters.

property
psizes
Ā¶ Shape of space parameters.

require_data
(points)[source]Ā¶ Return samples for requested points.
Data for unknown points is generated through a python function.
 Parameters
points (array_like) ā points to build data from, (n_points, n_features).
 Returns
samples for requested points (carry both space and data)
 Return type
Sample


class
batman.tasks.
ProviderJob
(plabels, flabels, command, context_directory, psizes=None, fsizes=None, coupling=None, hosts=None, pool=None, clean=False, discover_pattern=None, save_dir=None, space_fname='samplespace.json', space_format='json', data_fname='sampledata.json', data_format='json')[source]Ā¶ Provides Snapshots built through a 3rdparty program.

__init__
(plabels, flabels, command, context_directory, psizes=None, fsizes=None, coupling=None, hosts=None, pool=None, clean=False, discover_pattern=None, save_dir=None, space_fname='samplespace.json', space_format='json', data_fname='sampledata.json', data_format='json')[source]Ā¶ Initialize the provider.
 Parameters
command (str) ā command to be executed for computing new snapshots.
context_directory (str) ā directory storing every resource required for executing a job.
fsizes (list(int)) ā number of components of output features.
coupling (dict) ā
Definition of the snapshots IO files:
coupling_directory (str) ā subdirectory in context_directory that will contain input parameters and output file.
input_fname (str) ā basename for files storing the point coordinates plabels.
input_format (str) ā json (default), csv, npy, npz.
output_fname (str) ā basename for files storing values associated to flabels.
output_format (str) ā json (default), csv, npy, npz.
Definition of the remote HOSTS if any:
hostname (str) ā Remote host to connect to.
remote_root (str) ā Remote folder to create and store data.
username (str) ā username.
password (str) ā password.
pool (concurrent.futures.xxx.xxx.Executor) ā pool executor.
clean (bool) ā whether to remove working directories.
discover_pattern (str) ā UNIXstyle patterns for directories with pairs of sample files to import.
save_dir (str) ā path to a directory for saving known snapshots.
space_fname (str) ā name of space file to write.
data_fname (str) ā name of data file to write.
space_format (str) ā space file format.
data_format (str) ā data file format.

build_data
(points, sample_id=None)[source]Ā¶ Compute data for requested points.
Resources for executing a job are copied from the context directory to a work directory. The shell command is executed from this directory. The command shall find its inputs and place its outputs in the coupling subdirectory, inside the work directory.
 Parameters
points (array_like) ā points to compute (n_points, n_features).
sample_id (list) ā points indices in the points list.
 Returns
samples for requested points (carry both space and data) and failed if any.
 Return type
Sample
, list([point, err])

property
flabels
Ā¶ Names of data features.

property
fsizes
Ā¶ Shape of data features.

property
known_points
Ā¶ List of points whose associated data is already known.

logger
= <Logger batman.tasks.provider_job (WARNING)>Ā¶

property
plabels
Ā¶ Names of space parameters.

property
psizes
Ā¶ Shape of space parameters.

batman.misc
: MiscĀ¶

NestedPool class. 

Print progress bar in console. 

Perform a discrete or a continuous/discrete optimization. 

Import a configuration file. 

Ask user for delete confirmation. 

Ask user for a folder path. 

Get absolute path. 

Return an absolute and normalized path. 
Misc moduleĀ¶

class
batman.misc.
NestedPool
(processes=None, initializer=None, initargs=(), maxtasksperchild=None, context=None)[source]Ā¶ NestedPool class.
Inherit from
pathos.multiprocessing.Pool
. Enable nested process pool.

class
batman.misc.
ProgressBar
(total)[source]Ā¶ Print progress bar in console.

batman.misc.
optimization
(bounds, discrete=None)[source]¶ Perform a discrete or a continuous/discrete optimization.
If a variable is discrete, the decorator performs one continuous optimization per discrete value and returns the best optimum found.
 Parameters
bounds (array_like) ā bounds for optimization ([min, max], n_features).
discrete (int) ā index of the discrete variable.
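The per-discrete-value strategy can be sketched with scipy (a hypothetical optimize_mixed helper for illustration, not batman's decorator): run one continuous optimization per candidate value of the discrete variable and keep the best result.

```python
from scipy.optimize import minimize

def optimize_mixed(fun, bounds, discrete, values):
    """Hypothetical helper illustrating the idea (not batman's API)."""
    best = None
    for v in values:
        def restricted(x_cont, v=v):
            x = list(x_cont)
            x.insert(discrete, v)  # re-insert the fixed discrete value
            return fun(x)
        cont_bounds = [b for i, b in enumerate(bounds) if i != discrete]
        x0 = [0.5 * (lo + hi) for lo, hi in cont_bounds]
        res = minimize(restricted, x0, bounds=cont_bounds)
        if best is None or res.fun < best[0]:
            best = (res.fun, v, res.x)
    return best  # (optimum value, best discrete value, continuous optimum)

# Example: f(x, n) = (x - n)^2 + n with n discrete in {0, 1, 2}
best = optimize_mixed(lambda x: (x[0] - x[1]) ** 2 + x[1],
                      bounds=[(-2.0, 2.0), (0, 2)], discrete=1, values=(0, 1, 2))
```

For this example the best discrete value is n = 0, reached at x = 0 with objective 0.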
batman.input_output
: Input OutputĀ¶
IO moduleĀ¶
Provides Formater objects to deal with I/Os.
Every formater has the same interface, exposing the two methods read and write.
 Example
Using json formater
>>> from input_output import formater
>>> varnames = ['x1', 'x2', 'x3']
>>> data = [[1, 2, 3], [87, 74, 42]]
>>> fmt = formater('json')
>>> fmt.write('file.json', data, varnames)
{'x1': [1, 87], 'x2': [2, 74], 'x3': [3, 42]}
>>> # can load a subset of variables, in a different order (unavailable for format 'npy')
>>> fmt.read('file.json', ['x2', 'x1'])
array([[2, 1], [74, 87]])