water_benchmark_hub.batadal

water_benchmark_hub.batadal.batadal

Module provides access to the BATADAL benchmark.

class water_benchmark_hub.batadal.batadal.BATADAL

Bases: BenchmarkResource

The BATtle of the Attack Detection ALgorithms (BATADAL) by Riccardo Taormina, Stefano Galelli, Nils Ole Tippenhauer, Avi Ostfeld, Elad Salomons, and Demetrios Eliades is a competition on planning and management of water networks undertaken within the Water Distribution Systems Analysis Symposium. The goal of the battle was to compare the performance of algorithms for the detection of cyber-physical attacks, whose frequency has increased in recent years along with the adoption of smart water technologies.

The design challenge was set for the C-Town network, a real-world, medium-sized water distribution system operated through programmable logic controllers and a supervisory control and data acquisition (SCADA) system. Participants were provided with data sets containing (simulated) SCADA observations and challenged to design an attack detection algorithm. The effectiveness of all submitted algorithms was evaluated in terms of time-to-detection and classification accuracy.

Seven teams participated in the battle and proposed a variety of successful approaches leveraging data analysis, model-based detection mechanisms, and rule checking. Results were presented at the Water Distribution Systems Analysis Symposium (World Environmental and Water Resources Congress) in Sacramento, California on May 21-25, 2017. The BATADAL paper summarizes the problem, the proposed algorithms, the results, and future research directions.

See https://www.batadal.net/ for details.

This class provides static methods for loading the original BATADAL data set (load_data()), the BATADAL scenario (load_scenario()), and pre-generated SCADA data (load_scada_data()).

static get_meta_info() dict

Gets the meta information of this resource.

Returns:

Meta info.

Return type:

dict

static load_data(download_dir: str | None = None, return_X_y: bool = False, return_ground_truth: bool = False, return_features_desc: bool = False, verbose: bool = True) dict

Loads the original BATADAL competition data.

Parameters:
  • download_dir (str, optional) –

    Path to the data files – if None, the temp folder will be used. If the path does not exist, the data files will be downloaded to the given path.

    The default is None.

  • return_X_y (bool, optional) –

    If True, the data together with the labels is returned as pairs of Numpy arrays. Otherwise, the data is returned as Pandas data frames.

    The default is False.

  • return_ground_truth (bool) –

    If True and if return_X_y is True, the ground truth labels are included in the returned dictionary – note that the labels provided in the benchmark constitute a partial labeling only.

    The default is False.

  • return_features_desc (bool) –

    If True and if return_X_y is True, feature names (i.e. descriptions) are included in the returned dictionary.

    The default is False.

  • verbose (bool, optional) –

    If True, a progress bar is shown while downloading files.

    The default is True.

Returns:

Dictionary of the loaded benchmark data. The dictionary contains the two training data sets (“train_1” and “train_2”), as well as the test data set (“test”). If return_X_y is False, each dictionary entry is a Pandas dataframe. Otherwise, it is a tuple of sensor readings and labels (except for the test set) – if return_ground_truth is True or return_features_desc is True, the corresponding data is appended to the tuple.

Return type:

dict
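The tuple layout described above can be made easier to work with by naming its items. The following is a minimal sketch, not part of the library: name_entry() is a hypothetical helper, and it assumes that the items appear in the order sensor readings, labels (training sets only), feature names. It deliberately does not handle return_ground_truth, because the order of the appended items when both options are set is not stated here.

```python
from typing import Any, Dict, Optional


def load_batadal_arrays(download_dir: Optional[str] = None) -> Dict[str, Any]:
    """Call BATADAL.load_data() asking for Numpy arrays plus feature names."""
    # Lazy import: the package is only needed when data is actually loaded.
    from water_benchmark_hub.batadal.batadal import BATADAL

    return BATADAL.load_data(download_dir=download_dir, return_X_y=True,
                             return_features_desc=True, verbose=False)


def name_entry(entry: Any, labeled: bool, with_features_desc: bool) -> Dict[str, Any]:
    """Give names to the positional items of one dictionary entry.

    ASSUMPTION: items are ordered X, y (labeled sets only), feature names.
    """
    # A bare (non-tuple) entry is treated as the sensor readings alone.
    items = list(entry) if isinstance(entry, tuple) else [entry]
    names = ["X"]
    if labeled:
        names.append("y")
    if with_features_desc:
        names.append("features_desc")
    if len(items) != len(names):
        raise ValueError(f"expected {len(names)} items ({names}), got {len(items)}")
    return dict(zip(names, items))


# Usage (requires the package and downloads the data on first call):
# data = load_batadal_arrays()
# train_1 = name_entry(data["train_1"], labeled=True, with_features_desc=True)
# test = name_entry(data["test"], labeled=False, with_features_desc=True)
```

Raising on a length mismatch is intentional: if the assumed ordering or item count turns out to be wrong, the helper fails immediately instead of silently mislabeling arrays.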

static load_scada_data(download_dir: str | None = None, return_X_y: bool = False, return_ground_truth: bool = False, return_features_desc: bool = False, verbose: bool = True) Any

Loads the SCADA data of the simulated BATADAL benchmark scenario – note that due to randomness and undocumented aspects of the original BATADAL data set, these differ from the original data set which can be loaded by calling load_data().

Parameters:
  • download_dir (str, optional) –

    Path to the data files – if None, the temp folder will be used. If the path does not exist, the data files will be downloaded to the given path.

    The default is None.

  • return_X_y (bool, optional) –

    If True, the data together with the labels is returned as pairs of Numpy arrays. Otherwise, the data is returned as Pandas data frames.

    The default is False.

  • return_ground_truth (bool) –

    If True and if return_X_y is True, the ground truth labels are included in the returned dictionary – note that the labels provided in the benchmark constitute a partial labeling only.

    The default is False.

  • return_features_desc (bool) –

    If True and if return_X_y is True, feature names (i.e. descriptions) are included in the returned dictionary.

    The default is False.

  • verbose (bool, optional) –

    If True, a progress bar is shown while downloading files.

    The default is True.
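Because the return type of load_scada_data() is given only as Any, it can help to inspect the returned object before relying on a particular layout. The sketch below is a generic, hypothetical helper (outline() is not part of the library) that prints the nesting of dictionaries and tuples together with any shape attribute it finds; it makes no assumption about what load_scada_data() actually returns.

```python
from typing import Any, List


def outline(obj: Any, name: str = "result", depth: int = 0,
            max_depth: int = 2) -> List[str]:
    """Return one description line per nested item, down to max_depth."""
    indent = "  " * depth
    shape = getattr(obj, "shape", None)
    if shape is not None:
        # Numpy arrays and Pandas data frames both expose a shape tuple.
        suffix = f" shape={tuple(shape)}"
    elif isinstance(obj, (dict, tuple, list)):
        suffix = f" len={len(obj)}"
    else:
        suffix = ""
    lines = [f"{indent}{name}: {type(obj).__name__}{suffix}"]
    if depth < max_depth:
        if isinstance(obj, dict):
            for key, value in obj.items():
                lines += outline(value, repr(key), depth + 1, max_depth)
        elif isinstance(obj, tuple):
            # Lists are not expanded, since they may hold many scalar values.
            for index, value in enumerate(obj):
                lines += outline(value, f"[{index}]", depth + 1, max_depth)
    return lines


# Usage (requires the package and downloads the data on first call):
# from water_benchmark_hub.batadal.batadal import BATADAL
# print("\n".join(outline(BATADAL.load_scada_data(verbose=False))))
```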

static load_scenario(download_dir: str | None = None, verbose: bool = True) epyt_flow.simulation.ScenarioConfig

Creates and returns the BATADAL scenario – it can be either modified or directly passed to the EPyT-Flow simulator epyt_flow.simulation.scenario_simulator.ScenarioSimulator.

Note

Due to randomness and undocumented aspects of the original BATADAL benchmark, the scenario simulation results differ from the original data set, which can be loaded by calling load_data().

Parameters:
  • download_dir (str, optional) –

    Path to the data files – if None, the temp folder will be used. If the path does not exist, the data files will be downloaded to the given path.

    The default is None.

  • verbose (bool, optional) –

    If True, a progress bar is shown while downloading files.

    The default is True.

Returns:

The BATADAL scenario.

Return type:

epyt_flow.simulation.scenario_config.ScenarioConfig
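Passing the scenario to the simulator might look like the sketch below. It assumes that EPyT-Flow's ScenarioSimulator accepts a scenario_config keyword argument, can be used as a context manager, and offers run_simulation(); check the EPyT-Flow documentation for the installed version. The function simulate_batadal() and its load_scenario and simulator_cls parameters are hypothetical and exist only so the logic can be exercised with stand-ins.

```python
from typing import Any, Callable, Optional


def simulate_batadal(download_dir: Optional[str] = None,
                     load_scenario: Optional[Callable[..., Any]] = None,
                     simulator_cls: Optional[Callable[..., Any]] = None) -> Any:
    """Load the BATADAL scenario and run it once, returning the simulator's result."""
    if load_scenario is None:
        # Lazy import: only needed when the real loader is used.
        from water_benchmark_hub.batadal.batadal import BATADAL
        load_scenario = BATADAL.load_scenario
    if simulator_cls is None:
        # ASSUMPTION: this import path and constructor match the installed EPyT-Flow.
        from epyt_flow.simulation import ScenarioSimulator
        simulator_cls = ScenarioSimulator

    config = load_scenario(download_dir=download_dir, verbose=False)
    # The context manager releases the simulator's resources after the run.
    with simulator_cls(scenario_config=config) as sim:
        return sim.run_simulation()


# Usage (requires both packages; downloads the scenario files on first call):
# scada_data = simulate_batadal()
```

As noted above, the simulated results are not expected to match the data returned by load_data().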

water_benchmark_hub.batadal.batadal_data

Module provides the event configurations for BATADAL.