water_benchmark_hub.leakdb

water_benchmark_hub.leakdb.leakdb

Module provides access to the LeakDB benchmark.

class water_benchmark_hub.leakdb.leakdb.LeakDB

Bases: BenchmarkResource

LeakDB (Leakage Diagnosis Benchmark) by Vrachimis, S. G., Kyriakou, M. S., Eliades, D. G., and Polycarpou, M. M. (2018) is a realistic leakage dataset for water distribution networks. The dataset comprises 1000 artificially created but realistic leakage scenarios on different water distribution networks under varying conditions.

See https://github.com/KIOS-Research/LeakDB/ for details.

This module provides load_data() for loading the original LeakDB data set, load_scenarios() for loading the scenarios, and load_scada_data() for loading the pre-generated SCADA data. The official scoring/evaluation is implemented in compute_evaluation_score(), so its results can be compared directly to those reported in the official paper. Besides this, the user can choose to evaluate predictions using any other metric.

static compute_evaluation_score(scenarios_id: list[int], use_net1: bool, y_pred_labels_per_scenario: list[numpy.ndarray]) → dict

Evaluates the predictions (leakage detection) for a list of given scenarios.

Parameters:

scenarios_id (list[int]) – List of scenario IDs to be evaluated; there are 1000 scenarios in total.
use_net1 (bool) – If True, Net1 LeakDB will be used for evaluation, otherwise the Hanoi LeakDB will be used.
y_pred_labels_per_scenario (list[numpy.ndarray]) – Predicted binary labels (over time) for each scenario in scenarios_id.

Returns:

Dictionary containing the f1-score, true positive rate, true negative rate, and early detection score.

Return type:

dict
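A minimal usage sketch of the official scoring, assuming the package is installed and that exactly one 1-D array of binary labels is expected per entry of scenarios_id, as the parameter description suggests. The helper name score_leak_predictions is hypothetical and not part of the library. The keys of the returned dictionary are not spelled out above, so the sketch returns the dictionary unchanged.

```python
def score_leak_predictions(scenarios_id, y_pred_labels_per_scenario, use_net1=True):
    """Hypothetical helper: validate the inputs, then call the official scoring."""
    # Assumption: one array of predicted labels per requested scenario.
    if len(scenarios_id) != len(y_pred_labels_per_scenario):
        raise ValueError("expected exactly one label array per scenario ID")

    # The documentation asks for binary labels over time (1-D assumed here).
    for s_id, labels in zip(scenarios_id, y_pred_labels_per_scenario):
        if any(v not in (0, 1) for v in labels):
            raise ValueError(f"labels of scenario {s_id} are not binary")

    # Imported lazily so the checks above work without the package installed.
    from water_benchmark_hub.leakdb.leakdb import LeakDB

    return LeakDB.compute_evaluation_score(
        scenarios_id=scenarios_id,
        use_net1=use_net1,
        y_pred_labels_per_scenario=y_pred_labels_per_scenario,
    )
```

Validating before the call is a design choice of this sketch only; whether the library performs similar checks itself is not stated in this reference.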

static get_meta_info() → dict

Gets the meta information of this resource.

Returns:

Meta info.

Return type:

dict

static load_data(scenarios_id: list[int], use_net1: bool, download_dir: str | None = None, return_X_y: bool = False, return_features_desc: bool = False, return_leak_locations: bool = False, verbose: bool = True) → dict

Loads the original LeakDB benchmark data set.

Warning

All scenarios together form a huge data set: approx. 8 GB for Net1 and 25 GB for Hanoi. Downloading and loading might take some time, and a sufficient amount of disk space is required.

Parameters:

scenarios_id (list[int]) – List of scenario IDs to be loaded; there are 1000 scenarios in total.
use_net1 (bool) – If True, Net1 LeakDB will be loaded, otherwise the Hanoi LeakDB will be loaded.
download_dir (str, optional) –
Path to the data files – if None, the temp folder will be used. If the path does not exist, the data files will be downloaded to the given path.

The default is None.
return_X_y (bool, optional) –
If True, the data is returned together with the labels (presence of a leakage) as two Numpy arrays, otherwise, the data is returned as Pandas data frames.

The default is False.
return_features_desc (bool, optional) –
If True and if return_X_y is True, the returned dictionary contains the features’ descriptions (i.e. names) under the key “features_desc”.

The default is False.
return_leak_locations (bool, optional) –
If True and if return_X_y is True, the leak locations are returned as well – as an instance of scipy.sparse.bsr_array.

The default is False.
verbose (bool, optional) –
If True, a progress bar is shown while downloading files.

The default is True.

Returns:

Dictionary containing the scenario data sets. Data of each requested scenario can be accessed by using the scenario ID as a key.

Return type:

dict
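Given the size warning above, one option is to load scenarios in small batches rather than all at once. The following is a sketch under stated assumptions: the helpers batched and iter_leakdb_data are hypothetical names, the batch size of 10 is arbitrary, and the structure of each per-scenario entry is not specified in this reference, so entries are yielded unchanged.

```python
from __future__ import annotations

from typing import Iterator


def batched(ids: list[int], batch_size: int) -> Iterator[list[int]]:
    """Split a list of scenario IDs into consecutive batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


def iter_leakdb_data(scenarios_id: list[int], use_net1: bool,
                     batch_size: int = 10, download_dir: str | None = None):
    """Hypothetical helper: yield (scenario ID, data) pairs batch by batch."""
    # Imported lazily so batched() can be used without the package installed.
    from water_benchmark_hub.leakdb.leakdb import LeakDB

    for batch in batched(scenarios_id, batch_size):
        data = LeakDB.load_data(scenarios_id=batch, use_net1=use_net1,
                                download_dir=download_dir, return_X_y=True,
                                verbose=False)
        # The returned dictionary is keyed by scenario ID (see Returns above).
        for s_id in batch:
            yield s_id, data[s_id]
```

Batching only limits how much data is held in memory at a time; it does not reduce the total amount that has to be downloaded for the requested scenarios.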

static load_scada_data(scenarios_id: list[int], use_net1: bool = True, download_dir: str | None = None, return_X_y: bool = False, return_leak_locations: bool = False, verbose: bool = True) → list[epyt_flow.simulation.scada.ScadaData] | list[tuple[numpy.ndarray, numpy.ndarray]]

Loads the SCADA data of the simulated LeakDB benchmark scenarios – see load_scenarios().

Note

Due to the randomness in the demand creation as well as in the model uncertainties, the SCADA data differs from the original data set, which can be loaded by calling load_data(). However, the leakages (i.e. location and profile) are consistent with the original data set.

Parameters:

scenarios_id (list[int]) – List of scenario IDs to be loaded; there are 1000 scenarios in total.
use_net1 (bool, optional) –
If True, Net1 LeakDB will be loaded, otherwise the Hanoi LeakDB will be loaded.

The default is True.
download_dir (str, optional) –
Path to the data files – if None, the temp folder will be used. If the path does not exist, the data files will be downloaded to the given path.

The default is None.
return_X_y (bool, optional) –
If True, the data is returned together with the labels (presence of a leakage) as two Numpy arrays, otherwise, the data is returned as epyt_flow.simulation.scada.scada_data.ScadaData instances.

The default is False.
return_leak_locations (bool, optional) –
If True, the leak locations are returned as well – as an instance of scipy.sparse.bsr_array.

The default is False.
verbose (bool, optional) –
If True, a progress bar is shown while downloading files.

The default is True.

Returns:

The simulated benchmark scenarios as either a list of epyt_flow.simulation.scada.scada_data.ScadaData instances or as a list of (X, y) tuples of Numpy arrays. If return_leak_locations is True, the leak locations are also included as an instance of scipy.sparse.bsr_array.

Return type:

list[epyt_flow.simulation.scada.scada_data.ScadaData] or list[tuple[numpy.ndarray, numpy.ndarray]]
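As an illustration of the return_X_y=True form, the sketch below loads the (X, y) pairs and reports the share of time steps labelled as leaky in each scenario. It assumes that y holds one binary label per time step and that the returned list follows the order of scenarios_id; neither is stated explicitly above. The helper names leak_fraction and summarize_scada_labels are hypothetical.

```python
def leak_fraction(y):
    """Fraction of time steps labelled as leaky (labels assumed to be 0/1)."""
    labels = [int(v) for v in y]
    if not labels:
        raise ValueError("label sequence is empty")
    return sum(labels) / len(labels)


def summarize_scada_labels(scenarios_id, use_net1=True, download_dir=None):
    """Hypothetical helper: map each scenario ID to its leak fraction."""
    # Imported lazily so leak_fraction() works without the package installed.
    from water_benchmark_hub.leakdb.leakdb import LeakDB

    pairs = LeakDB.load_scada_data(scenarios_id=scenarios_id, use_net1=use_net1,
                                   download_dir=download_dir, return_X_y=True,
                                   verbose=False)
    # Assumption: pairs[i] belongs to scenarios_id[i].
    return {s_id: leak_fraction(y) for s_id, (_, y) in zip(scenarios_id, pairs)}
```

A quick check of the pure helper: leak_fraction([0, 0, 1, 1]) evaluates to 0.5.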

static load_scenarios(scenarios_id: list[int], use_net1: bool = True, download_dir: str | None = None, verbose: bool = True) → list[epyt_flow.simulation.ScenarioConfig]

Creates and returns the LeakDB scenarios – they can be either modified or passed directly to the EPyT-Flow simulator epyt_flow.simulation.scenario_simulator.ScenarioSimulator.

Note

Due to the randomness in the demand creation as well as in the model uncertainties, the simulation results will differ between runs and will also differ from the original data set (see load_data()). However, the leakages (i.e. location and profile) will always be the same and consistent with the original data set.

Parameters:

scenarios_id (list[int]) – List of scenario IDs to be loaded; there are 1000 scenarios in total.
use_net1 (bool, optional) –
If True, Net1 network will be used, otherwise the Hanoi network will be used.

The default is True.
download_dir (str, optional) –
Path to the Net1.inp or Hanoi.inp file – if None, the temp folder will be used. If the path does not exist, the .inp file will be downloaded to the given path.

The default is None.
verbose (bool, optional) –
If True, a progress bar is shown while downloading files.

The default is True.

Returns:

LeakDB scenarios.

Return type:

list[epyt_flow.simulation.scenario_config.ScenarioConfig]
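The following sketch shows one way the returned scenario configurations might be passed to the EPyT-Flow simulator. It assumes that ScenarioSimulator can be imported from epyt_flow.simulation, accepts a scenario_config keyword, works as a context manager, and exposes run_simulation(); check these against the EPyT-Flow documentation. The helper names run_all, simulate_config, and simulate_leakdb are hypothetical.

```python
def run_all(configs, simulate):
    """Apply a simulation callable to every scenario configuration, in order."""
    return [simulate(config) for config in configs]


def simulate_config(scenario_config):
    """Run a single scenario with EPyT-Flow and return its simulation result."""
    # Lazy import: EPyT-Flow is only needed when a simulation is actually run.
    from epyt_flow.simulation import ScenarioSimulator

    # Assumption: keyword argument, context manager and run_simulation() exist as used here.
    with ScenarioSimulator(scenario_config=scenario_config) as sim:
        return sim.run_simulation()


def simulate_leakdb(scenarios_id, use_net1=True, simulate=simulate_config):
    """Hypothetical helper: load the LeakDB scenarios and simulate each one."""
    from water_benchmark_hub.leakdb.leakdb import LeakDB

    configs = LeakDB.load_scenarios(scenarios_id=scenarios_id,
                                    use_net1=use_net1, verbose=False)
    return run_all(configs, simulate)
```

Passing the simulation step in as a callable keeps the loop easy to exercise without EPyT-Flow and leaves room to modify each configuration before it is simulated.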

water_benchmark_hub.leakdb.leakdb_data

Module provides the leakage configurations for LeakDB.