water_benchmark_hub.gecco_waterquality
water_benchmark_hub.gecco_waterquality.gecco_water_quality
This module provides classes for loading the different GECCO water quality data sets.
- class water_benchmark_hub.gecco_waterquality.gecco_water_quality.GeccoWaterQuality
Bases: BenchmarkResource
Base class for the GECCO Water Quality 2017-2019 benchmarks.
Note that the scoring/evaluation algorithm is the same for all GECCO water quality benchmarks and is implemented in compute_evaluation_score().
- static compute_evaluation_score(y_pred: numpy.ndarray, y: numpy.ndarray) → float
Evaluates the performance of a detection method.
Note
All GECCO water quality challenges use the F1-score for evaluation.
- Parameters:
y_pred (numpy.ndarray) – Predicted event indication over time.
y (numpy.ndarray) – Ground truth event indication over time.
- Returns:
Evaluation score.
- Return type:
float
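For reference, the F1-score on binary event indications can be sketched in plain NumPy as follows. This is an illustrative re-implementation, not the library's own code; it assumes that y_pred and y are 1-D arrays of 0/1 (or boolean) values with 1 marking an event, and it returns 0.0 when there are neither predicted nor actual events (an assumed convention). The trivial baselines at the end show why F1 suits rare events: never raising an alarm scores 0, and flagging every time step scores only 2p/(1 + p) for an event fraction p.

```python
import numpy as np


def f1_event_score(y_pred: np.ndarray, y: np.ndarray) -> float:
    """F1-score of binary event indications (1 = event, 0 = no event)."""
    y_pred = np.asarray(y_pred).astype(bool).ravel()
    y = np.asarray(y).astype(bool).ravel()
    if y_pred.shape != y.shape:
        raise ValueError("y_pred and y must have the same length")

    tp = np.sum(y_pred & y)    # events that were flagged
    fp = np.sum(y_pred & ~y)   # false alarms
    fn = np.sum(~y_pred & y)   # missed events

    denom = 2 * tp + fp + fn   # F1 = 2TP / (2TP + FP + FN)
    if denom == 0:
        return 0.0             # assumed convention: no events at all -> 0.0
    return float(2 * tp / denom)


y_true = np.array([0, 0, 1, 1, 1, 0, 0, 1])
y_hat = np.array([0, 1, 1, 1, 0, 0, 0, 1])
print(f1_event_score(y_hat, y_true))                  # 3 TP, 1 FP, 1 FN -> 0.75

# Trivial baselines on the same labels (event fraction p = 0.5):
print(f1_event_score(np.zeros_like(y_true), y_true))  # never alarm -> 0.0
print(f1_event_score(np.ones_like(y_true), y_true))   # always alarm -> 2p/(1+p), about 0.667
```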
- static get_meta_info() → dict
Gets the meta information of this resource.
- Returns:
Meta info.
- Return type:
dict
- class water_benchmark_hub.gecco_waterquality.gecco_water_quality.GeccoWaterQuality2017
Bases: GeccoWaterQuality
Class for loading the original GECCO Industrial Challenge 2017 Dataset: a water quality dataset for the “Monitoring of drinking-water quality” competition organized by M. Friese, J. Stork, A. Fischbach, M. Rebolledo, and T. Bartz-Beielstein at the Genetic and Evolutionary Computation Conference 2017, Berlin, Germany.
This is a benchmark for anomaly detection algorithms on water quality. The data is provided by the “Thüringer Fernwasserversorgung” (Germany) and constitutes a real-world data set. It contains 9 numeric water quality features sampled once per minute over approximately 3 months. The goal is to predict the presence of an anomaly, i.e., a binary classification task.
More information can be found at https://zenodo.org/records/3884465 and http://www.spotseven.de/gecco-challenge/gecco-challenge-2017/
- static get_meta_info() → dict
Gets the meta information of this resource.
- Returns:
Meta info.
- Return type:
dict
- static load_data(download_dir: str | None = None, return_X_y: bool = True, verbose: bool = True) → pandas.DataFrame | tuple[numpy.ndarray, numpy.ndarray]
Loads the original GECCO Industrial Challenge 2017 Dataset.
Note
This is NOT a simulated scenario; therefore, only the final data set is provided.
- Parameters:
download_dir (str, optional) –
Path to the data files – if None, the temp folder will be used. If the path does not exist, the data files will be downloaded to the given path.
The default is None.
return_X_y (bool, optional) –
If True, the data and the labels are returned as two NumPy arrays; otherwise, the data is returned as a pandas data frame.
The default is True.
verbose (bool, optional) –
If True, a progress bar is shown while downloading files.
The default is True.
- Returns:
The benchmark data set, either as a pandas data frame or as a pair of (X, y) NumPy arrays.
- Return type:
pandas.DataFrame or tuple[numpy.ndarray, numpy.ndarray]
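Because y marks events per time step (one step per minute for this data set), it is often convenient to turn it into contiguous event intervals, for example to count events or to measure their duration in minutes. The helper below is hypothetical and not part of the library; it assumes y is a 1-D array of 0/1 values as returned with return_X_y=True. The library call and the directory name in the comment are illustrative only and are not executed, so the sketch runs offline.

```python
import numpy as np

# With the package installed, the labels could be obtained via (not executed here;
# "gecco2017_data" is a hypothetical directory name):
#   from water_benchmark_hub.gecco_waterquality.gecco_water_quality import GeccoWaterQuality2017
#   X, y = GeccoWaterQuality2017.load_data(download_dir="gecco2017_data", return_X_y=True)


def event_intervals(y: np.ndarray) -> list[tuple[int, int]]:
    """Return half-open [start, end) index ranges of consecutive time steps with y == 1."""
    y = np.asarray(y).astype(int).ravel()
    padded = np.concatenate(([0], y, [0]))  # pad so events touching either edge are closed
    changes = np.diff(padded)               # +1 where an event starts, -1 where it ends
    starts = np.flatnonzero(changes == 1)
    ends = np.flatnonzero(changes == -1)
    return list(zip(starts.tolist(), ends.tolist()))


y_demo = np.array([0, 1, 1, 0, 0, 1, 0, 1])
intervals = event_intervals(y_demo)
print(intervals)                            # [(1, 3), (5, 6), (7, 8)]
durations = [end - start for start, end in intervals]
print(durations)                            # [2, 1, 1] time steps, i.e. minutes at 1-min sampling
```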
- class water_benchmark_hub.gecco_waterquality.gecco_water_quality.GeccoWaterQuality2018
Bases: GeccoWaterQuality
Class for loading the GECCO Industrial Challenge 2018 Dataset: a water quality dataset for the “Internet of Things: Online Anomaly Detection for Drinking Water Quality” competition organized by F. Rehbach, M. Rebolledo, S. Moritz, S. Chandrasekaran, and T. Bartz-Beielstein at the Genetic and Evolutionary Computation Conference 2018, Kyoto, Japan.
This is a benchmark (based on GeccoWaterQuality2017) for anomaly detection algorithms on water quality. The data is provided by the “Thüringer Fernwasserversorgung” (Germany) and constitutes a real-world data set. It contains 9 numeric water quality features sampled once per minute over approximately 3 months. The goal is to predict the presence of an anomaly, i.e., a binary classification task.
More information can be found at https://zenodo.org/records/3884398 and http://www.spotseven.de/gecco/gecco-challenge/gecco-challenge-2018/
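In an online setting, each sample has to be classified when it arrives, using only past data. As an illustration of that constraint (not the method used in the challenge), the sketch below keeps running per-feature means and variances with Welford's algorithm and flags a sample whose z-score exceeds a threshold in any feature. The class name, the threshold, and the warm-up length are hypothetical choices, and a synthetic 9-feature stream stands in for the real data.

```python
import numpy as np


class OnlineZScoreDetector:
    """Hypothetical streaming detector, not part of the library.

    Each sample is compared against the running per-feature mean and standard
    deviation of all previous samples (Welford's algorithm), so only past data
    is used and memory stays O(n_features).
    """

    def __init__(self, n_features: int, threshold: float = 4.0, warmup: int = 30):
        self.threshold = threshold
        self.warmup = max(warmup, 2)       # need >= 2 samples for a variance estimate
        self.n = 0
        self.mean = np.zeros(n_features)
        self.m2 = np.zeros(n_features)     # running sum of squared deviations

    def update(self, x: np.ndarray) -> int:
        """Classify one sample (1 = event, 0 = normal), then fold it into the statistics."""
        x = np.asarray(x, dtype=float)
        flag = 0
        if self.n >= self.warmup:
            sd = np.sqrt(self.m2 / (self.n - 1))
            sd[sd == 0] = 1.0                                   # guard against constant features
            z = np.nan_to_num(np.abs(x - self.mean) / sd, nan=0.0)  # missing values count as normal
            flag = int(np.any(z > self.threshold))
        if not np.isnan(x).any():                               # skip incomplete readings
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)
        return flag


# Synthetic stream standing in for the real data: 9 features, one injected event.
rng = np.random.default_rng(1)
stream = rng.normal(size=(500, 9))
stream[300:310, 0] += 12.0
detector = OnlineZScoreDetector(n_features=9)
flags = np.array([detector.update(row) for row in stream])
print(np.flatnonzero(flags))
```

Whether flagged samples should still update the statistics is a design choice: updating them, as here, lets the detector adapt to lasting level shifts but slowly inflates the variance estimate during long events.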
- static get_meta_info() → dict
Gets the meta information of this resource.
- Returns:
Meta info.
- Return type:
dict
- static load_data(download_dir: str | None = None, return_X_y: bool = True, verbose: bool = True) → pandas.DataFrame | tuple[numpy.ndarray, numpy.ndarray]
Loads the GECCO Industrial Challenge 2018 Dataset.
Note
This is NOT a simulated scenario; therefore, only the final data set is provided.
- Parameters:
download_dir (str, optional) –
Path to the data files – if None, the temp folder will be used. If the path does not exist, the data files will be downloaded to the given path.
The default is None.
return_X_y (bool, optional) –
If True, the data and the labels are returned as two NumPy arrays; otherwise, the data is returned as a pandas data frame.
The default is True.
verbose (bool, optional) –
If True, a progress bar is shown while downloading files.
The default is True.
- Returns:
The benchmark data set, either as a pandas data frame or as a pair of (X, y) NumPy arrays.
- Return type:
pandas.DataFrame or tuple[numpy.ndarray, numpy.ndarray]
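The download_dir behaviour described above can be sketched as follows. Both helpers are hypothetical and not the library's implementation; that "the temp folder" means tempfile.gettempdir() and that already downloaded files are reused are assumptions, as is the file name data.csv. Under that reading, passing a persistent directory instead of relying on the temporary folder should avoid downloading the data again in later sessions.

```python
import os
import tempfile
from typing import Optional


def resolve_download_dir(download_dir: Optional[str] = None) -> str:
    """Hypothetical helper: fall back to the system temp folder when no path is
    given, and create the directory if it does not exist yet."""
    if download_dir is None:
        download_dir = tempfile.gettempdir()
    os.makedirs(download_dir, exist_ok=True)
    return download_dir


def needs_download(download_dir: str, filename: str) -> bool:
    """Hypothetical check: download only if the file is not already present."""
    return not os.path.isfile(os.path.join(download_dir, filename))


base = tempfile.mkdtemp()                          # throwaway directory for the demo
target = os.path.join(base, "gecco2018")           # hypothetical persistent data folder
path = resolve_download_dir(target)
print(needs_download(path, "data.csv"))            # True: nothing cached yet
open(os.path.join(path, "data.csv"), "w").close()  # pretend a download happened
print(needs_download(path, "data.csv"))            # False: the file would be reused
```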
- class water_benchmark_hub.gecco_waterquality.gecco_water_quality.GeccoWaterQuality2019
Bases: GeccoWaterQuality
Class for loading the GECCO Industrial Challenge 2019 Dataset: a water quality dataset for the “Internet of Things: Online Event Detection for Drinking Water Quality Control” competition organized by F. Rehbach, S. Moritz, and T. Bartz-Beielstein at the Genetic and Evolutionary Computation Conference 2019, Prague, Czech Republic.
This is a benchmark (based on GeccoWaterQuality2018) for anomaly detection algorithms on water quality. The data is provided by the “Thüringer Fernwasserversorgung” (Germany) and constitutes a real-world data set. It contains 6 numeric water quality features sampled once per minute over approximately 3 months. The goal is to predict the presence of an anomaly, i.e., a binary classification task. The data set comes in three splits: a training set, a validation set, and a test set.
More information can be found at https://zenodo.org/records/4304080 and https://www.th-koeln.de/informatik-und-ingenieurwissenschaften/gecco-challenge-2019_63244.php
- static get_meta_info() → dict
Gets the meta information of this resource.
- Returns:
Meta info.
- Return type:
dict
- static load_data(download_dir: str | None = None, return_X_y: bool = True, verbose: bool = True) → dict
Loads the GECCO Industrial Challenge 2019 Dataset.
Note
This is NOT a simulated scenario; therefore, only the final data set is provided.
- Parameters:
download_dir (str, optional) –
Path to the data files – if None, the temp folder will be used. If the path does not exist, the data files will be downloaded to the given path.
The default is None.
return_X_y (bool, optional) –
If True, the data and the labels are returned as two NumPy arrays; otherwise, the data is returned as a pandas data frame.
The default is True.
verbose (bool, optional) –
If True, a progress bar is shown while downloading files.
The default is True.
- Returns:
The data set as a dictionary with entries “train”, “validation”, and “test” containing the respective data.
- Return type:
dict
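The three splits suggest the usual protocol: fit on the training split, tune free parameters on the validation split, and report the score once on the test split. The sketch below illustrates this with a naive z-score threshold detector (a stand-in technique, not part of the library or the challenge) and the F1-score used for all GECCO water quality challenges. It assumes that, with return_X_y=True, each dictionary entry is an (X, y) pair of NumPy arrays; the documentation above does not spell out the per-entry format, so check the actual return value. Synthetic 6-feature data replaces the download, and the helper names are hypothetical.

```python
import numpy as np


def detect(X, mu, sd, threshold):
    """Flag rows where any feature's z-score (w.r.t. training statistics) exceeds threshold."""
    z = np.nan_to_num(np.abs((X - mu) / sd), nan=0.0)
    return (z > threshold).any(axis=1).astype(int)


def f1(y_pred, y_true):
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0


def make_split(rng, n, event):
    """Synthetic stand-in for one split: 6 features, one injected event."""
    X = rng.normal(size=(n, 6))
    y = np.zeros(n, dtype=int)
    y[event] = 1
    X[event, 2] += 6.0
    return X, y


rng = np.random.default_rng(42)
# Assumed structure of the real call: each entry is an (X, y) pair, e.g.
#   data = GeccoWaterQuality2019.load_data(return_X_y=True)
data = {
    "train": make_split(rng, 3000, slice(1000, 1040)),
    "validation": make_split(rng, 1000, slice(400, 420)),
    "test": make_split(rng, 1000, slice(600, 625)),
}

# 1) Fit normal-behaviour statistics on the training split only.
X_train, y_train = data["train"]
X_normal = X_train[y_train == 0]
mu, sd = np.nanmean(X_normal, axis=0), np.nanstd(X_normal, axis=0)
sd[sd == 0] = 1.0

# 2) Choose the threshold with the best F1 on the validation split.
X_val, y_val = data["validation"]
candidates = np.arange(2.0, 6.01, 0.25)
best_t = float(max(candidates, key=lambda t: f1(detect(X_val, mu, sd, t), y_val)))

# 3) Report the score once on the untouched test split.
X_test, y_test = data["test"]
test_score = f1(detect(X_test, mu, sd, best_t), y_test)
print(f"threshold = {best_t:.2f}, test F1 = {test_score:.3f}")
```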