water_benchmark_hub.water_usage

water_benchmark_hub.water_usage.water_usage

This module provides access to the Water Usage data set by P. Pavlou et al.

class water_benchmark_hub.water_usage.water_usage.WaterUsage

Bases: BenchmarkResource

“Monitoring domestic water consumption: A comparative study of model-based and data-driven end-use disaggregation methods” by P. Pavlou, S. Filippou, S. Solonos, S. G. Vrachimis, K. Malialis, D. G. Eliades, T. Theocharides, M. M. Polycarpou is a benchmark for monitoring the water usage of individual household appliances. Informing consumers about their appliance-level water usage has been shown to influence their behavior toward drinking water conservation. The data were created using the STochastic Residential water End-use Model (STREaM) (Cominola et al., 2018), a modelling software that generates synthetic time series data of household water use.

This benchmark data set is intended for identifying active appliances from the aggregated water consumption, i.e. a multi-class classification problem. The data set covers the use of a standard toilet, standard shower, standard faucet, high-efficiency clothes washer, and standard dishwasher in a 2-person household over a period of 180 days (6 months) at a resolution of 10 s. It is already split into 3 subsets for training (90 days), validation (45 days), and testing (45 days).
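The size of each split follows from the stated duration and resolution. As a rough sanity check, the sketch below computes the number of 10 s samples per split; all variable names are illustrative and not part of the package, and the actual files may differ slightly (for example at split boundaries).

```python
# Sample counts implied by the stated durations and the 10 s resolution.
# All names here are illustrative; none belong to water_benchmark_hub.
SECONDS_PER_DAY = 24 * 60 * 60
RESOLUTION_S = 10

# Split lengths in days, as stated in the description above.
split_days = {"train": 90, "validation": 45, "test": 45}

samples_per_day = SECONDS_PER_DAY // RESOLUTION_S
samples_per_split = {
    name: days * samples_per_day for name, days in split_days.items()
}
total_samples = sum(samples_per_split.values())

for name, count in samples_per_split.items():
    print(f"{name}: {count} samples")
print(f"total: {total_samples} samples")
```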

For more information see https://github.com/KIOS-Research/Water-Usage-Dataset/

Note

Although this data set is synthetic, only the final data set is provided.

This class provides a method for loading the original data set – see load_data() – as well as a method implementing the original scoring mechanism – see compute_evaluation_score().

static compute_evaluation_score(y_pred: numpy.ndarray, y: numpy.ndarray) → dict

Evaluates the performance of a detection method.

Note that instead of a single metric, the following set of metrics is used:

Accuracy
Precision
F1-score (using “micro” averaging)
Cohen’s kappa
ROC AUC

Parameters:

y_pred (numpy.ndarray) – Predicted event indication over time.
y (numpy.ndarray) – Ground truth event indication over time.

Returns:

All evaluation scores.

Return type:

dict
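To make some of the listed metrics concrete, here is a minimal pure-Python sketch of accuracy, micro-averaged F1, and Cohen's kappa for single-label multi-class predictions. It is not the implementation used by compute_evaluation_score(); the function names and the integer labels in the example are assumptions made for illustration, and precision and ROC AUC are left out for brevity.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of time steps whose predicted label matches the ground truth."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP, and FN over all classes, then compute F1."""
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1
            elif p == label:
                fp += 1
            elif t == label:
                fn += 1
    return 2 * tp / (2 * tp + fp + fn)


def cohen_kappa(y_true, y_pred):
    """Agreement between prediction and ground truth, corrected for chance."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: probability that both pick the same class independently.
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_expected == 1.0:
        # Degenerate case (a single class everywhere); returning 1.0 is a
        # choice made in this sketch, not a documented library behavior.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical labels: one integer per time step, four classes.
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 2, 2, 2, 3, 0]

scores = {
    "accuracy": accuracy(y_true, y_pred),
    "f1_micro": micro_f1(y_true, y_pred),
    "cohen_kappa": cohen_kappa(y_true, y_pred),
}
print(scores)
```

For single-label multi-class data, micro-averaged F1 coincides with accuracy, because every misclassified time step counts as exactly one false positive and one false negative.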

static get_meta_info() → dict

Gets the meta information of this resource.

Returns:

Meta info.

Return type:

dict

static load_data(download_dir: str | None = None, return_X_y: bool = True, verbose: bool = True) → dict

Loads the original data set.

Parameters:

download_dir (str, optional) –
Path to the data files. If None, the temp folder will be used. If the path does not exist, the data files will be downloaded to the given path.

The default is None.
return_X_y (bool, optional) –
If True, the data is returned together with the multi-class labels as two NumPy arrays; otherwise, the data is returned as a Pandas data frame.

The default is True.
verbose (bool, optional) –
If True, a progress bar is shown while downloading files.

The default is True.

Returns:

The data set as a dictionary with entries “train”, “validation”, and “test” containing the respective data.

Return type:

dict
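A minimal usage sketch follows, assuming only what is documented above: the returned dictionary has the entries “train”, “validation”, and “test”. The helper check_splits is hypothetical and not part of the package, and the content of each entry is treated as opaque because its exact structure is not specified here. A stand-in dictionary is used so the sketch runs without downloading anything.

```python
EXPECTED_SPLITS = ("train", "validation", "test")


def check_splits(data: dict) -> list:
    """Return the split names in a fixed order, raising if one is missing.

    Hypothetical helper for illustration; not part of water_benchmark_hub.
    """
    missing = [name for name in EXPECTED_SPLITS if name not in data]
    if missing:
        raise KeyError(f"missing splits: {missing}")
    return list(EXPECTED_SPLITS)


# With the package installed, the dictionary would come from a call such as:
#   from water_benchmark_hub.water_usage.water_usage import WaterUsage
#   data = WaterUsage.load_data(download_dir=None, return_X_y=True, verbose=True)
# Stand-in dictionary (assumption): values are placeholders, not real data.
data = {"train": None, "validation": None, "test": None}

splits = check_splits(data)
for name in splits:
    print(f"split available: {name}")
```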