-
Large-Scale Multipurpose Benchmark Datasets For Assessing Data-Driven Deep Learning Approaches For Water Distribution Networks
Authors:
Andres Tello,
Huy Truong,
Alexander Lazovik,
Victoria Degeler
Abstract:
Currently, the number of common benchmark datasets that researchers can use straight away for assessing data-driven deep learning approaches is very limited. Most studies provide data as configuration files. It is still up to each practitioner to follow a particular data generation method and run computationally intensive simulations to obtain usable data for model training and evaluation. In this work, we provide a collection of datasets that includes several small- and medium-sized publicly available Water Distribution Networks (WDNs), including Anytown, Modena, Balerma, C-Town, D-Town, L-Town, Ky1, Ky6, Ky8, and Ky13. In total, 1,394,400 hours of WDN data under normal operating conditions are made available to the community.
Submitted 23 April, 2024;
originally announced April 2024.
-
Graph Neural Networks for Pressure Estimation in Water Distribution Systems
Authors:
Huy Truong,
Andrés Tello,
Alexander Lazovik,
Victoria Degeler
Abstract:
Pressure and flow estimation in Water Distribution Networks (WDN) allows water management companies to optimize their control operations. For many years, mathematical simulation tools have been the most common approach to reconstructing an estimate of the WDN hydraulics. However, pure physics-based simulations involve several challenges, e.g. partially observable data, high uncertainty, and extensive manual configuration. Thus, data-driven approaches have gained traction to overcome such limitations. In this work, we combine physics-based modeling and Graph Neural Networks (GNN), a data-driven approach, to address the pressure estimation problem. First, we propose a new data generation method that uses mathematical simulation without temporal patterns and varies some control parameters left untouched in previous works, which yields more diverse training data. Second, our training strategy relies on random sensor placement, making our GNN-based estimation model robust to unexpected sensor location changes. Third, a realistic evaluation protocol considers real temporal patterns and additionally injects the uncertainties intrinsic to real-world scenarios. Finally, a multi-graph pre-training strategy allows the model to be reused for pressure estimation in unseen target WDNs. Our GNN-based model estimates the pressure of a large-scale WDN in The Netherlands with a MAE of 1.94 mH$_2$O and a MAPE of 7%, surpassing the performance of previous studies. Likewise, it outperformed previous approaches on other WDN benchmarks, reducing absolute error by up to approximately 52% in the best cases.
Submitted 17 November, 2023;
originally announced November 2023.
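The random sensor placement idea in the abstract above can be illustrated with a minimal sketch. It assumes (this is not stated in the abstract) that each node is fed to the model as a pressure value plus an observed flag, with hidden nodes zeroed out. The function names, the pressure values, and the mean-of-sensors baseline are all hypothetical and stand in for the GNN, which is not reproduced here.

```python
import random


def mask_sensors(pressures, n_sensors, rng):
    """Keep pressures at n_sensors randomly chosen nodes and hide the rest.

    Returns per-node (value, observed_flag) features and the set of
    observed node indices. The feature layout is an assumption.
    """
    observed = set(rng.sample(range(len(pressures)), n_sensors))
    features = [(p, 1) if i in observed else (0.0, 0)
                for i, p in enumerate(pressures)]
    return features, observed


def mean_absolute_error(y_true, y_pred):
    """MAE between two equally long sequences of numbers."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up nodal pressures (mH2O) for a six-node toy network.
pressures = [31.2, 28.7, 35.0, 30.1, 27.4, 33.8]
rng = random.Random(42)
features, observed = mask_sensors(pressures, n_sensors=2, rng=rng)

# Placeholder estimator: predict the mean of the observed sensors everywhere.
baseline = sum(pressures[i] for i in observed) / len(observed)
hidden = [i for i in range(len(pressures)) if i not in observed]
mae = mean_absolute_error([pressures[i] for i in hidden],
                          [baseline] * len(hidden))
```

Drawing a fresh mask for every training sample is one plausible way to expose a model to many sensor layouts, which is the property the abstract attributes to its training strategy.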
-
Too Good To Be True: performance overestimation in (re)current practices for Human Activity Recognition
Authors:
Andrés Tello,
Victoria Degeler,
Alexander Lazovik
Abstract:
Today, there are standard and well-established procedures within the Human Activity Recognition (HAR) pipeline. However, some of these conventional approaches lead to accuracy overestimation. In particular, sliding windows for data segmentation followed by standard random k-fold cross validation produce biased results. An analysis of previous literature and present-day studies shows, surprisingly, that these are common approaches in state-of-the-art studies on HAR. It is important to raise awareness in the scientific community about this problem, whose negative effects are being overlooked. Otherwise, the publication of biased results makes papers that report lower accuracies, obtained with correct unbiased methods, harder to publish. Several experiments with different types of datasets and different types of classification models allow us to exhibit the problem and show that it persists independently of the method or dataset.
Submitted 18 October, 2023;
originally announced October 2023.
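The bias described in the abstract above can be made concrete with a small sketch. It counts how many test windows share raw samples with a training window when overlapping sliding windows are split at random, and compares that with holding out a whole subject. The setup (5 subjects, 1000 samples each, 100-sample windows with 50% overlap) is invented for illustration, and the subject-wise split is a commonly used alternative shown here as an assumption; the abstract does not say which remedy the paper uses.

```python
import random


def sliding_windows(n_samples, width, step):
    """Return half-open (start, end) index pairs covering one recording."""
    return [(s, s + width) for s in range(0, n_samples - width + 1, step)]


def overlaps(a, b):
    """True if two half-open index ranges share at least one raw sample."""
    return a[0] < b[1] and b[0] < a[1]


def leaked_fraction(train, test):
    """Fraction of test windows that share raw samples with a training
    window taken from the same subject."""
    leaked = sum(
        any(s_te == s_tr and overlaps(w_te, w_tr) for s_tr, w_tr in train)
        for s_te, w_te in test
    )
    return leaked / len(test)


# Hypothetical data layout: (subject_id, window) pairs.
windows = [(subj, w) for subj in range(5)
           for w in sliding_windows(1000, 100, 50)]

# Random split: shuffle all windows and hold out one fifth.
rng = random.Random(0)
shuffled = windows[:]
rng.shuffle(shuffled)
cut = len(shuffled) // 5
random_test, random_train = shuffled[:cut], shuffled[cut:]

# Subject-wise split: hold out every window of subject 0.
group_test = [x for x in windows if x[0] == 0]
group_train = [x for x in windows if x[0] != 0]

random_leak = leaked_fraction(random_train, random_test)
group_leak = leaked_fraction(group_train, group_test)
```

With overlapping windows, a randomly held-out window almost always has a neighbour in the training set that contains some of the same raw samples, so the test set is not independent of the training set. Holding out whole subjects removes that overlap by construction.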
-
Corneal endothelium assessment in specular microscopy images with Fuchs' dystrophy via deep regression of signed distance maps
Authors:
Juan S. Sierra,
Jesus Pineda,
Daniela Rueda,
Alejandro Tello,
Angelica M. Prada,
Virgilio Galvis,
Giovanni Volpe,
Maria S. Millan,
Lenny A. Romero,
Andres G. Marrugo
Abstract:
Specular microscopy assessment of the human corneal endothelium (CE) in Fuchs' dystrophy is challenging due to the presence of dark image regions called guttae. This paper proposes a UNet-based segmentation approach that requires minimal post-processing and achieves reliable CE morphometric assessment and guttae identification across all degrees of Fuchs' dystrophy. We cast the segmentation problem as a regression task of the cell and gutta signed distance maps instead of a pixel-level classification task as typically done with UNets. Compared to the conventional UNet classification approach, the distance-map regression approach converges faster in clinically relevant parameters. It also produces morphometric parameters that agree with the manually segmented ground-truth data, namely an average cell density difference of -41.9 cells/mm² (95% confidence interval (CI) [-306.2, 222.5]) and an average difference in mean cell area of 14.8 µm² (95% CI [-41.9, 71.5]). These results suggest a promising alternative for CE assessment.
Submitted 29 November, 2022; v1 submitted 13 October, 2022;
originally announced October 2022.
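To illustrate what a signed distance map is as a regression target, here is a minimal brute-force sketch on a toy binary mask. It uses a generic definition (positive inside the object, negative outside, magnitude equal to the Euclidean distance to the nearest pixel of the opposite class). The sign convention, any normalization or truncation, and the function name are assumptions, not details taken from the paper.

```python
import math


def signed_distance_map(mask):
    """Brute-force signed distance map for a small binary mask.

    mask is a list of rows of 0/1 values. Each output value is the
    Euclidean distance to the nearest pixel of the opposite class,
    positive for object pixels (1) and negative for background (0).
    Quadratic in the number of pixels, so suitable for toy inputs only.
    """
    h, w = len(mask), len(mask[0])
    coords = [(r, c) for r in range(h) for c in range(w)]
    sdm = [[0.0] * w for _ in range(h)]
    for r, c in coords:
        nearest = min(
            (math.hypot(r - rr, c - cc)
             for rr, cc in coords if mask[rr][cc] != mask[r][c]),
            default=0.0,  # mask with a single class: no boundary exists
        )
        sdm[r][c] = nearest if mask[r][c] else -nearest
    return sdm


# Toy 5x5 mask with a 3x3 "cell" in the middle.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
sdm = signed_distance_map(mask)

# Thresholding the map at zero recovers the original segmentation.
recovered = [[1 if v > 0 else 0 for v in row] for row in sdm]
```

A regression target of this kind encodes how far each pixel is from a boundary rather than only which class it belongs to, and a segmentation can be read back by thresholding at zero.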
-
Bottom-up strategy for data retrieval and data entry over front-end application Software
Authors:
Rusel Cierto Trinidad,
Alcides Bernardo Tello
Abstract:
Some people implement patterns and best practices without analyzing their efficiency in their projects. Consequently, our goal in this article is to convince software developers that it is worthwhile to make an earnest effort to evaluate the use of best practices and software patterns. For this purpose, we took a concrete case: a system for entering geographical locations through user interfaces. We then performed a comparative study of a traditional method against our approach, named reverse logistic to retrieve results, by measuring the time that a user spends performing actions when entering data into a system. Surprisingly, we observed a 59% decrease in the time spent compared with the traditional method. This result lays a foundation for feeding data from the typical final step and searching based on string matching algorithms, speeding up the interaction between people and computer response.
Submitted 21 February, 2019;
originally announced February 2019.