-
Using remotely sensed data for air pollution assessment
Authors:
Teresa Bernardino,
Maria Alexandra Oliveira,
João Nuno Silva
Abstract:
Air pollution constitutes a global problem of paramount importance that affects not only human health, but also the environment. The existence of spatial and temporal data regarding the concentrations of pollutants is crucial for performing air pollution studies and monitor emissions. However, although observation data presents great temporal coverage, the number of stations is very limited and th…
▽ More
Air pollution constitutes a global problem of paramount importance that affects not only human health, but also the environment. The existence of spatial and temporal data regarding the concentrations of pollutants is crucial for performing air pollution studies and monitor emissions. However, although observation data presents great temporal coverage, the number of stations is very limited and they are usually built in more populated areas.
The main objective of this work is to create models capable of inferring pollutant concentrations in locations where no observation data exists. A machine learning model, more specifically the random forest model, was developed for predicting concentrations in the Iberian Peninsula in 2019 for five selected pollutants: $NO_2$, $O_3$ $SO_2$, $PM10$, and $PM2.5$. Model features include satellite measurements, meteorological variables, land use classification, temporal variables (month, day of year), and spatial variables (latitude, longitude, altitude).
The models were evaluated using various methods, including station 10-fold cross-validation, in which in each fold observations from 10\% of the stations are used as testing data and the rest as training data. The $R^2$, RMSE and mean bias were determined for each model. The $NO_2$ and $O_3$ models presented good values of $R^2$, 0.5524 and 0.7462, respectively. However, the $SO_2$, $PM10$, and $PM2.5$ models performed very poorly in this regard, with $R^2$ values of -0.0231, 0.3722, and 0.3303, respectively. All models slightly overestimated the ground concentrations, except the $O_3$ model. All models presented acceptable cross-validation RMSE, except the $O_3$ and $PM10$ models where the mean value was a little higher (12.5934 $μg/m^3$ and 10.4737 $μg/m^3$, respectively).
△ Less
Submitted 4 February, 2024;
originally announced February 2024.
-
Topological relations in water quality monitoring
Authors:
Bruno Chaves Figueiredo,
Maria Alexandra Oliveira,
João Nuno Silva
Abstract:
The Alqueva Multi-Purpose Project (EFMA) is a massive abduction and storage infrastructure system in the Alentejo, which has a water quality monitoring network with almost thousands of water quality stations distributed across three subsystems: Alqueva, Pedrogão, and Ardila. Identification of pollution sources in complex infrastructure systems, such as the EFMA, requires recognition of water flow…
▽ More
The Alqueva Multi-Purpose Project (EFMA) is a massive abduction and storage infrastructure system in the Alentejo, which has a water quality monitoring network with almost thousands of water quality stations distributed across three subsystems: Alqueva, Pedrogão, and Ardila. Identification of pollution sources in complex infrastructure systems, such as the EFMA, requires recognition of water flow direction and delimitation of areas being drained to specific sampling points. The transfer channels in the EFMA infrastructure artificially connect several water bodies that do not share drainage basins, which further complicates the interpretation of water quality data because the water does not flow exclusively downstream and is not restricted to specific basins.
The existing user-friendly GIS tools do not facilitate the exploration and visualisation of water quality data in spatial-temporal dimensions, such as defining temporal relationships between monitoring campaigns, nor do they allow the establishment of topological and hydrological relationships between different sampling points.
This thesis work proposes a framework capable of aggregating many types of information in a GIS environment, visualising large water quality-related datasets and, a graph data model to integrate and relate water quality between monitoring stations and land use. The graph model allows to exploit the relationship between water quality in a watercourse and reservoirs associated with infrastructures.
The graph data model and the developed framework demonstrated encouraging results and has proven to be preferred when compared to relational databases.
△ Less
Submitted 7 February, 2024;
originally announced February 2024.
-
On the development of an application for the compilation of global sea level changes
Authors:
Mihir Odhavji,
Maria Alexandra Oliveira,
João Nuno Silva
Abstract:
There is a lot of data about mean sea level variation from studies conducted around the globe. This data is dispersed, lacks organization along with standardization, and in most cases, it is not available online. In some instances, when it is available, it is often in unpractical ways and different formats. Analyzing it would be inefficient and very time-consuming. In addition to all of that, to s…
▽ More
There is a lot of data about mean sea level variation from studies conducted around the globe. This data is dispersed, lacks organization along with standardization, and in most cases, it is not available online. In some instances, when it is available, it is often in unpractical ways and different formats. Analyzing it would be inefficient and very time-consuming. In addition to all of that, to successfully process spatial-temporal data, the user has to be equipped with particular skills and tools used for geographic data like PostGIS, PostgreSQL and GeoAlchemy. The presented solution is to develop a web application that solves some of the issues faced by researchers. The web application allows the user to add data, be it through forms in a browser or automated with the help of an API. The application also assists with data querying, processing and visualization by making tables, showing maps and drawing graphs. Comparing data points from different areas and publications is also made possible. The implemented web application permits the query and storage of spatial-temporal data about mean sea level variation in a simplified, easily accessible and user-friendly manner. It will also allow the realization of more global studies.
△ Less
Submitted 4 February, 2024;
originally announced February 2024.
-
LiDAR data acquisition and processing for ecology applications
Authors:
Ion Ciobotari,
Adriana Príncipe,
Maria Alexandra Oliveira,
João Nuno Silva
Abstract:
The collection of ecological data in the field is essential to diagnose, monitor and manage ecosystems in a sustainable way. Since acquisition of this information through traditional methods are generally time-consuming, due to the capability of recording large volumes of data in short time periods, automation of data acquisition sees a growing trend. Terrestrial laser scanners (TLS), particularly…
▽ More
The collection of ecological data in the field is essential to diagnose, monitor and manage ecosystems in a sustainable way. Since acquisition of this information through traditional methods are generally time-consuming, due to the capability of recording large volumes of data in short time periods, automation of data acquisition sees a growing trend. Terrestrial laser scanners (TLS), particularly LiDAR sensors, have been used in ecology, allowing to reconstruct the 3D structure of vegetation, and thus, infer ecosystem characteristics based on the spatial variation of the density of points. However, the low amount of information obtained per beam, lack of data analysis tools and the high cost of the equipment limit their use. This way, a low-cost TLS (<10k$) was developed along with data acquisition and processing mechanisms applicable in two case studies: an urban garden and a target area for ecological restoration. The orientation of LiDAR was modified to make observations in the vertical plane and a motor was integrated for its rotation, enabling the acquisition of 360 degree data with high resolution. Motion and location sensors were also integrated for automatic error correction and georeferencing. From the data generated, histograms of point density variation along the vegetation height were created, where shrub stratum was easily distinguishable from tree stratum, and maximum tree height and shrub cover were calculated. These results agreed with the field data, whereby the developed TLS has proved to be effective in calculating metrics of structural complexity of vegetation.
△ Less
Submitted 11 January, 2024;
originally announced January 2024.
-
Semantic Modelling of Organizational Knowledge as a Basis for Enterprise Data Governance 4.0 -- Application to a Unified Clinical Data Model
Authors:
Miguel AP Oliveira,
Stephane Manara,
Bruno Molé,
Thomas Muller,
Aurélien Guillouche,
Lysann Hesske,
Bruce Jordan,
Gilles Hubert,
Chinmay Kulkarni,
Pralipta Jagdev,
Cedric R. Berger
Abstract:
Individuals and organizations cope with an always-growing amount of data, which is heterogeneous in its contents and formats. An adequate data management process yielding data quality and control over its lifecycle is a prerequisite to getting value out of this data and minimizing inherent risks related to multiple usages. Common data governance frameworks rely on people, policies, and processes t…
▽ More
Individuals and organizations cope with an always-growing amount of data, which is heterogeneous in its contents and formats. An adequate data management process yielding data quality and control over its lifecycle is a prerequisite to getting value out of this data and minimizing inherent risks related to multiple usages. Common data governance frameworks rely on people, policies, and processes that fall short of the overwhelming complexity of data. Yet, harnessing this complexity is necessary to achieve high-quality standards. The latter will condition any downstream data usage outcome, including generative artificial intelligence trained on this data. In this paper, we report our concrete experience establishing a simple, cost-efficient framework that enables metadata-driven, agile and (semi-)automated data governance (i.e. Data Governance 4.0). We explain how we implement and use this framework to integrate 25 years of clinical study data at an enterprise scale in a fully productive environment. The framework encompasses both methodologies and technologies leveraging semantic web principles. We built a knowledge graph describing avatars of data assets in their business context, including governance principles. Multiple ontologies articulated by an enterprise upper ontology enable key governance actions such as FAIRification, lifecycle management, definition of roles and responsibilities, lineage across transformations and provenance from source systems. This metadata model is the keystone to data governance 4.0: a semi-automatised data management process that considers the business context in an agile manner to adapt governance constraints to each use case and dynamically tune it based on business changes.
△ Less
Submitted 23 November, 2023; v1 submitted 20 October, 2023;
originally announced November 2023.
-
Image analysis for automatic measurement of crustose lichens
Authors:
Pedro Guedes,
Maria Alexandra Oliveira,
Cristina Branquinho,
João Nuno Silva
Abstract:
Lichens, organisms resulting from a symbiosis between a fungus and an algae, are frequently used as age estimators, especially in recent geological deposits and archaeological structures, using the correlation between lichen size and age. Current non-automated manual lichen and measurement (with ruler, calipers or using digital image processing tools) is a time-consuming and laborious process, esp…
▽ More
Lichens, organisms resulting from a symbiosis between a fungus and an algae, are frequently used as age estimators, especially in recent geological deposits and archaeological structures, using the correlation between lichen size and age. Current non-automated manual lichen and measurement (with ruler, calipers or using digital image processing tools) is a time-consuming and laborious process, especially when the number of samples is high.
This work presents a workflow and set of image acquisition and processing tools developed to efficiently identify lichen thalli in flat rocky surfaces, and to produce relevant lichen size statistics (percentage cover, number of thalli, their area and perimeter).
The developed workflow uses a regular digital camera for image capture along with specially designed targets to allow for automatic image correction and scale assignment. After this step, lichen identification is done in a flow comprising assisted image segmentation and classification based on interactive foreground extraction tool (GrabCut) and automatic classification of images using Simple Linear Iterative Clustering (SLIC) for image segmentation and Support Vector Machines (SV) and Random Forest classifiers.
Initial evaluation shows promising results. The manual classification of images (for training) using GrabCut show an average speedup of 4 if compared with currently used techniques and presents an average precision of 95\%. The automatic classification using SLIC and SVM with default parameters produces results with average precision higher than 70\%. The developed system is flexible and allows a considerable reduction of processing time, the workflow allows it applicability to data sets of new lichen populations.
△ Less
Submitted 1 March, 2022;
originally announced March 2022.
-
GAPS: Geo Data Portals for Air Pollution Studies
Authors:
Filipe Fernandes,
Helena Serrano,
Maria Alexandra Oliveira,
Cristina Branquinho,
João Nuno Silva
Abstract:
There is a wealth of data on air pollution within several users' reach, including modelled concentrations and depositions as well as observations from air quality stations. However, data integration to perceive spatial and temporal trends at the national level is a complex undertaking. The difficulties are mainly related to the data sources (many files with a lot of information and in different fo…
▽ More
There is a wealth of data on air pollution within several users' reach, including modelled concentrations and depositions as well as observations from air quality stations. However, data integration to perceive spatial and temporal trends at the national level is a complex undertaking. The difficulties are mainly related to the data sources (many files with a lot of information and in different formats). In addition, the processing of this data is time-consuming and impractical when the time period under analysis increases. Furthermore,the processing of spatial-temporal data requires the use of multiple specific tools, such as geographic information systems software and statistical, which are not readily accessible to non-specialised users. The proposed solution is to develop libraries that are responsible for aggregating different types of data related with pollution. In addition to allowing the integration of different data, the libraries are also the basis for develo** web applications. The libraries allow the collection of data made available by the Portuguese Environment Agency (APA) and official modelling results for Europe. The applications developed will extend the libraries to allow data processing and representation through Dashboards. These Dashboards provide access to non-specialised users, namely regarding the trends of pollutants in each region. The implemented libraries allow the development of projects that would need access to the same data sources. These Dashboards allow for simplified access to data and for studies that have so far been beyond the reach of researchers.
△ Less
Submitted 4 March, 2021;
originally announced March 2021.