-
Raster Forge: Interactive Raster Manipulation Library and GUI for Python
Authors:
Afonso Oliveira,
Nuno Fachada,
João P. Matos-Carvalho
Abstract:
Raster Forge is a Python library and graphical user interface for raster data manipulation and analysis. The tool is focused on remote sensing applications, particularly in wildfire management. It allows users to import, visualize, and process raster layers for tasks such as image compositing or topographical analysis. For wildfire management, it generates fuel maps using predefined models. Its im…
▽ More
Raster Forge is a Python library and graphical user interface for raster data manipulation and analysis. The tool is focused on remote sensing applications, particularly in wildfire management. It allows users to import, visualize, and process raster layers for tasks such as image compositing or topographical analysis. For wildfire management, it generates fuel maps using predefined models. Its impact extends from disaster management to hydrological modeling, agriculture, and environmental monitoring. Raster Forge can be a valuable asset for geoscientists and researchers who rely on raster data analysis, enhancing geospatial data processing and visualization across various disciplines.
△ Less
Submitted 19 May, 2024; v1 submitted 9 April, 2024;
originally announced April 2024.
-
Data Science for Geographic Information Systems
Authors:
Afonso Oliveira,
Nuno Fachada,
João P. Matos-Carvalho
Abstract:
The integration of data science into Geographic Information Systems (GIS) has facilitated the evolution of these tools into complete spatial analysis platforms. The adoption of machine learning and big data techniques has equipped these platforms with the capacity to handle larger amounts of increasingly complex data, transcending the limitations of more traditional approaches. This work traces th…
▽ More
The integration of data science into Geographic Information Systems (GIS) has facilitated the evolution of these tools into complete spatial analysis platforms. The adoption of machine learning and big data techniques has equipped these platforms with the capacity to handle larger amounts of increasingly complex data, transcending the limitations of more traditional approaches. This work traces the historical and technical evolution of data science and GIS as fields of study, highlighting the critical points of convergence between domains, and underlining the many sectors that rely on this integration. A GIS application is presented as a case study in the disaster management sector where we utilize aerial data from Tróia, Portugal, to emphasize the process of insight extraction from raw data. We conclude by outlining prospects for future research in integration of these fields in general, and the developed application in particular.
△ Less
Submitted 4 April, 2024;
originally announced April 2024.
-
Text clustering with LLM embeddings
Authors:
Alina Petukhova,
João P. Matos-Carvalho,
Nuno Fachada
Abstract:
Text clustering is an important approach for organising the growing amount of digital content, hel** to structure and find hidden patterns in uncategorised data. However, the effectiveness of text clustering heavily relies on the choice of textual embeddings and clustering algorithms. We argue that recent advances in large language models (LLMs) can potentially improve this task. In this researc…
▽ More
Text clustering is an important approach for organising the growing amount of digital content, hel** to structure and find hidden patterns in uncategorised data. However, the effectiveness of text clustering heavily relies on the choice of textual embeddings and clustering algorithms. We argue that recent advances in large language models (LLMs) can potentially improve this task. In this research, we investigated how different textual embeddings -- particularly those used in LLMs -- and clustering algorithms affect how text datasets are clustered. A series of experiments were conducted to assess how embeddings influence clustering results, the role played by dimensionality reduction through summarisation, and model size adjustment. Findings reveal that LLM embeddings excel at capturing subtleties in structured language, while BERT leads the lightweight options in performance. In addition, we observe that increasing model dimensionality and employing summarization techniques do not consistently lead to improvements in clustering efficiency, suggesting that these strategies require careful analysis to use in real-life models. These results highlight a complex balance between the need for refined text representation and computational feasibility in text clustering applications. This study extends traditional text clustering frameworks by incorporating embeddings from LLMs, providing a path for improved methodologies, while informing new avenues for future research in various types of textual analysis.
△ Less
Submitted 30 May, 2024; v1 submitted 22 March, 2024;
originally announced March 2024.
-
A Closed-form Solution for Weight Optimization in Fully-connected Feed-forward Neural Networks
Authors:
Slavisa Tomic,
João Pedro Matos-Carvalho,
Marko Beko
Abstract:
This work addresses weight optimization problem for fully-connected feed-forward neural networks. Unlike existing approaches that are based on back-propagation (BP) and chain rule gradient-based optimization (which implies iterative execution, potentially burdensome and time-consuming in some cases), the proposed approach offers the solution for weight optimization in closed-form by means of least…
▽ More
This work addresses weight optimization problem for fully-connected feed-forward neural networks. Unlike existing approaches that are based on back-propagation (BP) and chain rule gradient-based optimization (which implies iterative execution, potentially burdensome and time-consuming in some cases), the proposed approach offers the solution for weight optimization in closed-form by means of least squares (LS) methodology. In the case where the input-to-output map** is injective, the new approach optimizes the weights in a back-propagating fashion in a single iteration by jointly optimizing a set of weights in each layer for each neuron. In the case where the input-to-output map** is not injective (e.g., in classification problems), the proposed solution is easily adapted to obtain its final solution in a few iterations. An important advantage over the existing solutions is that these computations (for all neurons in a layer) are independent from each other; thus, they can be carried out in parallel to optimize all weights in a given layer simultaneously. Furthermore, its running time is deterministic in the sense that one can obtain the exact number of computations necessary to optimize the weights in all network layers (per iteration, in the case of non-injective map**). Our simulation and empirical results show that the proposed scheme, BPLS, works well and is competitive with existing ones in terms of accuracy, but significantly surpasses them in terms of running time. To summarize, the new method is straightforward to implement, is competitive and computationally more efficient than the existing ones, and is well-tailored for parallel implementation.
△ Less
Submitted 17 June, 2024; v1 submitted 12 January, 2024;
originally announced January 2024.
-
Use of explicit replies as coordination mechanisms in online student debate
Authors:
Bruno D. Ferreira-Saraiva,
Joao P. Matos-Carvalho,
Manuel Pita
Abstract:
People in conversation entrain their linguistic behaviours through spontaneous alignment mechanisms [7] - both in face-to-face and computer-mediated communication (CMC) [8]. In CMC, one of the mechanisms through which linguistic entrainment happens is through explicit replies. Indeed, the use of explicit replies influences the structure of conversations, favouring the formation of reply-trees typi…
▽ More
People in conversation entrain their linguistic behaviours through spontaneous alignment mechanisms [7] - both in face-to-face and computer-mediated communication (CMC) [8]. In CMC, one of the mechanisms through which linguistic entrainment happens is through explicit replies. Indeed, the use of explicit replies influences the structure of conversations, favouring the formation of reply-trees typically delineated by topic shifts [5]. The interpersonal coordination mechanisms realized by how actors address each other have been studied using a probabilistic framework proposed by David Gibson [2,3]. Other recent approaches use computational methods and information theory to quantify changes in text. We explore coordination mechanisms concerned with some of the roles utterances play in dialogues - specifically in explicit replies. We identify these roles by finding community structure in the conversation's vocabulary using a non-parametric, hierarchical topic model. Some conversations may always stay on the ground, remaining at the level of general introductory chatter. Some others may develop a specific sub-topic in significant depth and detail. Even others may jump between general chatter, out-of-topic remarks and people agreeing or disagreeing without further elaboration.
△ Less
Submitted 30 November, 2023;
originally announced November 2023.
-
Multispectral Indices for Wildfire Management
Authors:
Afonso Oliveira,
João P. Matos-Carvalho,
Filipe Moutinho,
Nuno Fachada
Abstract:
This paper highlights and summarizes the most important multispectral indices and associated methodologies for fire management. Various fields of study are examined where multispectral indices align with wildfire prevention and management, including vegetation and soil attribute extraction, water feature map**, artificial structure identification, and post-fire burnt area estimation. The versati…
▽ More
This paper highlights and summarizes the most important multispectral indices and associated methodologies for fire management. Various fields of study are examined where multispectral indices align with wildfire prevention and management, including vegetation and soil attribute extraction, water feature map**, artificial structure identification, and post-fire burnt area estimation. The versatility and effectiveness of multispectral indices in addressing specific issues in wildfire management are emphasized. Fundamental insights for optimizing data extraction are presented. Concrete indices for each task, including the NDVI and the NDWI, are suggested. Moreover, to enhance accuracy and address inherent limitations of individual index applications, the integration of complementary processing solutions and additional data sources like high-resolution imagery and ground-based measurements is recommended. This paper aims to be an immediate and comprehensive reference for researchers and stakeholders working on multispectral indices related to the prevention and management of fires.
△ Less
Submitted 4 September, 2023;
originally announced September 2023.