-
Adaptive Fusion of Multi-view Remote Sensing data for Optimal Sub-field Crop Yield Prediction
Authors:
Francisco Mena,
Deepak Pathak,
Hiba Najjar,
Cristhian Sanchez,
Patrick Helber,
Benjamin Bischke,
Peter Habelitz,
Miro Miranda,
Jayanth Siddamsetty,
Marlon Nuske,
Marcela Charfuelan,
Diego Arenas,
Michaela Vollmer,
Andreas Dengel
Abstract:
Accurate crop yield prediction is of utmost importance for informed decision-making in agriculture, aiding farmers and industry stakeholders. However, this task is complex and depends on multiple factors, such as environmental conditions, soil properties, and management practices. Combining heterogeneous data views poses a fusion challenge, such as identifying the view-specific contribution to the predictive task. We present a novel multi-view learning approach to predict crop yield for different crops (soybean, wheat, rapeseed) and regions (Argentina, Uruguay, and Germany). Our multi-view input data includes multi-spectral optical images from Sentinel-2 satellites and weather data as dynamic features during the crop growing season, complemented by static features like soil properties and topographic information. To effectively fuse the data, we introduce a Multi-view Gated Fusion (MVGF) model, comprising dedicated view-encoders and a Gated Unit (GU) module. The view-encoders handle the heterogeneity of data sources with varying temporal resolutions by learning a view-specific representation. These representations are adaptively fused via a weighted sum. The fusion weights are computed for each sample by the GU using a concatenation of the view-representations. The MVGF model is trained at sub-field level with 10 m resolution pixels. Our evaluations show that the MVGF outperforms conventional models on the same task, achieving the best results by incorporating all the data sources, unlike the usual fusion results in the literature. For Argentina, the MVGF model achieves an R² value of 0.68 at sub-field yield prediction, while at field level evaluation (comparing field averages), it reaches around 0.80 across different countries. The GU module learned different weights based on the country and crop-type, aligning with the variable significance of each data source to the prediction task.
Submitted 22 January, 2024;
originally announced January 2024.
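The gated fusion step described in the abstract above (a per-sample weighted sum of view representations, with weights computed from their concatenation) can be sketched roughly as follows. This is a minimal plain-Python illustration, not the authors' implementation: the single linear gate, the softmax normalisation, the one-scalar-weight-per-view choice, and the names `gated_fusion`, `gate_weights`, and `gate_bias` are all assumptions made for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(view_reprs, gate_weights, gate_bias):
    """Fuse per-view representations with sample-specific weights.

    view_reprs:   list of V vectors, each of length d (one per view encoder)
    gate_weights: V rows of length V * d (hypothetical linear gate)
    gate_bias:    list of V floats
    Returns (fused vector of length d, list of V fusion weights).
    """
    # The gate sees the concatenation of all view representations.
    concat = [x for z in view_reprs for x in z]
    logits = [
        sum(w * x for w, x in zip(row, concat)) + b
        for row, b in zip(gate_weights, gate_bias)
    ]
    # Assumption: softmax normalisation, so the weights sum to 1.
    alphas = softmax(logits)
    dim = len(view_reprs[0])
    # Weighted sum of the view representations, one weight per view.
    fused = [
        sum(a * z[i] for a, z in zip(alphas, view_reprs))
        for i in range(dim)
    ]
    return fused, alphas


# Two views with 2-dimensional representations; an all-zero gate
# assigns equal weight to both views.
views = [[1.0, 2.0], [3.0, 4.0]]
zero_gate = [[0.0] * 4 for _ in range(2)]
fused, alphas = gated_fusion(views, zero_gate, [0.0, 0.0])
print(fused, alphas)
```

Because the weights are recomputed for every sample, inspecting `alphas` per sample is one way such a model could expose how much each data source contributed to a given prediction.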
-
Classifying Proposals of Decentralized Autonomous Organizations Using Large Language Models
Authors:
Christian Ziegler,
Marcos Miranda,
Guangye Cao,
Gustav Arentoft,
Doo Wan Nam
Abstract:
Our study demonstrates the effective use of Large Language Models (LLMs) for automating the classification of complex datasets. We specifically target proposals of Decentralized Autonomous Organizations (DAOs), as the classification of this data requires the understanding of context and, therefore, depends on human expertise, leading to high costs associated with the task. The study applies an iterative approach to specify categories and further refine them and the prompt in each iteration, which led to an accuracy rate of 95% in classifying a set of 100 proposals. With this, we demonstrate the potential of LLMs to automate data labeling tasks that depend on textual context effectively.
Submitted 3 July, 2024; v1 submitted 13 January, 2024;
originally announced January 2024.
-
Topic Shifts as a Proxy for Assessing Politicization in Social Media
Authors:
Marcelo Sartori Locatelli,
Pedro Calais,
Matheus Prado Miranda,
João Pedro Junho,
Tomas Lacerda Muniz,
Wagner Meira Jr.,
Virgilio Almeida
Abstract:
Politicization is a social phenomenon studied by political science, characterized by the extent to which ideas and facts are given a political tone. A range of topics, such as climate change, religion and vaccines, have been subject to increasing politicization in the media and on social media platforms. In this work, we propose a computational method for assessing politicization in online conversations based on topic shifts, i.e., the degree to which people switch topics in online conversations. The intuition is that topic shifts from a non-political topic to politics are a direct measure of politicization (making something political), and that the more people switch conversations to politics, the more they perceive politics as playing a vital role in their daily lives. A fundamental challenge that must be addressed when one studies politicization in social media is that, a priori, any topic may be politicized. Hence, any keyword-based method or even machine learning approaches that rely on topic labels to classify topics are expensive to run and potentially ineffective. Instead, we learn from a seed of political keywords and use Positive-Unlabeled (PU) Learning to detect political comments in reaction to non-political news articles posted on Twitter, YouTube, and TikTok during the 2022 Brazilian presidential elections. Our findings indicate that all platforms show evidence of politicization, as discussion around topics adjacent to politics, such as economy, crime and drugs, tends to shift to politics. Even the least politicized topics saw the rate at which they shift to politics increase in the lead-up to the elections and after other political events in Brazil, which is evidence of politicization.
Submitted 13 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
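The topic-shift measure in the abstract above can be illustrated with a small calculation: for each non-political article topic, the share of comments that a classifier flagged as political. This is only a sketch under stated assumptions. The function name `topic_shift_rates`, the input layout, and the sample data are hypothetical, and the political/non-political labels are taken as given rather than produced by the PU-learning classifier the abstract describes.

```python
from collections import defaultdict


def topic_shift_rates(comments):
    """Return, per article topic, the fraction of comments flagged as political.

    comments: iterable of (article_topic, is_political) pairs, where
    is_political is the (assumed already computed) classifier decision.
    """
    totals = defaultdict(int)
    shifted = defaultdict(int)
    for topic, is_political in comments:
        totals[topic] += 1
        if is_political:
            shifted[topic] += 1
    return {topic: shifted[topic] / totals[topic] for topic in totals}


# Hypothetical sample: three comments on economy news, two on sports news.
sample = [
    ("economy", True),
    ("economy", True),
    ("economy", False),
    ("sports", False),
    ("sports", False),
]
print(topic_shift_rates(sample))
```

Computing these rates over successive time windows would show whether a topic's rate rises around events such as an election, which is the kind of trend the abstract reports.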
-
Enabling Mobility-Oriented JCAS in 6G Networks: An Architecture Proposal
Authors:
Philipp Rosemann,
Sanket Partani,
Marc Miranda,
Jannik Mähn,
Michael Karrenbauer,
William Meli,
Rodrigo Hernangomez,
Maximilian Lübke,
Jacob Kochems,
Stefan Köpsell,
Anosch Aziz-Koch,
Julia Beuster,
Oliver Blume,
Norman Franchi,
Reiner Thomä,
Slawomir Stanczak,
Hans D. Schotten
Abstract:
Sensing plays a crucial role in autonomous and assisted vehicular driving, as well as in the operation of autonomous drones. The traditionally separate communication and onboard sensing systems in mobility applications are due to be merged using Joint Communication and Sensing (JCAS) in the development of the 6G mobile radio standard. The integration of JCAS functions into the future road traffic landscape introduces novel challenges for the design of the 6G system architecture. Special emphasis will be placed on facilitating direct communication between road users and aerial drones. In various mobility scenarios, diverse levels of integration will be explored, ranging from leveraging communication capabilities to coordinate different radars to achieving deep integration through a unified waveform. In this paper, we identify use cases and derive five higher-level Tech Cases (TCs). Technical and functional requirements for the 6G system architecture for a device-oriented JCAS approach will be extracted from the TCs and used to conceptualize the architectural views.
Submitted 20 November, 2023;
originally announced November 2023.
-
Predicting Crop Yield With Machine Learning: An Extensive Analysis Of Input Modalities And Models On a Field and sub-field Level
Authors:
Deepak Pathak,
Miro Miranda,
Francisco Mena,
Cristhian Sanchez,
Patrick Helber,
Benjamin Bischke,
Peter Habelitz,
Hiba Najjar,
Jayanth Siddamsetty,
Diego Arenas,
Michaela Vollmer,
Marcela Charfuelan,
Marlon Nuske,
Andreas Dengel
Abstract:
We introduce a simple yet effective early fusion method for crop yield prediction that handles multiple input modalities with different temporal and spatial resolutions. We use high-resolution crop yield maps as ground truth data to train crop and machine learning model agnostic methods at the sub-field level. We use Sentinel-2 satellite imagery as the primary modality for input data with other complementary modalities, including weather, soil, and DEM data. The proposed method uses input modalities available with global coverage, making the framework globally scalable. We explicitly highlight the importance of input modalities for crop yield prediction and emphasize that the best-performing combination of input modalities depends on region, crop, and chosen model.
Submitted 17 August, 2023;
originally announced August 2023.
-
Discriminatory or Samaritan -- which AI is needed for humanity? An Evolutionary Game Theory Analysis of Hybrid Human-AI populations
Authors:
Tim Booker,
Manuel Miranda,
Jesús A. Moreno López,
José María Ramos Fernández,
Max Reddel,
Valeria Widler,
Filippo Zimmaro,
Alberto Antonioni,
The Anh Han
Abstract:
As artificial intelligence (AI) systems are increasingly embedded in our lives, their presence leads to interactions that shape our behaviour, decision-making, and social interactions. Existing theoretical research has primarily focused on human-to-human interactions, overlooking the unique dynamics triggered by the presence of AI. In this paper, resorting to methods from evolutionary game theory, we study how different forms of AI influence the evolution of cooperation in a human population playing the one-shot Prisoner's Dilemma game in both well-mixed and structured populations. We found that Samaritan AI agents that help everyone unconditionally, including defectors, can promote higher levels of cooperation in humans than Discriminatory AI agents that only help those considered worthy/cooperative, especially in slow-moving societies where change is viewed with caution or resistance (small intensities of selection). Intuitively, in fast-moving societies (high intensities of selection), Discriminatory AIs promote higher levels of cooperation than Samaritan AIs.
Submitted 3 July, 2023; v1 submitted 30 June, 2023;
originally announced June 2023.
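The "intensity of selection" mentioned in the abstract above is commonly modelled in evolutionary game theory through a pairwise-comparison (Fermi) imitation rule. The sketch below shows that rule only; whether the paper uses exactly this form is an assumption, and the function name and payoff values are hypothetical.

```python
import math


def imitation_probability(beta, payoff_self, payoff_other):
    """Fermi rule: probability that an agent copies another agent's strategy.

    beta is the intensity of selection. With beta = 0 imitation is a coin
    flip regardless of payoffs; with large beta the better-performing
    strategy is copied almost deterministically.
    """
    x = beta * (payoff_other - payoff_self)
    # Use the numerically stable branch of the logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


# Same payoff gap, different intensities of selection (hypothetical values).
for beta in (0.0, 0.1, 1.0, 10.0):
    print(beta, imitation_probability(beta, payoff_self=1.0, payoff_other=2.0))
```

Small `beta` corresponds to the "slow-moving" regime in the abstract, where payoff differences barely influence strategy change, and large `beta` to the "fast-moving" regime.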
-
PADLL: Taming Metadata-intensive HPC Jobs Through Dynamic, Application-agnostic QoS Control
Authors:
Ricardo Macedo,
Mariana Miranda,
Yusuke Tanimura,
Jason Haga,
Amit Ruhela,
Stephen Lien Harrell,
Richard Todd Evans,
José Pereira,
João Paulo
Abstract:
Modern I/O applications that run on HPC infrastructures are increasingly becoming read and metadata intensive. However, having multiple concurrent applications submitting large amounts of metadata operations can easily saturate the shared parallel file system's metadata resources, leading to overall performance degradation and I/O unfairness. We present PADLL, an application and file system agnostic storage middleware that enables QoS control of data and metadata workflows in HPC storage systems. It adopts ideas from Software-Defined Storage, building data plane stages that mediate and rate limit POSIX requests submitted to the shared file system, and a control plane that holistically coordinates how all I/O workflows are handled. We demonstrate its performance and feasibility under multiple QoS policies using synthetic benchmarks, real-world applications, and traces collected from a production file system. Results show that PADLL can enforce complex storage QoS policies over concurrent metadata-aggressive jobs, ensuring fairness and prioritization.
Submitted 23 March, 2023; v1 submitted 13 February, 2023;
originally announced February 2023.
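The abstract above describes data plane stages that rate limit POSIX requests before they reach the shared file system. One common way to rate limit a request stream is a token bucket, sketched below. This is an illustrative assumption rather than PADLL's actual mechanism: the class name, parameters, and the explicit `now` clock argument (used to keep the example deterministic) are all hypothetical.

```python
class TokenBucket:
    """Token-bucket limiter: allows `rate` operations per second on average,
    with bursts of at most `capacity` operations."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # maximum stored tokens
        self.tokens = float(capacity)    # start with a full bucket
        self.last = float(now)           # time of the last refill

    def try_acquire(self, now, cost=1.0):
        """Return True if a request costing `cost` tokens may proceed at time `now`."""
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Hypothetical policy: at most 2 metadata operations per second, burst of 2.
bucket = TokenBucket(rate=2, capacity=2)
for t in (0.0, 0.0, 0.0, 0.5, 0.5):
    print(t, bucket.try_acquire(now=t))
```

In a middleware setting, a control plane could adjust `rate` per job at run time to express fairness or priority policies, while each stage simply consults its bucket before forwarding a request.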
-
Neuroimaging Feature Extraction using a Neural Network Classifier for Imaging Genetics
Authors:
Cédric Beaulac,
Sidi Wu,
Erin Gibson,
Michelle F. Miranda,
Jiguo Cao,
Leno Rocha,
Mirza Faisal Beg,
Farouk S. Nathoo
Abstract:
A major issue in the association of genes to neuroimaging phenotypes is the high dimension of both genetic data and neuroimaging data. In this article, we tackle the latter problem with an eye toward developing solutions that are relevant for disease prediction. Supported by a vast literature on the predictive power of neural networks, our proposed solution uses neural networks to extract from neuroimaging data features that are relevant for predicting Alzheimer's Disease (AD) for subsequent relation to genetics. Our neuroimaging-genetic pipeline comprises image processing, neuroimaging feature extraction and genetic association steps. We propose a neural network classifier for extracting neuroimaging features that are related to disease and a multivariate Bayesian group sparse regression model for genetic association. We compare the predictive power of these features to expert-selected features and take a closer look at the SNPs identified with the new neuroimaging features.
Submitted 8 July, 2022;
originally announced July 2022.
-
HiClass: a Python library for local hierarchical classification compatible with scikit-learn
Authors:
Fábio M. Miranda,
Niklas Köhnecke,
Bernhard Y. Renard
Abstract:
HiClass is an open-source Python library for local hierarchical classification entirely compatible with scikit-learn. It contains implementations of the most common design patterns for hierarchical machine learning models found in the literature, that is, the local classifiers per node, per parent node and per level. Additionally, the package contains implementations of hierarchical metrics, which are more appropriate for evaluating classification performance on hierarchical data. The documentation includes installation and usage instructions, examples within tutorials and interactive notebooks, and a complete description of the API. HiClass is released under the simplified BSD license, encouraging its use in both academic and commercial environments. Source code and documentation are available at https://github.com/scikit-learn-contrib/hiclass.
Submitted 3 January, 2023; v1 submitted 13 December, 2021;
originally announced December 2021.
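The hierarchical metrics mentioned in the abstract above are usually defined over sets of labels along a path in the hierarchy rather than over single labels. The sketch below computes hierarchical precision and recall in that set-based form using plain Python. It is an illustration of the idea, not HiClass's own code or API: the function name and the sample labels are hypothetical.

```python
def hierarchical_precision_recall(true_paths, pred_paths):
    """Set-based hierarchical precision and recall.

    Each path lists the labels from the root down to the assigned node,
    e.g. ["Animal", "Mammal", "Dog"]. Partial credit is given for every
    label on the predicted path that also lies on the true path.
    """
    overlap = pred_total = true_total = 0
    for true_path, pred_path in zip(true_paths, pred_paths):
        true_set, pred_set = set(true_path), set(pred_path)
        overlap += len(true_set & pred_set)
        pred_total += len(pred_set)
        true_total += len(true_set)
    precision = overlap / pred_total if pred_total else 0.0
    recall = overlap / true_total if true_total else 0.0
    return precision, recall


# A prediction that is wrong only at the leaf still earns partial credit.
y_true = [["Animal", "Mammal", "Dog"]]
y_pred = [["Animal", "Mammal", "Cat"]]
print(hierarchical_precision_recall(y_true, y_pred))
```

A flat accuracy score would count the example above as simply wrong, whereas the hierarchical scores reflect that two of the three levels were predicted correctly.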
-
A Bayesian machine scientist to aid in the solution of challenging scientific problems
Authors:
Roger Guimera,
Ignasi Reichardt,
Antoni Aguilar-Mogas,
Francesco A Massucci,
Manuel Miranda,
Jordi Pallares,
Marta Sales-Pardo
Abstract:
Closed-form, interpretable mathematical models have been instrumental for advancing our understanding of the world; with the data revolution, we may now be in a position to uncover new such models for many systems from physics to the social sciences. However, to deal with increasing amounts of data, we need "machine scientists" that are able to extract these models automatically from data. Here, we introduce a Bayesian machine scientist, which establishes the plausibility of models using explicit approximations to the exact marginal posterior over models and establishes its prior expectations about models by learning from a large empirical corpus of mathematical expressions. It explores the space of models using Markov chain Monte Carlo. We show that this approach uncovers accurate models for synthetic and real data and provides out-of-sample predictions that are more accurate than those of existing approaches and of other nonparametric methods.
Submitted 25 April, 2020;
originally announced April 2020.
-
Density estimation in representation space to predict model uncertainty
Authors:
Tiago Ramalho,
Miguel Miranda
Abstract:
Deep learning models frequently make incorrect predictions with high confidence when presented with test examples that are not well represented in their training dataset. We propose a novel and straightforward approach to estimate prediction uncertainty in a pre-trained neural network model. Our method estimates the training data density in representation space for a novel input. A neural network model then uses this information to determine whether we expect the pre-trained model to make a correct prediction. This uncertainty model is trained by predicting in-distribution errors, but can detect out-of-distribution data without having seen any such example. We test our method for a state-of-the-art image classification model in the settings of both in-distribution uncertainty estimation and out-of-distribution detection.
Submitted 3 October, 2019; v1 submitted 20 August, 2019;
originally announced August 2019.
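The idea in the abstract above of estimating training-data density around a new input in representation space can be illustrated with a simple nearest-neighbour proxy: the mean distance from the input's representation to its k closest training representations. This is a sketch under stated assumptions, not the authors' estimator. The function name, the choice of Euclidean distance, and the toy vectors are hypothetical, and the second-stage neural network that the abstract describes is not modelled here.

```python
import math


def knn_density_score(query, train_reprs, k=3):
    """Mean Euclidean distance from `query` to its k nearest training
    representations. Lower values mean the query lies in a denser,
    better-covered region of representation space."""
    distances = sorted(math.dist(query, r) for r in train_reprs)
    nearest = distances[:k]
    return sum(nearest) / len(nearest)


# Toy 2-D "representations" of training examples (hypothetical values).
train = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
print(knn_density_score([0.5, 0.5], train))    # inside the training cloud
print(knn_density_score([10.0, 10.0], train))  # far from all training data
```

A score like this could serve as an input feature to a separate uncertainty model: inputs with unusually large scores are candidates for out-of-distribution data.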
-
Classification of EEG Signals using Genetic Programming for Feature Construction
Authors:
Icaro Marcelino Miranda,
Claus Aranha,
Marcelo Ladeira
Abstract:
The analysis of electroencephalogram (EEG) waves is of critical importance for the diagnosis of sleep disorders, such as sleep apnea and insomnia, as well as seizures, epilepsy, head injuries, dizziness, headaches and brain tumors. In this context, one important task is the identification of visible structures in the EEG signal, such as sleep spindles and K-complexes. The identification of these structures is usually performed through visual inspection by human experts, a process that can be error-prone and susceptible to biases. Therefore, there is interest in developing technologies for the automated analysis of EEG. In this paper, we propose a new Genetic Programming (GP) framework for feature construction and dimensionality reduction from EEG signals. We use these features to automatically identify spindles and K-complexes on data from the DREAMS project. Using 5 different classifiers, the set of attributes produced by GP obtained better AUC scores than those obtained from PCA or the full set of attributes. Also, the proposed framework achieved a better balance of Specificity and Recall than other models recently proposed in the literature. Analysis of the features most used by GP also suggested improvements for data acquisition protocols in future EEG examinations.
Submitted 11 June, 2019;
originally announced June 2019.
-
Linguistic Diversities of Demographic Groups in Twitter
Authors:
Pantelis Vikatos,
Johnnatan Messias,
Manoel Miranda,
Fabricio Benevenuto
Abstract:
The massive popularity of online social media provides a unique opportunity for researchers to study the linguistic characteristics and patterns of users' interactions. In this paper, we provide an in-depth characterization of language usage across demographic groups in Twitter. In particular, we extract the gender and race of Twitter users located in the U.S. using advanced image processing algorithms from Face++. Then, we investigate how demographic groups (i.e. male/female, Asian/Black/White) differ in terms of linguistic styles and also their interests. We extract linguistic features from 6 categories (affective attributes, cognitive attributes, lexical density and awareness, temporal references, social and personal concerns, and interpersonal focus), in order to identify the similarities and differences in particular sets of writing attributes. In addition, we extract the absolute ranking difference of top phrases between demographic groups. As a dimension of diversity, we also use the topics of interest that we retrieve from each user. Our analysis unveils clear differences in the writing styles (and the topics of interest) of different demographic groups, with variation seen across both gender and race lines. We hope our effort can stimulate the development of new studies related to demographic information in the online space.
Submitted 10 May, 2017;
originally announced May 2017.
-
The Emergence of Crowdsourcing among Pokémon Go Players
Authors:
Priscila Martins,
Manoel Miranda,
Fabrício Benevenuto,
Jussara Almeida
Abstract:
Since its launch, Pokémon Go has been described as the largest gaming phenomenon of the smartphone age. As the game requires the user to walk in the real world to see and capture Pokémon, a new wave of crowdsourcing apps has emerged to allow users to collaborate with each other, sharing where and when Pokémon were found. In this paper we characterize one such initiative, called PokeCrew. Our analyses uncover a set of aspects of user behavior and system usage in such an emerging crowdsourcing task, helping unveil some problems and benefits. We hope our effort can inspire the design of new crowdsourcing systems.
Submitted 24 March, 2017;
originally announced March 2017.