Search | arXiv e-print repository

arXiv:2402.10035 [pdf, other]

Investigation of Federated Learning Algorithms for Retinal Optical Coherence Tomography Image Classification with Statistical Heterogeneity

Authors: Sanskar Amgain, Prashant Shrestha, Sophia Bano, Ignacio del Valle Torres, Michael Cunniffe, Victor Hernandez, Phil Beales, Binod Bhattarai

Abstract: Purpose: We apply federated learning to train an OCT image classifier simulating a realistic scenario with multiple clients and statistically heterogeneous data distribution where data in the clients lack samples of some categories entirely. Methods: We investigate the effectiveness of FedAvg and FedProx to train an OCT image classification model in a decentralized fashion, addressing privacy concerns associated with centralizing data. We partitioned a publicly available OCT dataset across multiple clients under IID and Non-IID settings and conducted local training on the subsets for each client. We evaluated two federated learning methods, FedAvg and FedProx, for these settings. Results: Our experiments on the dataset suggest that under IID settings, both methods perform on par with training on a central data pool. However, the performance of both algorithms declines as we increase the statistical heterogeneity across the client data, while FedProx consistently performs better than FedAvg in the increased heterogeneity settings. Conclusion: Despite the effectiveness of federated learning in the utilization of private data across multiple medical institutions, the large number of clients and heterogeneous distribution of labels deteriorate the performance of both algorithms. Notably, FedProx appears to be more robust to the increased heterogeneity.

Submitted 15 February, 2024; originally announced February 2024.
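
The two methods compared in the entry above differ mainly in the local update. The sketch below is illustrative only and is not taken from the paper: it assumes the commonly described formulation in which FedProx adds a proximal term (mu/2)*||w - w_global||^2 to each client's loss, and FedAvg-style aggregation weights each client by its sample count. The function names, the plain-list weight representation, and the parameters lr and mu are assumptions made for this example.

```python
from typing import List, Sequence


def fedprox_local_step(
    w: Sequence[float],
    w_global: Sequence[float],
    grad: Sequence[float],
    lr: float,
    mu: float,
) -> List[float]:
    """One local gradient step on loss(w) + (mu / 2) * ||w - w_global||^2.

    With mu = 0 this reduces to the plain SGD step used in FedAvg.
    """
    return [
        wi - lr * (gi + mu * (wi - wgi))
        for wi, wgi, gi in zip(w, w_global, grad)
    ]


def fedavg_aggregate(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Average client weight vectors, weighted by local sample counts."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(n * w[j] for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]
```

The proximal term pulls each client's weights back toward the global model, which is one plausible reading of why FedProx is reported as more robust when client label distributions diverge.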

arXiv:2308.05254 [pdf, other]

Data-driven Intra-Autonomous Systems Graph Generator

Authors: Caio Vinicius Dadauto, Nelson Luis Saldanha da Fonseca, Ricardo da Silva Torres

Abstract: Accurate modeling of realistic network topologies is essential for evaluating novel Internet solutions. Current topology generators, notably scale-free-based models, fail to capture multiple properties of intra-AS topologies. While scale-free networks encode node-degree distribution, they overlook crucial graph properties like betweenness, clustering, and assortativity. The limitations of existing generators pose challenges for training and evaluating deep learning models in communication networks, emphasizing the need for advanced topology generators encompassing diverse Internet topology characteristics. This paper introduces a novel deep-learning-based generator of synthetic graphs representing intra-autonomous systems in the Internet, named Deep-Generative Graphs for the Internet (DGGI). It also presents a novel massive dataset of real intra-AS graphs extracted from the project ITDK, called IGraphs. It is shown that DGGI creates synthetic graphs that accurately reproduce the properties of centrality, clustering, assortativity, and node degree. The DGGI generator outperforms existing Internet topology generators. On average, DGGI improves the MMD metric by $84.4\%$, $95.1\%$, $97.9\%$, and $94.7\%$ for assortativity, betweenness, clustering, and node degree, respectively.

Submitted 26 February, 2024; v1 submitted 9 August, 2023; originally announced August 2023.

Comments: 14 pages, 15 figures

arXiv:2304.13120 [pdf, other]

Precision Spectroscopy of Fast, Hot Exotic Isotopes Using Machine Learning Assisted Event-by-Event Doppler Correction

Authors: Silviu-Marian Udrescu, Diego Alejandro Torres, Ronald Fernando Garcia Ruiz

Abstract: We propose an experimental scheme for performing sensitive, high-precision laser spectroscopy studies on fast exotic isotopes. By inducing a step-wise resonant ionization of the atoms travelling inside an electric field and subsequently detecting the ion and the corresponding electron, time- and position-sensitive measurements of the resulting particles can be performed. Using a Mixture Density Network (MDN), we can leverage this information to predict the initial energy of individual atoms and thus apply a Doppler correction of the observed transition frequencies on an event-by-event basis. We conduct numerical simulations of the proposed experimental scheme and show that kHz-level uncertainties can be achieved for ion beams produced at extreme temperatures ($> 10^8$ K), with energy spreads as large as $10$ keV and non-uniform velocity distributions. The ability to perform in-flight spectroscopy, directly on highly energetic beams, offers unique opportunities for studying short-lived isotopes with lifetimes in the millisecond range and below, produced in low quantities, in hot and highly contaminated environments, without the need for cooling techniques. Such species are of marked interest for nuclear structure, astrophysics, and new physics searches.

Submitted 25 April, 2023; originally announced April 2023.

Journal ref: Phys. Rev. Research 6, 013128, 31 January 2024

arXiv:2301.00771 [pdf, other]

doi 10.55417/fr.2023004

Flexible Supervised Autonomy for Exploration in Subterranean Environments

Authors: Harel Biggie, Eugene R. Rush, Danny G. Riley, Shakeeb Ahmad, Michael T. Ohradzansky, Kyle Harlow, Michael J. Miles, Daniel Torres, Steve McGuire, Eric W. Frew, Christoffer Heckman, J. Sean Humbert

Abstract: While the capabilities of autonomous systems have been steadily improving in recent years, these systems still struggle to rapidly explore previously unknown environments without the aid of GPS-assisted navigation. The DARPA Subterranean (SubT) Challenge aimed to fast-track the development of autonomous exploration systems by evaluating their performance in real-world underground search-and-rescue scenarios. Subterranean environments present a plethora of challenges for robotic systems, such as limited communications, complex topology, visually-degraded sensing, and harsh terrain. The presented solution enables long-term autonomy with minimal human supervision by combining a powerful and independent single-agent autonomy stack with higher-level mission management operating over a flexible mesh network. The autonomy suite deployed on quadruped and wheeled robots was fully independent, freeing the human supervisor to loosely supervise the mission and make high-impact strategic decisions. We also discuss lessons learned from fielding our system at the SubT Final Event, relating to vehicle versatility, system adaptability, and re-configurable communications.

Submitted 11 April, 2023; v1 submitted 2 January, 2023; originally announced January 2023.

Comments: Field Robotics special issue: DARPA Subterranean Challenge, Advancement and Lessons Learned from the Finals

arXiv:2205.02979 [pdf, other]

Explaining the Effectiveness of Multi-Task Learning for Efficient Knowledge Extraction from Spine MRI Reports

Authors: Arijit Sehanobish, McCullen Sandora, Nabila Abraham, Jayashri Pawar, Danielle Torres, Anasuya Das, Murray Becker, Richard Herzog, Benjamin Odry, Ron Vianu

Abstract: Pretrained Transformer based models finetuned on domain specific corpora have changed the landscape of NLP. However, training or fine-tuning these models for individual tasks can be time consuming and resource intensive. Thus, a lot of current research is focused on using transformers for multi-task learning (Raffel et al., 2020) and how to group the tasks to help a multi-task model to learn effective representations that can be shared across tasks (Standley et al., 2020; Fifty et al., 2021). In this work, we show that a single multi-tasking model can match the performance of task specific models when the task specific models show similar representations across all of their hidden layers and their gradients are aligned, i.e. their gradients follow the same direction. We hypothesize that the above observations explain the effectiveness of multi-task learning. We validate our observations on our internal radiologist-annotated datasets on the cervical and lumbar spine. Our method is simple and intuitive, and can be used in a wide range of NLP problems.

Submitted 5 May, 2022; originally announced May 2022.

Comments: To appear at NAACL-2022, Industry Track. Follow-up of previous work: arXiv:2204.04544
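
The abstract above describes task gradients as aligned when they "follow the same direction". One common way to quantify that, shown here purely as an illustration and not as the authors' procedure, is the cosine similarity between two flattened gradient vectors: values near 1 indicate aligned gradients, values near -1 indicate conflicting ones. The function name and the plain-list gradient representation are assumptions made for this sketch.

```python
import math
from typing import Sequence


def gradient_cosine(g1: Sequence[float], g2: Sequence[float]) -> float:
    """Cosine similarity between two flattened gradient vectors."""
    if len(g1) != len(g2):
        raise ValueError("gradients must have the same length")
    dot = sum(a * b for a, b in zip(g1, g2))
    norm1 = math.sqrt(sum(a * a for a in g1))
    norm2 = math.sqrt(sum(b * b for b in g2))
    if norm1 == 0.0 or norm2 == 0.0:
        # Direction is undefined for a zero gradient.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm1 * norm2)
```

In practice the two inputs would be the gradients of two task losses with respect to the same shared parameters, evaluated on the same batch.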

arXiv:2204.04544 [pdf, other]

Efficient Extraction of Pathologies from C-Spine Radiology Reports using Multi-Task Learning

Authors: Arijit Sehanobish, Nathaniel Brown, Ishita Daga, Jayashri Pawar, Danielle Torres, Anasuya Das, Murray Becker, Richard Herzog, Benjamin Odry, Ron Vianu

Abstract: Pretrained Transformer based models finetuned on domain specific corpora have changed the landscape of NLP. Generally, if one has multiple tasks on a given dataset, one may finetune different models or use task specific adapters. In this work, we show that a multi-task model can beat or achieve the performance of multiple BERT-based models finetuned on various tasks and various task specific adapter augmented BERT-based models. We validate our method on our internal radiologist's report dataset on cervical spine. We hypothesize that the tasks are semantically close and related and thus multitask learners are powerful classifiers. Our work opens the scope of using our method to radiologist's reports on various body parts.

Submitted 9 April, 2022; originally announced April 2022.

Comments: Accepted at 6th International Workshop on Health Intelligence, AAAI-2022. To appear in as a book chapter published by Springer in Studies in Computational Intelligence

arXiv:2203.08046 [pdf, ps, other]

Intelligent Reconfigurable Surfaces vs. Decode-and-Forward: What is the Impact of Electromagnetic Interference?

Authors: Andrea De Jesus Torres, Luca Sanguinetti, Emil Björnson

Abstract: This paper considers the use of an intelligent reconfigurable surface (IRS) to aid wireless communication systems. The main goal is to compare this emerging technology with conventional decode-and-forward (DF) relaying. Unlike prior comparisons, we assume that electromagnetic interference (EMI), consisting of incoming waves from external sources, is present at the location where the IRS or DF relay is placed. The analysis, in terms of minimizing the total transmit power, shows that EMI has a strong impact on DF relay-assisted communications, even when the relaying protocol is optimized against EMI. It turns out that IRS-aided communications is more resilient to EMI. To beat an IRS, we show that the DF relay must use multiple antennas and actively suppress the EMI by beamforming.

Submitted 15 March, 2022; originally announced March 2022.

Comments: 5 pages, 9 figures, submitted to the 23rd IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC2022)

arXiv:2111.10157 [pdf, other]

Lattention: Lattice-attention in ASR rescoring

Authors: Prabhat Pandey, Sergio Duarte Torres, Ali Orkan Bayer, Ankur Gandhe, Volker Leutnant

Abstract: Lattices form a compact representation of multiple hypotheses generated from an automatic speech recognition system and have been shown to improve performance of downstream tasks like spoken language understanding and speech translation, compared to using one-best hypothesis. In this work, we look into the effectiveness of lattice cues for rescoring n-best lists in second-pass. We encode lattices with a recurrent network and train an attention encoder-decoder model for n-best rescoring. The rescoring model with attention to lattices achieves 4-5% relative word error rate reduction over first-pass and 6-8% with attention to both lattices and acoustic features. We show that rescoring models with attention to lattices outperform models with attention to n-best hypotheses. We also study different ways to incorporate lattice weights in the lattice encoder and demonstrate their importance for n-best rescoring.

Submitted 19 November, 2021; originally announced November 2021.

Comments: Submitted to ICASSP 2022

ACM Class: I.2.7

arXiv:2111.02229 [pdf, other]

doi 10.1109/TSP.2022.3222102

Cramér-Rao Bounds for Holographic Positioning

Authors: Antonio A. D'Amico, Andrea de Jesus Torres, Luca Sanguinetti, Moe Win

Abstract: Multiple antenna arrays play a key role in wireless networks for communications but also for localization and sensing applications. The use of large antenna arrays at high carrier frequencies (in the mmWave range) pushes towards a propagation regime in which the wavefront is no longer plane but spherical. This makes it possible to infer the position and orientation of a transmitting source from the received signal without the need for multiple anchor nodes located in known positions. To understand the fundamental limits of large antenna arrays for localization, this paper combines wave propagation theory with estimation theory, and computes the Cramér-Rao Bound (CRB) for the estimation of the source position on the basis of the three Cartesian components of the electric field, observed over a rectangular surface area. The problem is referred to as holographic positioning and is formulated by taking into account the radiation angular pattern of the transmitting source, which is typically ignored in standard signal processing models. We assume that the source is a Hertzian dipole, and address the holographic positioning problem in both cases, that is, with and without a priori knowledge of its orientation. To simplify the analysis and gain further insights, we also consider the case in which the dipole is located on the line perpendicular to the surface center. Numerical and asymptotic results are given to quantify the CRBs, and to quantify the effect of various system parameters on the ultimate estimation accuracy. It turns out that surfaces of practical size may guarantee a centimeter-level accuracy in the mmWave bands.

Submitted 3 November, 2022; v1 submitted 3 November, 2021; originally announced November 2021.

Comments: 15 pages, 9 figures, IEEE Transactions on Signal Processing

arXiv:2110.05173 [pdf, ps, other]

A Characterization of Totally Compatible Automata

Authors: David Fernando Casas Torres

Abstract: Every function on a finite set defines an equivalence relation and, therefore, a partition called the kernel of the function. Automata such that every possible partition is the kernel of a word are called totally compatible. A characterization of such automata is given together with an algorithm to recognize them in polynomial running time with respect to the number of states.

Submitted 11 October, 2021; originally announced October 2021.

arXiv:2109.10040 [pdf, other]

doi 10.1109/TSP.2022.3183445

Nyquist-Sampling and Degrees of Freedom of Electromagnetic Fields

Authors: Andrea Pizzo, Andrea de Jesus Torres, Luca Sanguinetti, Thomas L. Marzetta

Abstract: A signal space approach is presented to study the Nyquist sampling, number of degrees of freedom and reconstruction of an electromagnetic field under arbitrary scattering conditions. Conventional signal processing tools, such as the multidimensional sampling theorem and Fourier theory, are used to provide a linear system theoretic interpretation of electromagnetic wave propagation, thereby revealing the spatially bandlimited nature of electromagnetic fields. Their spatial bandwidth is dictated by the selectivity of the underlying scattering that allows establishing the Nyquist spatial sampling with a reduction of the number of field samples needed to be processed.

Submitted 11 October, 2022; v1 submitted 21 September, 2021; originally announced September 2021.

Comments: IEEE Transactions on Signal Processing

arXiv:2106.11107 [pdf, ps, other]

Electromagnetic Interference in RIS-Aided Communications

Authors: Andrea De Jesus Torres, Luca Sanguinetti, Emil Björnson

Abstract: The prospects of using a reconfigurable intelligent surface (RIS) to aid wireless communication systems have recently received much attention. Among the different use cases, the most popular one is where each element of the RIS scatters the incoming signal with a controllable phase-shift, without increasing its power. In prior literature, this setup has been analyzed by neglecting the electromagnetic interference, consisting of the inevitable incoming waves from external sources. In this letter, we provide a physically meaningful model for the electromagnetic interference that can be used as a baseline when evaluating RIS-aided communications. The model is used to show that electromagnetic interference has a non-negligible impact on communication performance, especially when the size of the RIS grows large. When the direct link is present (though with a relatively weak gain), the RIS can even reduce the communication performance. Importantly, it turns out that the SNR grows quadratically with the number of RIS elements only when the spatial correlation matrix of the electromagnetic interference is asymptotically orthogonal to that of the effective channel (including RIS phase-shifts) towards the intended receiver. Otherwise, the SNR only increases linearly.

Submitted 8 December, 2021; v1 submitted 21 June, 2021; originally announced June 2021.

Comments: 5 pages, 5 figures, IEEE Wireless Communication Letters

arXiv:2104.14825 [pdf, ps, other]

Cramér-Rao Bounds for Near-Field Localization

Authors: Andrea De Jesus Torres, Antonio Alberto D'Amico, Luca Sanguinetti, Moe Z. Win

Abstract: Multiple antenna arrays play a key role in wireless networks for communications but also localization and sensing. The use of large antenna arrays pushes towards a propagation regime in which the wavefront is no longer plane but spherical. This makes it possible to infer the position and orientation of an arbitrary source from the received signal without the need for multiple anchor nodes. To understand the fundamental limits of large antenna arrays for localization, this paper fuses wave propagation theory with estimation theory, and computes the Cramér-Rao Bound (CRB) for the estimation of the three Cartesian coordinates of the source on the basis of the electromagnetic vector field, observed over a rectangular surface area. To simplify the analysis, we assume that the source is a dipole, whose center is located on the line perpendicular to the surface center, with an orientation a priori known. Numerical and asymptotic results are given to quantify the CRBs, and to gain insights into the effect of various system parameters on the ultimate estimation accuracy. It turns out that surfaces of practical size may guarantee a centimeter-level accuracy in the mmWave bands.

Submitted 3 December, 2021; v1 submitted 30 April, 2021; originally announced April 2021.

Comments: 5 pages, 4 figures, Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, USA, Nov. 2021

arXiv:2104.10345 [pdf, other]

doi 10.1109/JSTARS.2021.3094053

Measuring economic activity from space: a case study using flying airplanes and COVID-19

Authors: Mauricio Pamplona Segundo, Allan Pinto, Rodrigo Minetto, Ricardo da Silva Torres, Sudeep Sarkar

Abstract: This work introduces a novel solution to measure economic activity through remote sensing for a wide range of spatial areas. We hypothesized that disturbances in human behavior caused by major life-changing events leave signatures in satellite imagery that allow devising relevant image-based indicators to estimate their impacts and support decision-makers. We present a case study for the COVID-19 coronavirus outbreak, which imposed severe mobility restrictions and caused worldwide disruptions, using flying airplane detection around the 30 busiest airports in Europe to quantify and analyze the lockdown's effects and post-lockdown recovery. Our solution won the Rapid Action Coronavirus Earth observation (RACE) upscaling challenge, sponsored by the European Space Agency and the European Commission, and is now integrated into the RACE dashboard. This platform combines satellite data and artificial intelligence to promote a progressive and safe reopening of essential activities. Code and CNN models are available at https://github.com/maups/covid19-custom-script-contest

Submitted 21 April, 2021; originally announced April 2021.

Comments: 11 pages, 11 figures

arXiv:2011.13835 [pdf, ps, other]

Near- and Far-Field Communications with Large Intelligent Surfaces

Authors: Andrea de Jesus Torres, Luca Sanguinetti, Emil Björnson

Abstract: This paper studies the uplink spectral efficiency (SE) achieved by two single-antenna user equipments (UEs) communicating with a Large Intelligent Surface (LIS), defined as a planar array consisting of $N$ antennas that each has area $A$. The analysis is carried out with a deterministic line-of-sight propagation channel model that captures key fundamental aspects of the so-called geometric near-field of the array. Maximum ratio (MR) and minimum mean squared error (MMSE) combining schemes are considered. With both schemes, the signal and interference terms are numerically analyzed as a function of the position of the transmitting devices when the width/height $L = \sqrt{NA}$ of the square-shaped array grows large. The results show that an exact near-field channel model is needed to evaluate the SE whenever the distance of transmitting UEs is comparable with the LIS' dimensions. It is shown that, if $L$ grows, the UEs are eventually in the geometric near-field and the interference does not vanish. MMSE outperforms MR for an LIS of practically large size.

Submitted 27 November, 2020; originally announced November 2020.

Comments: 5 pages, 10 figures, Asilomar Conference on Signals Systems and Computers

arXiv:2011.05127 [pdf, other]

doi 10.3390/rs12142267

A Soft Computing Approach for Selecting and Combining Spectral Bands

Authors: Juan F. H. Albarracín, Rafael S. Oliveira, Marina Hirota, Jefersson A. dos Santos, Ricardo da S. Torres

Abstract: We introduce a soft computing approach for automatically selecting and combining indices from remote sensing multispectral images that can be used for classification tasks. The proposed approach is based on a Genetic-Programming (GP) framework, a technique successfully used in a wide variety of optimization problems. Through GP, it is possible to learn indices that maximize the separability of samples from two different classes. Once the indices specialized for all the pairs of classes are obtained, they are used in pixelwise classification tasks. We used the GP-based solution to evaluate complex classification problems, such as those that are related to the discrimination of vegetation types within and between tropical biomes. Using time series defined in terms of the learned spectral indices, we show that the GP framework leads to better results than other indices that are used to discriminate and classify tropical biomes.

Submitted 10 November, 2020; originally announced November 2020.

Comments: MDPI Remote Sensing - Special Issue "Current Limits and New Challenges and Opportunities in Soft Computing, Machine Learning and Computational Intelligence for Remote Sensing"

Journal ref: Remote Sens. 2020, 12(14), 2267

arXiv:2010.12059 [pdf, other]

doi 10.1007/978-3-030-86520-7_8

Principled Interpolation in Normalizing Flows

Authors: Samuel G. Fadel, Sebastian Mair, Ricardo da S. Torres, Ulf Brefeld

Abstract: Generative models based on normalizing flows are very successful in modeling complex data distributions using simpler ones. However, straightforward linear interpolations show unexpected side effects, as interpolation paths lie outside the area where samples are observed. This is caused by the standard choice of Gaussian base distributions and can be seen in the norms of the interpolated samples. This observation suggests that correcting the norm should generally result in better interpolations, but it is not clear how to correct the norm in an unambiguous way. In this paper, we solve this issue by enforcing a fixed norm and, hence, change the base distribution, to allow for a principled way of interpolation. Specifically, we use the Dirichlet and von Mises-Fisher base distributions. Our experimental results show superior performance in terms of bits per dimension, Fréchet Inception Distance (FID), and Kernel Inception Distance (KID) scores for interpolation, while maintaining the same generative performance.

Submitted 22 October, 2020; originally announced October 2020.

Comments: Under review

arXiv:2010.02680 [pdf, other]

doi 10.1109/ICIP40778.2020.9191168

Parallax Motion Effect Generation Through Instance Segmentation And Depth Estimation

Authors: Allan Pinto, Manuel A. Córdova, Luis G. L. Decker, Jose L. Flores-Campana, Marcos R. Souza, Andreza A. dos Santos, Jhonatas S. Conceição, Henrique F. Gagliardi, Diogo C. Luvizon, Ricardo da S. Torres, Helio Pedrini

Abstract: Stereo vision is a growing topic in computer vision due to the innumerable opportunities and applications this technology offers for the development of modern solutions, such as virtual and augmented reality applications. To enhance the user's experience in three-dimensional virtual environments, the motion parallax estimation is a promising technique to achieve this objective. In this paper, we propose an algorithm for generating parallax motion effects from a single image, taking advantage of state-of-the-art instance segmentation and depth estimation approaches. This work also presents a comparison against such algorithms to investigate the trade-off between efficiency and quality of the parallax motion effects, taking into consideration a multi-task learning network capable of estimating instance segmentation and depth estimation at once. Experimental results and visual quality assessment indicate that the PyD-Net network (depth estimation) combined with Mask R-CNN or FBNet networks (instance segmentation) can produce parallax motion effects with good visual quality.

Submitted 6 October, 2020; originally announced October 2020.

Comments: 2020 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates

Journal ref: 2020 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 2020, pp. 1621-1625

arXiv:1912.10314 [pdf, other]

Multimodal Prediction based on Graph Representations

Authors: Icaro Cavalcante Dourado, Salvatore Tabbone, Ricardo da Silva Torres

Abstract: This paper proposes a learning model, based on rank-fusion graphs, for general applicability in multimodal prediction tasks, such as multimodal regression and image classification. Rank-fusion graphs encode information from multiple descriptors and retrieval models, thus being able to capture underlying relationships between modalities, samples, and the collection itself. The solution is based on the encoding of multiple ranks for a query (or test sample), defined according to different criteria, into a graph. Later, we project the generated graph into an induced vector space, creating fusion vectors, targeting broader generality and efficiency. A fusion vector estimator is then built to infer whether a multimodal input object refers to a class or not. Our method is capable of promoting a fusion model better than early-fusion and late-fusion alternatives. Experiments performed on multiple multimodal and visual datasets, with several descriptors and retrieval models, demonstrate that our learning model is highly effective for different prediction scenarios involving visual, textual, and multimodal features, yielding better effectiveness than state-of-the-art methods.

Submitted 3 July, 2020; v1 submitted 21 December, 2019; originally announced December 2019.

arXiv:1906.06011 [pdf, other]

Fusion vectors: Embedding Graph Fusions for Efficient Unsupervised Rank Aggregation

Authors: Icaro Cavalcante Dourado, Ricardo da Silva Torres

Abstract: The vast increase in the amount and complexity of digital content has led to a wide interest in ad-hoc retrieval systems in recent years. Complementarily, the existence of heterogeneous data sources and retrieval models has stimulated the proliferation of increasingly ingenious and effective rank aggregation functions. Although recently proposed rank aggregation functions are promising with respect to effectiveness, existing proposals in the area usually overlook efficiency aspects. We propose an innovative rank aggregation function that is unsupervised, intrinsically multimodal, and targeted for fast retrieval and top effectiveness performance. We introduce the concepts of embedding and indexing of graph-based rank-aggregation representation models, and their application for search tasks. Embedding formulations are also proposed for graph-based rank representations. We introduce the concept of fusion vectors, a late-fusion representation of objects based on ranks, from which an intrinsically rank-aggregation retrieval model is defined. Next, we present an approach for fast retrieval based on fusion vectors, thus promoting an efficient rank aggregation system. Our method presents top effectiveness performance among state-of-the-art related work, while bringing novel aspects of multimodality and effectiveness. Consistent speedups are achieved against the recent baselines in all datasets considered.

Submitted 1 July, 2019; v1 submitted 14 June, 2019; originally announced June 2019.

arXiv:1903.00774 [pdf, other]

doi 10.1109/LGRS.2019.2903194

Spatio-Temporal Vegetation Pixel Classification By Using Convolutional Networks

Authors: Keiller Nogueira, Jefersson A. dos Santos, Nathalia Menini, Thiago S. F. Silva, Leonor Patricia C. Morellato, Ricardo da S. Torres

Abstract: Plant phenology studies rely on long-term monitoring of life cycles of plants. High-resolution unmanned aerial vehicles (UAVs) and near-surface technologies have been used for plant monitoring, demanding the creation of methods capable of locating and identifying plant species through time and space. However, this is a challenging task given the high volume of data, the data constantly missing from temporal datasets, the heterogeneity of temporal profiles, the variety of plant visual patterns, and the unclear definition of individuals' boundaries in plant communities. In this letter, we propose a novel method, suitable for phenological monitoring, based on Convolutional Networks (ConvNets) to perform spatio-temporal vegetation pixel-classification on high-resolution images. We conducted a systematic evaluation using high-resolution vegetation image datasets associated with the Brazilian Cerrado biome. Experimental results show that the proposed approach is effective, outperforming other spatio-temporal pixel-classification strategies.

Submitted 2 March, 2019; originally announced March 2019.

arXiv:1901.05743 [pdf, other]

doi 10.1016/j.ipm.2019.03.008

Unsupervised Graph-based Rank Aggregation for Improved Retrieval

Authors: Icaro Cavalcante Dourado, Daniel Carlos Guimarães Pedronette, Ricardo da Silva Torres

Abstract: This paper presents a robust and comprehensive graph-based rank aggregation approach, used to combine results of isolated ranker models in retrieval tasks. The method follows an unsupervised scheme, which is independent of how the isolated ranks are formulated. Our approach is able to combine arbitrary models, defined in terms of different ranking criteria, such as those based on textual, image or hybrid content representations. We reformulate the ad-hoc retrieval problem as a document retrieval based on fusion graphs, which we propose as a new unified representation model capable of merging multiple ranks and expressing inter-relationships of retrieval results automatically. By doing so, we claim that the retrieval system can benefit from learning the manifold structure of datasets, thus leading to more effective results. Another contribution is that our graph-based aggregation formulation, unlike existing approaches, allows for encapsulating contextual information encoded from multiple ranks, which can be directly used for ranking, without further computations and post-processing steps over the graphs. Based on the graphs, a novel similarity retrieval score is formulated using an efficient computation of minimum common subgraphs. Finally, another benefit over existing approaches is the absence of hyperparameters. A comprehensive experimental evaluation was conducted considering diverse well-known public datasets, composed of textual, image, and multimodal documents. The experiments demonstrate that our method reaches top performance, yielding better effectiveness scores than state-of-the-art baseline methods and promoting large gains over the rankers being fused, thus demonstrating the successful capability of the proposal in representing queries based on a unified graph-based model of rank fusions.

Submitted 18 March, 2019; v1 submitted 17 January, 2019; originally announced January 2019.

arXiv:1901.05517 [pdf]

Survey of Bayesian Networks Applications to Intelligent Autonomous Vehicles

Authors: Rocío Díaz de León Torres, Martín Molina, Pascual Campoy

Abstract: This article reviews the applications of Bayesian Networks to Intelligent Autonomous Vehicles (IAV) from the decision making point of view, which represents the final step for fully Autonomous Vehicles (currently under discussion). Until now, when it comes to making high level decisions for Autonomous Vehicles (AVs), humans have the last word. Based on the works cited in this article and the analysis done here, the modules of a general decision making framework and its variables are inferred. Many efforts have been made in the labs showing Bayesian Networks as a promising computer model for decision making. Further research should go into the direction of testing Bayesian Network models in real situations. In addition to the applications, Bayesian Network fundamentals are introduced as elements to consider when developing IAVs with the potential of making high level judgement calls.

Submitted 21 February, 2019; v1 submitted 16 January, 2019; originally announced January 2019.

Comments: 34 pages, 2 figures, 3 tables

arXiv:1811.07174 [pdf, other]

Link Prediction in Dynamic Graphs for Recommendation

Authors: Samuel G. Fadel, Ricardo da S. Torres

Abstract: Recent advances in employing neural networks on graph domains helped push the state of the art in link prediction tasks, particularly in recommendation services. However, the use of temporal contextual information, often modeled as dynamic graphs that encode the evolution of user-item relationships over time, has been overlooked in link prediction problems. In this paper, we consider the hypothesis that leveraging such information enables models to make better predictions, proposing a new neural network approach for this. Our experiments, performed on the widely used ML-100k and ML-1M datasets, show that our approach produces better predictions in scenarios where the pattern of user-item relationships changes over time. In addition, they suggest that existing approaches are significantly impacted by those changes.

Submitted 17 November, 2018; originally announced November 2018.

Comments: Workshop on Relational Representation Learning (R2L), NIPS 2018

arXiv:1711.06809 [pdf, other]

A Genetic Algorithm Approach for Image Representation Learning through Color Quantization

Authors: Érico M. Pereira, Ricardo da S. Torres, Jefersson A. dos Santos

Abstract: Over the last decades, hand-crafted feature extractors have been used to encode image visual properties into feature vectors. Recently, data-driven feature learning approaches have been successfully explored as alternatives for producing more representative visual features. In this work, we combine both research venues, focusing on the color quantization problem. We propose two data-driven approaches to learn image representations through the search for optimized quantization schemes, which lead to more effective feature extraction algorithms and compact representations. Our strategy employs a Genetic Algorithm, a soft-computing apparatus successfully utilized in Information-retrieval-related optimization problems. We hypothesize that changing the quantization affects the quality of image description approaches, leading to effective and efficient representations. We evaluate our approaches in content-based image retrieval tasks, considering eight well-known datasets with different visual properties. Results indicate that the approach focused on representation effectiveness outperformed baselines in all tested scenarios. The other approach, which also considers the size of created representations, produced competitive results while keeping or even reducing the dimensionality of feature vectors by up to 25%.

Submitted 20 November, 2020; v1 submitted 17 November, 2017; originally announced November 2017.

Comments: Submitted to Multimedia Tools and Applications

Report number: MTAP-D-19-02724R2

arXiv:1711.03564 [pdf, other]

doi 10.1109/LGRS.2018.2845549

Exploiting ConvNet Diversity for Flooding Identification

Authors: Keiller Nogueira, Samuel G. Fadel, Ícaro C. Dourado, Rafael de O. Werneck, Javier A. V. Muñoz, Otávio A. B. Penatti, Rodrigo T. Calumby, Lin Tzy Li, Jefersson A. dos Santos, Ricardo da S. Torres

Abstract: Flooding is the world's most costly type of natural disaster in terms of both economic losses and human casualties. A first and essential procedure towards flood monitoring is based on identifying the area most vulnerable to flooding, which gives authorities relevant regions to focus on. In this work, we propose several methods to perform flooding identification in high-resolution remote sensing images using deep learning. Specifically, some proposed techniques are based upon unique networks, such as dilated and deconvolutional ones, while another was conceived to exploit the diversity of distinct networks in order to extract the maximum performance of each classifier. Evaluation of the proposed algorithms was conducted on a high-resolution remote sensing dataset. Results show that the proposed algorithms outperformed several state-of-the-art baselines, providing improvements ranging from 1 to 4% in terms of the Jaccard Index.

Submitted 5 June, 2018; v1 submitted 9 November, 2017; originally announced November 2017.

Comments: Work winner of the Flood-Detection in Satellite Images, a subtask of 2017 Multimedia Satellite Task (MediaEval Benchmark). Accepted for publication in the Geoscience and Remote Sensing Letters (GRSL)

arXiv:1511.06704 [pdf, other]

Semantic Diversity versus Visual Diversity in Visual Dictionaries

Authors: Otávio A. B. Penatti, Sandra Avila, Eduardo Valle, Ricardo da S. Torres

Abstract: Visual dictionaries are a critical component for image classification/retrieval systems based on the bag-of-visual-words (BoVW) model. Dictionaries are usually learned without supervision from a training set of images sampled from the collection of interest. However, for large, general-purpose, dynamic image collections (e.g., the Web), obtaining a representative sample in terms of semantic concepts is not straightforward. In this paper, we evaluate the impact of semantics on the dictionary quality, aiming at verifying the importance of semantic diversity in relation to visual diversity for visual dictionaries. In the experiments, we vary the number of classes used for creating the dictionary and then compute different BoVW descriptors, using multiple codebook sizes and different coding and pooling methods (standard BoVW and Fisher Vectors). Results for image classification show that as visual dictionaries are based on low-level visual appearances, visual diversity is more important than semantic diversity. Our conclusions open the opportunity to alleviate the burden in generating visual dictionaries as we need only a visually diverse set of images instead of the whole collection to create a good dictionary.

Submitted 20 November, 2015; originally announced November 2015.

arXiv:1205.2663 [pdf, ps, other]

Are visual dictionaries generalizable?

Authors: Otavio A. B. Penatti, Eduardo Valle, Ricardo da S. Torres

Abstract: Mid-level features based on visual dictionaries are today a cornerstone of systems for classification and retrieval of images. Those state-of-the-art representations depend crucially on the choice of a codebook (visual dictionary), which is usually derived from the dataset. In general-purpose, dynamic image collections (e.g., the Web), one cannot have the entire collection in order to extract a representative dictionary. However, based on the hypothesis that the dictionary reflects only the diversity of low-level appearances and does not capture semantics, we argue that a dictionary based on a small subset of the data, or even on an entirely different dataset, is able to produce a good representation, provided that the chosen images span a diverse enough portion of the low-level feature space. Our experiments confirm that hypothesis, opening the opportunity to greatly alleviate the burden in generating the codebook, and confirming the feasibility of employing visual dictionaries in large-scale dynamic environments.

Submitted 11 May, 2012; originally announced May 2012.

arXiv:1201.1623 [pdf, other]

MultiDendrograms: Variable-Group Agglomerative Hierarchical Clusterings

Authors: Sergio Gomez, Justo Montiel, David Torres, Alberto Fernandez

Abstract: MultiDendrograms is a Java-written application that computes agglomerative hierarchical clusterings of data. Starting from a distances (or weights) matrix, MultiDendrograms is able to calculate its dendrograms using the most common agglomerative hierarchical clustering methods. The application implements a variable-group algorithm that solves the non-uniqueness problem found in the standard pair-group algorithm. This problem arises when two or more minimum distances between different clusters are equal during the agglomerative process, because then different output clusterings are possible depending on the criterion used to break ties between distances. MultiDendrograms solves this problem by implementing a variable-group algorithm that groups more than two clusters at the same time when ties occur.

Submitted 4 December, 2012; v1 submitted 8 January, 2012; originally announced January 2012.

Comments: Article upgraded to MultiDendrograms 3.0. Software available at http://deim.urv.cat/~sgomez/multidendrograms.php
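
Note on the entry above: the abstract describes ties between minimum distances and a variable-group step that merges more than two clusters at once. The following is a minimal sketch of that single step under stated assumptions. It is not the MultiDendrograms implementation. The function name `tied_minimum_groups`, the tolerance `tol`, and the three-point example are hypothetical.

```python
def tied_minimum_groups(dist, tol=1e-12):
    """Find the groups that one variable-group step would merge.

    dist maps unordered cluster pairs (a, b) to their distance.
    Returns the minimum distance, the pairs tied at that minimum,
    and the groups of clusters connected through those tied pairs.
    """
    labels = sorted({x for pair in dist for x in pair})
    dmin = min(dist.values())
    tied = [pair for pair, d in dist.items() if abs(d - dmin) <= tol]

    # Union-find: clusters joined by a tied minimum pair share a root.
    parent = {x: x for x in labels}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in tied:
        parent[find(a)] = find(b)

    groups = {}
    for x in labels:
        groups.setdefault(find(x), []).append(x)
    # Only groups with at least two members are merged in this step.
    return dmin, tied, [g for g in groups.values() if len(g) > 1]


# Three points on a line at 0, 1, 2: A-B and B-C are tied at distance 1.
dist = {("A", "B"): 1.0, ("B", "C"): 1.0, ("A", "C"): 2.0}
dmin, tied, groups = tied_minimum_groups(dist)
print(dmin, tied, groups)
```

A pair-group algorithm would have to choose between merging A with B or B with C first, and the two choices give different dendrograms. Merging every cluster connected by a tied minimum in one step removes that dependence on the tie-breaking rule.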

arXiv:1104.4723 [pdf, other]

doi 10.1145/2324796.2324815

Bayesian approach for near-duplicate image detection

Authors: Lucas Moutinho Bueno, Eduardo Valle, Ricardo da Silva Torres

Abstract: In this paper we propose a Bayesian approach for near-duplicate image detection, and investigate how different probabilistic models affect the performance obtained. The task of identifying an image whose metadata are missing is often demanded for a myriad of applications: metadata retrieval in cultural institutions, detection of copyright violations, investigation of latent cross-links in archives and libraries, duplicate elimination in storage management, etc. The majority of current solutions are based either on voting algorithms, which are very precise but expensive, or on the use of visual dictionaries, which are efficient but less precise. Our approach uses local descriptors in a novel way which, through a careful application of decision theory, allows a very fine control of the compromise between precision and efficiency. In addition, the method attains a great compromise between those two axes, with more than 99% accuracy with less than 10 database operations.

Submitted 25 April, 2011; originally announced April 2011.

Showing 1–30 of 30 results for author: Torres, D