-
Indirect Swarm Control: Characterization and Analysis of Emergent Swarm Behaviors
Authors:
Ricardo Vega,
Connor Mattson,
Daniel S. Brown,
Cameron Nowzari
Abstract:
Emergence and emergent behaviors are often defined as cases where changes in local interactions between agents at a lower level effectively change what occurs in the higher level of the system (i.e., the whole swarm) and its properties. However, the manner in which these collective emergent behaviors self-organize is less understood. The focus of this paper is on presenting a new framework for characterizing the conditions that lead to different macrostates and how to predict/analyze their macroscopic properties, allowing us to indirectly engineer the same behaviors from the bottom up by tuning their environmental conditions rather than local interaction rules. We then apply this framework to a simple system of binary sensing and acting agents as an example to see if a re-framing of this swarms problem can help us push the state of the art forward. By first creating some working definitions of macrostates in a particular swarm system, we show how agent-based modeling may be combined with control theory to enable a generalized understanding of controllable emergent processes without needing to simulate everything. Whereas phase diagrams can generally only be created through Monte Carlo simulations or sweeping through ranges of parameters in a simulator, we develop closed-form functions that can immediately produce them, revealing an infinite set of swarm parameter combinations that can lead to a specifically chosen self-organized behavior. While the exact methods are still under development, we believe simply laying out a potential path towards solutions that have evaded our traditional methods using a novel method is worth considering. Our results are characterized through both simulations and real experiments on ground robots.
Submitted 28 March, 2024; v1 submitted 20 September, 2023;
originally announced September 2023.
-
Zespol: A Lightweight Environment for Training Swarming Agents
Authors:
Shay Snyder,
Kevin Zhu,
Ricardo Vega,
Cameron Nowzari,
Maryam Parsa
Abstract:
Agent-based modeling (ABM) and simulation have emerged as important tools for studying emergent behaviors, especially in the context of swarming algorithms for robotic systems. Despite significant research in this area, there is a lack of standardized simulation environments, which hinders the development and deployment of real-world robotic swarms. To address this issue, we present Zespol, a modular, Python-based simulation environment that enables the development and testing of multi-agent control algorithms. Zespol provides a flexible and extensible sandbox for initial research, with the potential for scaling to real-world applications. We provide a topological overview of the system and detailed descriptions of its plug-and-play elements. We demonstrate the fidelity of Zespol in simulated and real-world robotics by replicating existing works highlighting the simulation-to-real gap with the milling behavior. We plan to leverage Zespol's plug-and-play feature for neuromorphic computing in swarming scenarios, which involves using the modules in Zespol to simulate the behavior of neurons and their connections as synapses. This will enable optimizing and studying the emergent behavior of swarm systems in complex environments. Our goal is to gain a better understanding of the interplay between environmental factors and neural-like computations in swarming systems.
Submitted 30 June, 2023;
originally announced June 2023.
-
Cross-domain Sentiment Classification in Spanish
Authors:
Lautaro Estienne,
Matias Vera,
Leonardo Rey Vega
Abstract:
Sentiment Classification is a fundamental task in the field of Natural Language Processing, and has very important academic and commercial applications. It aims to automatically predict the degree of sentiment present in a text that contains opinions and subjectivity at some level, like product and movie reviews, or tweets. This can be really difficult to accomplish, in part, because different domains of text contain different words and expressions. In addition, this difficulty increases when text is written in a non-English language due to the lack of databases and resources. As a consequence, several cross-domain and cross-language techniques are often applied to this task in order to improve the results. In this work we perform a study on the ability of a classification system trained with a large database of product reviews to generalize to different Spanish domains. Reviews were collected from the MercadoLibre website from seven Latin American countries, allowing the creation of a large and balanced dataset. Results suggest that generalization across domains is feasible though very challenging when trained with these product reviews, and can be improved by pre-training and fine-tuning the classification model.
Submitted 15 March, 2023;
originally announced March 2023.
-
Modeling and Forecasting COVID-19 Cases using Latent Subpopulations
Authors:
Roberto Vega,
Zehra Shah,
Pouria Ramazi,
Russell Greiner
Abstract:
Classical epidemiological models assume homogeneous populations. There have been important extensions to model heterogeneous populations, when the identity of the sub-populations is known, such as age group or geographical location. Here, we propose two new methods to model the number of people infected with COVID-19 over time, each as a linear combination of latent sub-populations -- i.e., when we do not know which person is in which sub-population, and the only available observations are the aggregates across all sub-populations. Method #1 is a dictionary-based approach, which begins with a large number of pre-defined sub-population models (each with its own starting time, shape, etc.), then determines the (positive) weight of a small (learned) number of sub-populations. Method #2 is a mixture-of-$M$ fittable curves, where $M$, the number of sub-populations to use, is given by the user. Both methods are compatible with any parametric model; here we demonstrate their use with first (a) Gaussian curves and then (b) SIR trajectories. We empirically show the performance of the proposed methods, first in (i) modeling the observed data and then in (ii) forecasting the number of infected people 1 to 4 weeks in advance. Across 187 countries, we show that the dictionary approach had the lowest mean absolute percentage error and also the lowest variance when compared with classical SIR models and moreover, it was a strong baseline that outperforms many of the models developed for COVID-19 forecasting.
Submitted 9 February, 2023;
originally announced February 2023.
-
Simulate Less, Expect More: Bringing Robot Swarms to Life via Low-Fidelity Simulations
Authors:
Ricardo Vega,
Kevin Zhu,
Sean Luke,
Maryam Parsa,
Cameron Nowzari
Abstract:
This paper proposes a novel methodology for addressing the simulation-reality gap for multi-robot swarm systems. Rather than immediately try to shrink or `bridge the gap' anytime a real-world experiment failed that worked in simulation, we characterize conditions under which this is actually necessary. When these conditions are not satisfied, we show how very simple simulators can still be used to both (i) design new multi-robot systems, and (ii) guide real-world swarming experiments towards certain emergent behaviors when the gap is very large. The key ideas are an iterative simulator-in-the-design-loop in which real-world experiments, simulator modifications, and simulated experiments are intimately coupled in a way that minds the gap without needing to shrink it, as well as the use of minimally viable phase diagrams to guide real world experiments. We demonstrate the usefulness of our methods on deploying a real multi-robot swarm system to successfully exhibit an emergent milling behavior.
Submitted 21 January, 2023;
originally announced January 2023.
-
Semi-supervised Batch Learning From Logged Data
Authors:
Gholamali Aminian,
Armin Behnamnia,
Roberto Vega,
Laura Toni,
Chengchun Shi,
Hamid R. Rabiee,
Omar Rivasplata,
Miguel R. D. Rodrigues
Abstract:
Off-policy learning methods are intended to learn a policy from logged data, which includes context, action, and feedback (cost or reward) for each sample point. In this work, we build on the counterfactual risk minimization framework, which also assumes access to propensity scores. We propose learning methods for problems where feedback is missing for some samples, so there are samples with feedback and samples missing-feedback in the logged data. We refer to this type of learning as semi-supervised batch learning from logged data, which arises in a wide range of application domains. We derive a novel upper bound for the true risk under the inverse propensity score estimator to address this kind of learning problem. Using this bound, we propose a regularized semi-supervised batch learning method with logged data where the regularization term is feedback-independent and, as a result, can be evaluated using the logged missing-feedback data. Consequently, even though feedback is only present for some samples, a learning policy can be learned by leveraging the missing-feedback samples. The results of experiments derived from benchmark datasets indicate that these algorithms achieve policies with better performance in comparison with logging policies.
Submitted 18 February, 2024; v1 submitted 15 September, 2022;
originally announced September 2022.
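For orientation, the inverse propensity score (IPS) estimator named in the abstract above can be sketched in a few lines. This is the textbook estimator only, not the paper's regularized semi-supervised method; the function name `ips_risk`, the `policy_prob` callback, and the toy log are all illustrative assumptions.

```python
from typing import Callable, Sequence


def ips_risk(
    contexts: Sequence[object],
    actions: Sequence[int],
    costs: Sequence[float],
    propensities: Sequence[float],
    policy_prob: Callable[[int, object], float],
) -> float:
    """Textbook IPS estimate of the expected cost of a target policy.

    Each logged cost is reweighted by pi(a | x) / p, where p is the
    probability with which the logging policy chose the logged action.
    Only samples that carry feedback (a cost) can be used here.
    """
    if not costs:
        raise ValueError("need at least one logged sample with feedback")
    total = 0.0
    for x, a, c, p in zip(contexts, actions, costs, propensities):
        total += c * policy_prob(a, x) / p
    return total / len(costs)


# Hypothetical log: two actions, chosen uniformly at random (propensity 0.5).
contexts = [0, 0, 0, 0]
actions = [0, 1, 1, 0]
costs = [1.0, 0.2, 0.4, 1.0]
propensities = [0.5, 0.5, 0.5, 0.5]


def always_action_one(action: int, context: object) -> float:
    """Assumed target policy that deterministically picks action 1."""
    return 1.0 if action == 1 else 0.0


estimate = ips_risk(contexts, actions, costs, propensities, always_action_one)
print(f"IPS estimate of expected cost: {estimate:.3f}")
```

Because the estimator needs a cost for every sample it uses, samples with missing feedback contribute nothing to it, which is the gap the feedback-independent regularization term described in the abstract is meant to address.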
-
Perfectly Accurate Membership Inference by a Dishonest Central Server in Federated Learning
Authors:
Georg Pichler,
Marco Romanelli,
Leonardo Rey Vega,
Pablo Piantanida
Abstract:
Federated Learning is expected to provide strong privacy guarantees, as only gradients or model parameters but no plain text training data is ever exchanged either between the clients or between the clients and the central server. In this paper, we challenge this claim by introducing a simple but still very effective membership inference attack algorithm, which relies only on a single training step. In contrast to the popular honest-but-curious model, we investigate a framework with a dishonest central server. Our strategy is applicable to models with ReLU activations and uses the properties of this activation function to achieve perfect accuracy. Empirical evaluation on visual classification tasks with MNIST, CIFAR10, CIFAR100 and CelebA datasets shows that our method provides perfect accuracy in identifying one sample in a training set with thousands of samples. Occasional failures of our method lead us to discover duplicate images in the CIFAR100 and CelebA datasets.
Submitted 9 November, 2023; v1 submitted 30 March, 2022;
originally announced March 2022.
-
Domain-shift adaptation via linear transformations
Authors:
Roberto Vega,
Russell Greiner
Abstract:
A predictor, $f_A : X \to Y$, learned with data from a source domain (A) might not be accurate on a target domain (B) when their distributions are different. Domain adaptation aims to reduce the negative effects of this distribution mismatch. Here, we analyze the case where $P_A(Y\ |\ X) \neq P_B(Y\ |\ X)$, $P_A(X) \neq P_B(X)$ but $P_A(Y) = P_B(Y)$; where there are affine transformations of $X$ that make all distributions equivalent. We propose an approach to project the source and target domains into a lower-dimensional, common space, by (1) projecting the domains into the eigenvectors of the empirical covariance matrices of each domain, then (2) finding an orthogonal matrix that minimizes the maximum mean discrepancy between the projections of both domains. For arbitrary affine transformations, there is an inherent unidentifiability problem when performing unsupervised domain adaptation that can be alleviated in the semi-supervised case. We show the effectiveness of our approach in simulated data and in binary digit classification tasks, obtaining improvements up to 48% accuracy when correcting for the domain shift in the data.
Submitted 13 January, 2022;
originally announced January 2022.
-
PACMAN: PAC-style bounds accounting for the Mismatch between Accuracy and Negative log-loss
Authors:
Matias Vera,
Leonardo Rey Vega,
Pablo Piantanida
Abstract:
The ultimate performance of machine learning algorithms for classification tasks is usually measured in terms of the empirical error probability (or accuracy) based on a testing dataset. However, these algorithms are optimized through the minimization of a typically different--more convenient--loss function based on a training set. For classification tasks, this loss function is often the negative log-loss that leads to the well-known cross-entropy risk which is typically better behaved (from a numerical perspective) than the error probability. Conventional studies on the generalization error do not usually take into account the underlying mismatch between losses at training and testing phases. In this work, we introduce an analysis based on a point-wise PAC approach over the generalization gap considering the mismatch of testing based on the accuracy metric and training on the negative log-loss. We label this analysis PACMAN. Building on the fact that the mentioned mismatch can be written as a likelihood ratio, concentration inequalities can be used to provide some insights for the generalization problem in terms of some point-wise PAC bounds depending on some meaningful information-theoretic quantities. An analysis of the obtained bounds and a comparison with available results in the literature are also provided.
Submitted 10 December, 2021;
originally announced December 2021.
-
SIMLR: Machine Learning inside the SIR model for COVID-19 Forecasting
Authors:
Roberto Vega,
Leonardo Flores,
Russell Greiner
Abstract:
Accurate forecasts of the number of newly infected people during an epidemic are critical for making effective timely decisions. This paper addresses this challenge using the SIMLR model, which incorporates machine learning (ML) into the epidemiological SIR model. For each region, SIMLR tracks the changes in the policies implemented at the government level, which it uses to estimate the time-varying parameters of an SIR model for forecasting the number of new infections 1 to 4 weeks in advance. It also forecasts the probability of changes in those government policies at each of these future times, which is essential for the longer-range forecasts. We applied SIMLR to data from regions in Canada and in the United States, and show that its MAPE (mean absolute percentage error) performance is as good as SOTA forecasting models, with the added advantage of being an interpretable model. We expect that this approach will be useful not only for forecasting COVID-19 infections, but also in predicting the evolution of other infectious diseases.
Submitted 3 June, 2021;
originally announced June 2021.
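As a rough illustration of the two ingredients the abstract above relies on, the sketch below steps a basic discrete-time SIR model and computes MAPE for a forecast. It is not SIMLR: the parameters `beta` and `gamma` are held constant here, whereas the abstract describes estimating time-varying parameters, and every name and number in the example is an assumption chosen for illustration.

```python
from typing import List, Sequence, Tuple


def simulate_sir(
    beta: float, gamma: float, s0: float, i0: float, r0: float, days: int
) -> List[Tuple[float, float, float]]:
    """Step a discrete-time SIR model with one-day Euler updates.

    beta is the transmission rate and gamma the recovery rate; both are
    constant in this sketch. Returns one (S, I, R) tuple per day,
    including day 0.
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    history = [(s, i, r)]
    for _ in range(days):
        new_infections = beta * s * i / n
        new_recoveries = gamma * i
        s = s - new_infections
        i = i + new_infections - new_recoveries
        r = r + new_recoveries
        history.append((s, i, r))
    return history


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical population of 1000 with 10 initial infections, 4-week horizon.
trajectory = simulate_sir(beta=0.3, gamma=0.1, s0=990.0, i0=10.0, r0=0.0, days=28)
infected = [state[1] for state in trajectory]
print(f"infected after 28 days: {infected[-1]:.1f}")
```

A model with time-varying parameters would replace the constant `beta` with a per-day value, which is where a learned component could plug in.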
-
Sample Efficient Learning of Image-Based Diagnostic Classifiers Using Probabilistic Labels
Authors:
Roberto Vega,
Pouneh Gorji,
Zichen Zhang,
Xuebin Qin,
Abhilash Rakkunedeth Hareendranathan,
Jeevesh Kapur,
Jacob L. Jaremko,
Russell Greiner
Abstract:
Deep learning approaches often require huge datasets to achieve good generalization. This complicates their use in tasks like image-based medical diagnosis, where the small training datasets are usually insufficient to learn appropriate data representations. For such sensitive tasks it is also important to provide the confidence in the predictions. Here, we propose a way to learn and use probabilistic labels to train accurate and calibrated deep networks from relatively small datasets. We observe gains of up to 22% in the accuracy of models trained with these labels, as compared with traditional approaches, in three classification tasks: diagnosis of hip dysplasia, fatty liver, and glaucoma. The outputs of models trained with probabilistic labels are calibrated, allowing the interpretation of their predictions as proper probabilities. We anticipate this approach will apply to other tasks where few training instances are available and expert knowledge can be encoded as probabilities.
Submitted 11 February, 2021;
originally announced February 2021.
-
The Role of Mutual Information in Variational Classifiers
Authors:
Matias Vera,
Leonardo Rey Vega,
Pablo Piantanida
Abstract:
Overfitting data is a well-known phenomenon related to the generation of a model that mimics too closely (or exactly) a particular instance of data, and may therefore fail to predict future observations reliably. In practice, this behaviour is controlled by various--sometimes heuristic--regularization techniques, which are motivated by developing upper bounds to the generalization error. In this work, we study the generalization error of classifiers relying on stochastic encodings trained on the cross-entropy loss, which is often used in deep learning for classification problems. We derive bounds to the generalization error showing that there exists a regime where the generalization error is bounded by the mutual information between input features and the corresponding representations in the latent space, which are randomly generated according to the encoding distribution. Our bounds provide an information-theoretic understanding of generalization in the so-called class of variational classifiers, which are regularized by a Kullback-Leibler (KL) divergence term. These results give theoretical grounds for the highly popular KL term in variational inference methods that was already recognized to act effectively as a regularization penalty. We further observe connections with well studied notions such as Variational Autoencoders, Information Dropout, Information Bottleneck and Boltzmann Machines. Finally, we perform numerical experiments on MNIST and CIFAR datasets and show that mutual information is indeed highly representative of the behaviour of the generalization error.
Submitted 13 April, 2023; v1 submitted 22 October, 2020;
originally announced October 2020.
-
Efficient GPU Thread Mapping on Embedded 2D Fractals
Authors:
Cristóbal A. Navarro,
Felipe A. Quezada,
Nancy Hitschfeld,
Raimundo Vega,
Benjamin Bustos
Abstract:
This work proposes a new approach for mapping GPU threads onto a family of discrete embedded 2D fractals. A block-space map $λ: \mathbb{Z}_{\mathbb{E}}^{2} \mapsto \mathbb{Z}_{\mathbb{F}}^{2}$ is proposed, from Euclidean parallel space $\mathbb{E}$ to embedded fractal space $\mathbb{F}$, that maps in $\mathcal{O}(\log_2 \log_2(n))$ time and uses no more than $\mathcal{O}(n^\mathbb{H})$ threads with $\mathbb{H}$ being the Hausdorff dimension of the fractal, making it parallel space efficient. When compared to a bounding-box (BB) approach, $λ(ω)$ offers a sub-exponential improvement in parallel space and a monotonically increasing speedup for $n \ge n_0$. The Sierpinski gasket fractal is used as a particular case study and the experimental performance results show that $λ(ω)$ reaches up to $9\times$ of speedup over the bounding-box approach. A tensor-core based implementation of $λ(ω)$ is also proposed for modern GPUs, providing up to $\sim40\%$ of extra performance. The results obtained in this work show that doing efficient GPU thread mapping on fractal domains can significantly improve the performance of several applications that work with this type of geometry.
Submitted 25 April, 2020;
originally announced April 2020.
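To make the parallel-space claim in the abstract above concrete, the sketch below counts how many cells of an $n \times n$ grid belong to an embedded Sierpinski gasket and compares that with the $n^2$ threads a bounding-box launch would use. The membership test `x & y == 0` is one common way to embed the gasket in a grid and is an assumption here; it is not the paper's map $λ(ω)$, and all function names are illustrative.

```python
import math
from typing import List, Tuple

# Hausdorff dimension of the Sierpinski gasket: log2(3), roughly 1.585.
HAUSDORFF_DIM = math.log2(3)


def gasket_cells(n: int) -> List[Tuple[int, int]]:
    """Cells (x, y) of an n x n grid that lie on the embedded gasket.

    Assumed embedding: a cell belongs to the gasket when the binary
    representations of x and y share no set bit.
    """
    return [(x, y) for y in range(n) for x in range(n) if x & y == 0]


for k in range(1, 6):
    n = 2 ** k
    useful = len(gasket_cells(n))   # cells that actually hold fractal data
    bounding_box = n * n            # threads launched by a bounding-box map
    print(
        f"n={n:3d}  gasket cells={useful:4d}  "
        f"n^H={n ** HAUSDORFF_DIM:7.1f}  bounding box={bounding_box:5d}"
    )
```

For $n = 2^k$ the gasket holds $3^k = n^{\log_2 3}$ cells, so the fraction of bounding-box threads doing useful work shrinks as $(3/4)^k$; this is the waste a fractal-aware thread map avoids.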
-
GPU Tensor Cores for fast Arithmetic Reductions
Authors:
Cristóbal A. Navarro,
Roberto Carrasco,
Ricardo J. Barrientos,
Javier A. Riquelme,
Raimundo Vega
Abstract:
This work proposes a GPU tensor core approach that encodes the arithmetic reduction of $n$ numbers as a set of chained $m \times m$ matrix multiply accumulate (MMA) operations executed in parallel by GPU tensor cores. The asymptotic running time of the proposed chained tensor core approach is $T(n)=5 \log_{m^2}{n}$ and its speedup is $S=\dfrac{4}{5} \log_{2}{m^2}$ over the classic $O(n \log n)$ parallel reduction algorithm. Experimental performance results show that the proposed reduction method is $\sim 3.2 \times$ faster than a conventional GPU reduction implementation, and preserves the numerical precision because the sub-results of each chain of $R$ MMAs are kept as 32-bit floating point values, before being all reduced into a final 32-bit result. The chained MMA design allows a flexible configuration of thread-blocks; small thread-blocks of 32 or 128 threads can still achieve maximum performance using a chain of $R=4,5$ MMAs per block, while large thread-blocks work best with $R=1$. The results obtained in this work show that tensor cores can indeed provide a significant performance improvement to non-Machine Learning applications such as the arithmetic reduction, which is an integration tool for studying many scientific phenomena.
Submitted 15 January, 2020;
originally announced January 2020.
-
A Resource for Computational Experiments on Mapudungun
Authors:
Mingjun Duan,
Carlos Fasola,
Sai Krishna Rallabandi,
Rodolfo M. Vega,
Antonios Anastasopoulos,
Lori Levin,
Alan W Black
Abstract:
We present a resource for computational experiments on Mapudungun, a polysynthetic indigenous language spoken in Chile with upwards of 200 thousand speakers. We provide 142 hours of culturally significant conversations in the domain of medical treatment. The conversations are fully transcribed and translated into Spanish. The transcriptions also include annotations for code-switching and non-standard pronunciations. We also provide baseline results on three core NLP tasks: speech recognition, speech synthesis, and machine translation between Spanish and Mapudungun. We further explore other applications for which the corpus will be suitable, including the study of code-switching, historical orthography change, linguistic structure, and sociological and anthropological studies.
Submitted 4 April, 2020; v1 submitted 3 December, 2019;
originally announced December 2019.
-
Extreme coverage in 5G Narrowband IoT: a LUT-based strategy to optimize shared channels
Authors:
Emmanuel Luján,
Juan A. Zuloaga Mellino,
Alejandro D. Otero,
Leonardo Rey Vega,
Cecilia G. Galarza,
Esteban E. Mocskos
Abstract:
One of the main challenges in IoT is providing communication support to an increasing number of connected devices. In recent years, narrowband radio technology has emerged to address this situation: Narrowband Internet of Things (NB-IoT), which is now part of 5G. Supporting massive connectivity becomes particularly demanding in extreme coverage scenarios such as underground or deep inside building sites. We propose a novel strategy for these situations focused on optimizing NB-IoT shared channels through the selection of link parameters: modulation and coding scheme, as well as the number of repetitions. These parameters are established by the base station (BS) for each block transmitted until reaching a target block error rate (BLER_t). A wrong selection of these magnitudes leads to radio resource waste and a decrease in the number of possible concurrent connections. Specifically, our strategy is based on a look-up table (LUT) scheme which is used for rapidly delivering the optimal link parameters given a target QoS. To validate our proposal, we compare with alternative strategies using an open source NB-IoT uplink simulator. The experiments are based on transmitting blocks of 256 bits using an AWGN channel over the NPUSCH. Results show that, especially under extreme conditions, only a few options for link parameters are available, favoring robustness against measurement uncertainties. Our strategy minimizes resource usage in all scenarios of acknowledged mode and remarkably reduces losses in the unacknowledged mode, presenting also substantial gains in performance. We expect to influence future BS software design and implementation, favoring connection support under extreme environments.
Submitted 24 December, 2019; v1 submitted 7 August, 2019;
originally announced August 2019.
-
Understanding the Behaviour of the Empirical Cross-Entropy Beyond the Training Distribution
Authors:
Matias Vera,
Pablo Piantanida,
Leonardo Rey Vega
Abstract:
Machine learning theory has mostly focused on generalization to samples from the same distribution as the training data. However, a better understanding of generalization beyond the training distribution, where the observed distribution changes, is also fundamentally important for achieving a more powerful form of generalization. In this paper, we attempt to study through the lens of information measures how a particular architecture behaves when the true probability law of the samples is potentially different at training and testing times. Our main result is that the testing gap between the empirical cross-entropy and its statistical expectation (measured with respect to the testing probability law) can be bounded with high probability by the mutual information between the input testing samples and the corresponding representations, generated by the encoder obtained at training time. These results of theoretical nature are supported by numerical simulations showing that the mentioned mutual information is representative of the testing gap, capturing qualitatively the dynamic in terms of the hyperparameters of the network.
Submitted 28 May, 2019;
originally announced May 2019.
-
Analyzing GPU Tensor Core Potential for Fast Reductions
Authors:
Roberto Carrasco,
Raimundo Vega,
Cristóbal A. Navarro
Abstract:
The Nvidia GPU architecture has introduced new computing elements such as the \textit{tensor cores}, which are special processing units dedicated to performing fast matrix-multiply-accumulate (MMA) operations and accelerating \textit{Deep Learning} applications. In this work we present the idea of using tensor cores for a different purpose, namely the parallel arithmetic reduction problem, and propose a new GPU tensor-core based algorithm as well as analyze its potential performance benefits in comparison to a traditional GPU-based one. The proposed method encodes the reduction of $n$ numbers as a set of $m\times m$ MMA tensor-core operations (for Nvidia's Volta architecture $m=16$) and takes advantage of the fact that each MMA operation takes just one GPU cycle. When analyzing the cost under a simplified GPU computing model, the result is that the new algorithm manages to reduce a problem of $n$ numbers in $T(n) = 5\log_{m^2}(n)$ steps with a speedup of $S = \frac{4}{5}\log_2(m^2)$.
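As a rough illustration of the closed-form expressions quoted in the abstract, the following sketch simply evaluates $T(n)$ and $S$ for a given $n$ and $m$. The function names are hypothetical, and the code does not model the tensor-core algorithm itself; it only plugs numbers into the stated formulas.

```python
import math


def tensor_core_reduction_steps(n: int, m: int = 16) -> float:
    """Evaluate T(n) = 5 * log_{m^2}(n), the step count quoted in the abstract."""
    # A change of base keeps the result exact when n and m are powers of two.
    return 5 * math.log2(n) / math.log2(m * m)


def model_speedup(m: int = 16) -> float:
    """Evaluate S = (4/5) * log2(m^2), the model speedup quoted in the abstract."""
    return (4 / 5) * math.log2(m * m)
```

Under these formulas, $m = 16$ gives a model speedup of $6.4$, and reducing $n = 2^{24}$ numbers takes $15$ steps.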
Submitted 8 March, 2019;
originally announced March 2019.
-
The Role of Information Complexity and Randomization in Representation Learning
Authors:
Matías Vera,
Pablo Piantanida,
Leonardo Rey Vega
Abstract:
A grand challenge in representation learning is to learn the different explanatory factors of variation behind the high-dimensional data. Encoder models are often determined to optimize performance on training data when the real objective is to generalize well to unseen data. Although there is enough numerical evidence suggesting that noise injection (during training) at the representation level might improve the generalization ability of encoders, an information-theoretic understanding of this principle remains elusive. This paper presents a sample-dependent bound on the generalization gap of the cross-entropy loss that scales with the information complexity (IC) of the representations, meaning the mutual information between inputs and their representations. The IC is empirically investigated for standard multi-layer neural networks with SGD on MNIST and CIFAR-10 datasets; the behaviour of the gap and the IC appear to be in direct correlation, suggesting that SGD selects encoders to implicitly minimize the IC. We specialize the IC to study the role of Dropout on the generalization capacity of deep encoders which is shown to be directly related to the encoder capacity, being a measure of the distinguishability among samples from their representations. Our results support some recent regularization methods.
Submitted 14 February, 2018;
originally announced February 2018.
-
Compression-Based Regularization with an Application to Multi-Task Learning
Authors:
Matías Vera,
Leonardo Rey Vega,
Pablo Piantanida
Abstract:
This paper investigates, on information-theoretic grounds, a learning problem based on the principle that any regularity in a given dataset can be exploited to extract compact features from data, i.e., using fewer bits than needed to fully describe the data itself, in order to build meaningful representations of relevant content (multiple labels). We begin by introducing the noisy lossy source coding paradigm with the log-loss fidelity criterion which provides the fundamental tradeoffs between the \emph{cross-entropy loss} (average risk) and the information rate of the features (model complexity). Our approach allows an information-theoretic formulation of the \emph{multi-task learning} (MTL) problem, which is a supervised learning framework in which the prediction models for several related tasks are learned jointly from common representations to achieve better generalization performance. Then, we present an iterative algorithm for computing the optimal tradeoffs, and its global convergence is proven provided that some conditions hold. An important property of this algorithm is that it provides a natural safeguard against overfitting, because it minimizes the average risk taking into account a penalization induced by the model complexity. Remarkably, empirical results illustrate that there exists an optimal information rate minimizing the \emph{excess risk} which depends on the nature and the amount of available training data. An application to hierarchical text categorization is also investigated, extending previous works.
Submitted 19 November, 2017;
originally announced November 2017.
-
Block-space GPU Mapping for Embedded Sierpiński Gasket Fractals
Authors:
Cristóbal A. Navarro,
Benjamín Bustos,
Raimundo Vega,
Nancy Hitschfeld
Abstract:
This work studies the problem of GPU thread mapping for a Sierpiński gasket fractal embedded in a discrete Euclidean space of $n \times n$. A block-space map $λ: \mathbb{Z}_{\mathbb{E}}^{2} \mapsto \mathbb{Z}_{\mathbb{F}}^{2}$ is proposed, from Euclidean parallel space $\mathbb{E}$ to embedded fractal space $\mathbb{F}$, that maps in $\mathcal{O}(\log_2 \log_2(n))$ time and uses no more than $\mathcal{O}(n^\mathbb{H})$ threads with $\mathbb{H} \approx 1.58...$ being the Hausdorff dimension, making it parallel space efficient. When compared to a bounding-box map, $λ(ω)$ offers a sub-exponential improvement in parallel space and a monotonically increasing speedup once $n > n_0$. Experimental performance tests show that in practice $λ(ω)$ can produce performance improvement at any block-size once $n > n_0 = 2^8$, reaching approximately $10\times$ of speedup for $n=2^{16}$ under optimal block configurations.
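To make the thread-count argument concrete, the sketch below uses one common way of embedding a Sierpiński gasket in an $n \times n$ grid, where cell $(x, y)$ belongs to the fractal when `x & y == 0`. This embedding and the helper names are assumptions for illustration only; the paper's map $λ(ω)$ is not reproduced here. Counting the cells shows that, for $n$ a power of two, the gasket occupies $n^{\log_2 3}$ of the $n^2$ cells that a bounding-box approach would cover.

```python
import math

# Hausdorff dimension of the Sierpinski gasket, log2(3), roughly 1.585.
HAUSDORFF_DIM = math.log2(3)


def in_gasket(x: int, y: int) -> bool:
    """Assumed embedding: a cell is in the gasket when x and y share no set bit."""
    return (x & y) == 0


def count_gasket_cells(n: int) -> int:
    """Count gasket cells by brute force over the full n x n bounding box."""
    return sum(1 for y in range(n) for x in range(n) if in_gasket(x, y))


def bounding_box_overhead(n: int) -> float:
    """Ratio of bounding-box cells (n^2) to cells that actually hold fractal data."""
    return (n * n) / count_gasket_cells(n)
```

For $n = 16$ the brute-force count is $81 = 3^4 = 16^{\log_2 3}$, so a bounding-box launch would cover $256$ cells to process $81$ useful ones, and this overhead grows as $n^{2 - \log_2 3}$.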
Submitted 14 June, 2017;
originally announced June 2017.
-
Collaborative Information Bottleneck
Authors:
Matías Vera,
Leonardo Rey Vega,
Pablo Piantanida
Abstract:
This paper investigates a multi-terminal source coding problem under a logarithmic loss fidelity which does not necessarily lead to an additive distortion measure. The problem is motivated by an extension of the Information Bottleneck method to a multi-source scenario where several encoders have to build cooperatively rate-limited descriptions of their sources in order to maximize information with respect to other unobserved (hidden) sources. More precisely, we study fundamental information-theoretic limits of the so-called: (i) Two-way Collaborative Information Bottleneck (TW-CIB) and (ii) the Collaborative Distributed Information Bottleneck (CDIB) problems. The TW-CIB problem consists of two distant encoders that separately observe marginal (dependent) components $X_1$ and $X_2$ and can cooperate through multiple exchanges of limited information with the aim of extracting information about hidden variables $(Y_1,Y_2)$, which can be arbitrarily dependent on $(X_1,X_2)$. On the other hand, in CDIB there are two cooperating encoders which separately observe $X_1$ and $X_2$ and a third node which can listen to the exchanges between the two encoders in order to obtain information about a hidden variable $Y$. The relevance (figure-of-merit) is measured in terms of a normalized (per-sample) multi-letter mutual information metric (log-loss fidelity) and an interesting tradeoff arises by constraining the complexity of descriptions, measured in terms of the rates needed for the exchanges between the encoders and decoders involved. Inner and outer bounds to the complexity-relevance region of these problems are derived from which optimality is characterized for several cases of interest. Our resulting theoretical complexity-relevance regions are finally evaluated for binary symmetric and Gaussian statistical models.
Submitted 24 November, 2021; v1 submitted 5 April, 2016;
originally announced April 2016.
-
Cooperative spectrum sensing schemes with partial statistics knowledge
Authors:
Juan Augusto Maya,
Leonardo Rey Vega,
Cecilia G. Galarza
Abstract:
In this letter, we analyze the problem of detecting spectrum holes in cognitive radio systems. We consider that a group of unlicensed users can sense the radio signal energy, perform some simple processing and transmit the result to a central entity, where the decision about the presence or absence of licensed users is made. We show that the proposed cooperative schemes perform well even without any knowledge of the measurement statistics at the unlicensed users and with only partial knowledge of them at the central entity.
Submitted 2 November, 2015; v1 submitted 5 October, 2015;
originally announced October 2015.
-
Exploiting Spatial Correlation in Energy Constrained Distributed Detection
Authors:
Juan Augusto Maya,
Cecilia G. Galarza,
Leonardo Rey Vega
Abstract:
We consider the detection of a correlated random process immersed in noise in a wireless sensor network. Each node has an individual energy constraint and the communication with the central processing units is affected by the path loss propagation effect. Guided by energy efficiency concerns, we consider the partition of the whole network into clusters, each one with a coordination node or \emph{cluster head}. Thus, the nodes transmit their measurements to the corresponding cluster heads, which, after some processing, communicate a summary of the received information to the fusion center, which takes the final decision about the state of nature. As the network has a fixed size, communication within smaller clusters will be less affected by the path loss effect, reducing energy consumption in the information exchange process between nodes and cluster heads. However, this limits the capability of the network to beneficially exploit the spatial correlation of the process, especially when the spatial correlation coherence of the process is of the same scale as the cluster size. Therefore, a trade-off is established between the energy efficiency and the beneficial use of spatial correlation. The study of this trade-off is the main goal of this paper. We derive tight approximations of the false alarm and miss-detection error probabilities under the Neyman-Pearson framework for the above scenario. We also consider the application of these results to a particular network and correlation model, obtaining closed-form expressions. Finally, we validate the results for more general network and correlation models through numerical simulations.
Submitted 14 September, 2015;
originally announced September 2015.
-
The Three-Terminal Interactive Lossy Source Coding Problem
Authors:
Leonardo Rey Vega,
Pablo Piantanida,
Alfred Hero III
Abstract:
The three-node multiterminal lossy source coding problem is investigated. We derive an inner bound to the general rate-distortion region of this problem which is a natural extension of the seminal work by Kaspi'85 on the interactive two-terminal source coding problem. It is shown that this (rather involved) inner bound contains several rate-distortion regions of some relevant source coding settings. In this way, besides the non-trivial extension of the interactive two-terminal problem, our results can be seen as a generalization and hence unification of several previous works in the field. Specializing to particular cases we obtain novel rate-distortion regions for several lossy source coding problems. We finish by describing some of the open problems and challenges. However, the general three-node multiterminal lossy source coding problem seems to offer a formidable mathematical complexity.
Submitted 18 January, 2016; v1 submitted 4 February, 2015;
originally announced February 2015.
-
Computer-assisted polyp matching between optical colonoscopy and CT colonography: a phantom study
Authors:
Holger R. Roth,
Thomas E. Hampshire,
Emma Helbren,
Mingxing Hu,
Roser Vega,
Steve Halligan,
David J. Hawkes
Abstract:
Potentially precancerous polyps detected with CT colonography (CTC) need to be removed subsequently, using an optical colonoscope (OC). Due to large colonic deformations induced by the colonoscope, even very experienced colonoscopists find it difficult to pinpoint the exact location of the colonoscope tip in relation to polyps reported on CTC. This can cause unduly prolonged OC examinations that are stressful for the patient, colonoscopist and supporting staff.
We developed a method, based on monocular 3D reconstruction from OC images, that automatically matches polyps observed in OC with polyps reported on prior CTC. A matching cost is computed, using rigid point-based registration between surface point clouds extracted from both modalities. A 3D printed and painted phantom of a 25 cm long transverse colon segment was used to validate the method on two medium sized polyps. Results indicate that the matching cost is smaller at the correct corresponding polyp between OC and CTC: the value at the incorrect polyp is 3.9 times higher than at the correct match. Furthermore, we evaluate the matching of the reconstructed polyp from OC with other colonic endoluminal surface structures such as haustral folds and show that there is a minimum at the correct polyp from CTC.
Automated matching between polyps observed at OC and prior CTC would facilitate the biopsy or removal of true-positive pathology or exclusion of false-positive CTC findings, and would reduce colonoscopy false-negative (missed) polyps. Ultimately, such a method might reduce healthcare costs, patient inconvenience and discomfort.
Submitted 15 January, 2015;
originally announced January 2015.
-
Distributed Detection of a Random Process over a Multiple Access Channel under Energy and Bandwidth Constraints
Authors:
Juan Augusto Maya,
Leonardo Rey Vega,
Cecilia G. Galarza
Abstract:
We analyze a binary hypothesis testing problem built on a wireless sensor network (WSN) for detecting a stationary random process distributed both in space and time with circularly-symmetric complex Gaussian distribution under the Neyman-Pearson framework. Using an analog scheme, the sensors transmit different linear combinations of their measurements through a multiple access channel (MAC) to reach the fusion center (FC), whose task is to decide whether the process is present or not. Considering an energy constraint on each node transmission and a limited number of channel uses, we compute the miss error exponent of the proposed scheme using Large Deviation Theory (LDT) and show that the proposed strategy is asymptotically optimal (when the number of sensors approaches infinity) among linear orthogonal schemes. We also show that the proposed scheme obtains significant energy savings in the low signal-to-noise ratio regime, which is the typical scenario of WSNs. Finally, a Monte Carlo simulation of a 2-dimensional process in space validates the analytical results.
△ Less
Submitted 15 October, 2014;
originally announced October 2014.
-
On Fundamental Trade-offs of Device-to-Device Communications in Large Wireless Networks
Authors:
Andrés Altieri,
Pablo Piantanida,
Leonardo Rey Vega,
Cecilia G. Galarza
Abstract:
This paper studies the gains, in terms of served requests, attainable through out-of-band device-to-device (D2D) video exchanges in large cellular networks. A stochastic framework, in which users are clustered to exchange videos, is introduced, considering several aspects of this problem: the video-caching policy, user matching for exchanges, and aspects regarding scheduling and transmissions. A family of \emph{admissible protocols} is introduced: in each protocol the users are clustered by means of a hard-core point process and, within the clusters, video exchanges take place. Two metrics, quantifying the "local" and "global" fraction of video requests served through D2D, are defined, and relevant trade-off regions involving these metrics, as well as quality-of-service constraints, are identified. A simple communication strategy is proposed and analyzed to obtain inner bounds to the trade-off regions and to draw conclusions on the performance attainable through D2D. To this end, the time-varying interference that the nodes experience is analyzed, and tight approximations of its Laplace transform are derived.
Submitted 4 May, 2015; v1 submitted 9 May, 2014;
originally announced May 2014.
-
On the Outage Probability of the Full-Duplex Interference-Limited Relay Channel
Authors:
Andres Altieri,
Leonardo Rey Vega,
Pablo Piantanida,
Cecilia G. Galarza
Abstract:
In this paper, we study the performance, in terms of the asymptotic error probability, of a user which communicates with a destination with the aid of a full-duplex in-band relay. We consider that the network is interference-limited, and interfering users are distributed as a Poisson point process. In this case, the asymptotic error probability is upper bounded by the outage probability (OP). We investigate the outage behavior for well-known cooperative schemes, namely, decode-and-forward (DF) and compress-and-forward (CF) considering fading and path loss. For DF we determine the exact OP and develop upper bounds which are tight in typical operating conditions. Also, we find the correlation coefficient between source and relay signals which minimizes the OP when the density of interferers is small. For CF, the achievable rates are determined by the spatial correlation of the interferences, and a straightforward analysis isn't possible. To handle this issue, we show the rate with correlated noises is at most one bit worse than with uncorrelated noises, and thus find an upper bound on the performance of CF. These results are useful to evaluate the performance and to optimize relaying schemes in the context of full-duplex wireless networks.
Submitted 15 May, 2014; v1 submitted 28 March, 2014;
originally announced March 2014.
-
Analysis of a Cooperative Strategy for a Large Decentralized Wireless Network
Authors:
Andrés Altieri,
Leonardo Rey Vega,
Pablo Piantanida,
Cecilia Galarza
Abstract:
This paper investigates the benefits of cooperation and proposes a relay activation strategy for a large wireless network with multiple transmitters. In this framework, some nodes cooperate with a nearby node that acts as a relay, using the decode-and-forward protocol, and others use direct transmission. The network is modeled as an independently marked Poisson point process and the source nodes may choose their relays from the set of inactive nodes. Although cooperation can potentially lead to significant improvements in the performance of a communication pair, relaying causes additional interference in the network, increasing the average noise that other nodes see. We investigate how source nodes should balance cooperation vs. interference to obtain reliable transmissions, and for this purpose we study and optimize a relay activation strategy with respect to the outage probability. Surprisingly, in the high reliability regime, the optimized strategy consists of the activation of all the relays or none at all, depending on network parameters. We provide a simple closed-form expression that indicates when the relays should be active, and we introduce closed-form expressions that quantify the performance gains of this scheme with respect to a network that only uses direct transmission.
Submitted 12 June, 2013; v1 submitted 15 March, 2012;
originally announced March 2012.
-
Cooperative Strategies for Interference-Limited Wireless Networks
Authors:
Andres Altieri,
Leonardo Rey Vega,
Cecilia G. Galarza,
Pablo Piantanida
Abstract:
Consider the communication of a single user aided by a nearby relay in a large wireless network where the nodes form a homogeneous Poisson point process. Since this network is interference-limited, the asymptotic error probability is bounded from above by the outage probability experienced by the user. We investigate the outage behavior for the well-known cooperative schemes, namely, decode-and-forward (DF) and compress-and-forward (CF). In this setting, the outage events are induced by both fading and the spatial proximity of neighbor nodes, which generate the strongest interference and hence the worst communication case. Upper and lower bounds on the asymptotic error probability, which are tight in some cases, are derived. It is shown that there exists a clear trade-off between the network density and the benefits of user cooperation. These results are useful to evaluate performance and to optimize relaying schemes in the context of large wireless networks.
Submitted 10 March, 2011;
originally announced March 2011.