-
Revealing Vision-Language Integration in the Brain with Multimodal Networks
Authors:
Vighnesh Subramaniam,
Colin Conwell,
Christopher Wang,
Gabriel Kreiman,
Boris Katz,
Ignacio Cases,
Andrei Barbu
Abstract:
We use (multi)modal deep neural networks (DNNs) to probe for sites of multimodal integration in the human brain by predicting stereoencephalography (SEEG) recordings taken while human subjects watched movies. We operationalize sites of multimodal integration as regions where a multimodal vision-language model predicts recordings better than unimodal language, unimodal vision, or linearly-integrate…
▽ More
We use (multi)modal deep neural networks (DNNs) to probe for sites of multimodal integration in the human brain by predicting stereoencephalography (SEEG) recordings taken while human subjects watched movies. We operationalize sites of multimodal integration as regions where a multimodal vision-language model predicts recordings better than unimodal language, unimodal vision, or linearly-integrated language-vision models. Our target DNN models span different architectures (e.g., convolutional networks and transformers) and multimodal training techniques (e.g., cross-attention and contrastive learning). As a key enabling step, we first demonstrate that trained vision and language models systematically outperform their randomly initialized counterparts in their ability to predict SEEG signals. We then compare unimodal and multimodal models against one another. Because our target DNN models often have different architectures, number of parameters, and training sets (possibly obscuring those differences attributable to integration), we carry out a controlled comparison of two models (SLIP and SimCLR), which keep all of these attributes the same aside from input modality. Using this approach, we identify a sizable number of neural sites (on average 141 out of 1090 total sites or 12.94%) and brain regions where multimodal integration seems to occur. Additionally, we find that among the variants of multimodal training techniques we assess, CLIP-style training is the best suited for downstream prediction of the neural activity in these sites.
△ Less
Submitted 20 June, 2024;
originally announced June 2024.
-
Population Transformer: Learning Population-level Representations of Intracranial Activity
Authors:
Geeling Chau,
Christopher Wang,
Sabera Talukder,
Vighnesh Subramaniam,
Saraswati Soedarmadji,
Yisong Yue,
Boris Katz,
Andrei Barbu
Abstract:
We present a self-supervised framework that learns population-level codes for intracranial neural recordings at scale, unlocking the benefits of representation learning for a key neuroscience recording modality. The Population Transformer (PopT) lowers the amount of data required for decoding experiments, while increasing accuracy, even on never-before-seen subjects and tasks. We address two key c…
▽ More
We present a self-supervised framework that learns population-level codes for intracranial neural recordings at scale, unlocking the benefits of representation learning for a key neuroscience recording modality. The Population Transformer (PopT) lowers the amount of data required for decoding experiments, while increasing accuracy, even on never-before-seen subjects and tasks. We address two key challenges in develo** PopT: sparse electrode distribution and varying electrode location across patients. PopT stacks on top of pretrained representations and enhances downstream tasks by enabling learned aggregation of multiple spatially-sparse data channels. Beyond decoding, we interpret the pretrained PopT and fine-tuned models to show how it can be used to provide neuroscience insights learned from massive amounts of data. We release a pretrained PopT to enable off-the-shelf improvements in multi-channel intracranial data decoding and interpretability, and code is available at https://github.com/czlwang/PopulationTransformer.
△ Less
Submitted 5 June, 2024;
originally announced June 2024.
-
SecureLLM: Using Compositionality to Build Provably Secure Language Models for Private, Sensitive, and Secret Data
Authors:
Abdulrahman Alabdulkareem,
Christian M Arnold,
Yerim Lee,
Pieter M Feenstra,
Boris Katz,
Andrei Barbu
Abstract:
Traditional security mechanisms isolate resources from users who should not access them. We reflect the compositional nature of such security mechanisms back into the structure of LLMs to build a provably secure LLM; that we term SecureLLM. Other approaches to LLM safety attempt to protect against bad actors or bad outcomes, but can only do so to an extent making them inappropriate for sensitive d…
▽ More
Traditional security mechanisms isolate resources from users who should not access them. We reflect the compositional nature of such security mechanisms back into the structure of LLMs to build a provably secure LLM; that we term SecureLLM. Other approaches to LLM safety attempt to protect against bad actors or bad outcomes, but can only do so to an extent making them inappropriate for sensitive data. SecureLLM blends access security with fine-tuning methods. Each data silo has associated with it a separate fine-tuning and a user has access only to the collection of fine-tunings that they have permission for. The model must then perform on compositional tasks at the intersection of those data silos with the combination of those individual fine-tunings. While applicable to any task like document QA or making API calls, in this work we concern ourselves with models that learn the layouts of new SQL databases to provide natural-language-to-SQL translation capabilities. Existing fine-tuning composition methods fail in this challenging environment, as they are not well-equipped for handling compositional tasks. Compositionality remains a challenge for LLMs. We contribute both a difficult new compositional natural-language-to-SQL translation task and a new perspective on LLM security that allows models to be deployed to secure environments today.
△ Less
Submitted 13 June, 2024; v1 submitted 16 May, 2024;
originally announced May 2024.
-
A Study of Shape Modeling Against Noise
Authors:
Cheng Long,
Adrian Barbu
Abstract:
Shape modeling is a challenging task with many potential applications in computer vision and medical imaging. There are many shape modeling methods in the literature, each with its advantages and applications. However, many shape modeling methods have difficulties handling shapes that have missing pieces or outliers. In this regard, this paper introduces shape denoising, a fundamental problem in s…
▽ More
Shape modeling is a challenging task with many potential applications in computer vision and medical imaging. There are many shape modeling methods in the literature, each with its advantages and applications. However, many shape modeling methods have difficulties handling shapes that have missing pieces or outliers. In this regard, this paper introduces shape denoising, a fundamental problem in shape modeling that lies at the core of many computer vision and medical imaging applications and has not received enough attention in the literature. The paper introduces six types of noise that can be used to perturb shapes as well as an objective measure for the noise level and for comparing methods on their shape denoising capabilities. Finally, the paper evaluates seven methods capable of accomplishing this task, of which six are based on deep learning, including some generative models.
△ Less
Submitted 23 February, 2024;
originally announced February 2024.
-
Slow Kill for Big Data Learning
Authors:
Yiyuan She,
Jianhui Shen,
Adrian Barbu
Abstract:
Big-data applications often involve a vast number of observations and features, creating new challenges for variable selection and parameter estimation. This paper presents a novel technique called ``slow kill,'' which utilizes nonconvex constrained optimization, adaptive $\ell_2$-shrinkage, and increasing learning rates. The fact that the problem size can decrease during the slow kill iterations…
▽ More
Big-data applications often involve a vast number of observations and features, creating new challenges for variable selection and parameter estimation. This paper presents a novel technique called ``slow kill,'' which utilizes nonconvex constrained optimization, adaptive $\ell_2$-shrinkage, and increasing learning rates. The fact that the problem size can decrease during the slow kill iterations makes it particularly effective for large-scale variable screening. The interaction between statistics and optimization provides valuable insights into controlling quantiles, stepsize, and shrinkage parameters in order to relax the regularity conditions required to achieve the desired level of statistical accuracy. Experimental results on real and synthetic data show that slow kill outperforms state-of-the-art algorithms in various situations while being computationally efficient for large-scale data.
△ Less
Submitted 2 May, 2023;
originally announced May 2023.
-
Training a Two Layer ReLU Network Analytically
Authors:
Adrian Barbu
Abstract:
Neural networks are usually trained with different variants of gradient descent based optimization algorithms such as stochastic gradient descent or the Adam optimizer. Recent theoretical work states that the critical points (where the gradient of the loss is zero) of two-layer ReLU networks with the square loss are not all local minima. However, in this work we will explore an algorithm for train…
▽ More
Neural networks are usually trained with different variants of gradient descent based optimization algorithms such as stochastic gradient descent or the Adam optimizer. Recent theoretical work states that the critical points (where the gradient of the loss is zero) of two-layer ReLU networks with the square loss are not all local minima. However, in this work we will explore an algorithm for training two-layer neural networks with ReLU-like activation and the square loss that alternatively finds the critical points of the loss function analytically for one layer while kee** the other layer and the neuron activation pattern fixed. Experiments indicate that this simple algorithm can find deeper optima than Stochastic Gradient Descent or the Adam optimizer, obtaining significantly smaller training loss values on four out of the five real datasets evaluated. Moreover, the method is faster than the gradient descent methods and has virtually no tuning parameters.
△ Less
Submitted 6 April, 2023;
originally announced April 2023.
-
Feature Selection with Annealing for Forecasting Financial Time Series
Authors:
Hakan Pabuccu,
Adrian Barbu
Abstract:
Stock market and cryptocurrency forecasting is very important to investors as they aspire to achieve even the slightest improvement to their buy or hold strategies so that they may increase profitability. However, obtaining accurate and reliable predictions is challenging, noting that accuracy does not equate to reliability, especially when financial time-series forecasting is applied owing to its…
▽ More
Stock market and cryptocurrency forecasting is very important to investors as they aspire to achieve even the slightest improvement to their buy or hold strategies so that they may increase profitability. However, obtaining accurate and reliable predictions is challenging, noting that accuracy does not equate to reliability, especially when financial time-series forecasting is applied owing to its complex and chaotic tendencies. To mitigate this complexity, this study provides a comprehensive method for forecasting financial time series based on tactical input output feature map** techniques using machine learning (ML) models. During the prediction process, selecting the relevant indicators is vital to obtaining the desired results. In the financial field, limited attention has been paid to this problem with ML solutions. We investigate the use of feature selection with annealing (FSA) for the first time in this field, and we apply the least absolute shrinkage and selection operator (Lasso) method to select the features from more than 1,000 candidates obtained from 26 technical classifiers with different periods and lags. Boruta (BOR) feature selection, a wrapper method, is used as a baseline for comparison. Logistic regression (LR), extreme gradient boosting (XGBoost), and long short-term memory (LSTM) are then applied to the selected features for forecasting purposes using 10 different financial datasets containing cryptocurrencies and stocks. The dependent variables consisted of daily logarithmic returns and trends. The mean-squared error for regression, area under the receiver operating characteristic curve, and classification accuracy were used to evaluate model performance, and the statistical significance of the forecasting results was tested using paired t-tests. Experiments indicate that the FSA algorithm increased the performance of ML models, regardless of problem type.
△ Less
Submitted 23 February, 2024; v1 submitted 3 March, 2023;
originally announced March 2023.
-
Scalable Clustering: Large Scale Unsupervised Learning of Gaussian Mixture Models with Outliers
Authors:
Yijia Zhou,
Kyle A. Gallivan,
Adrian Barbu
Abstract:
Clustering is a widely used technique with a long and rich history in a variety of areas. However, most existing algorithms do not scale well to large datasets, or are missing theoretical guarantees of convergence. This paper introduces a provably robust clustering algorithm based on loss minimization that performs well on Gaussian mixture models with outliers. It provides theoretical guarantees t…
▽ More
Clustering is a widely used technique with a long and rich history in a variety of areas. However, most existing algorithms do not scale well to large datasets, or are missing theoretical guarantees of convergence. This paper introduces a provably robust clustering algorithm based on loss minimization that performs well on Gaussian mixture models with outliers. It provides theoretical guarantees that the algorithm obtains high accuracy with high probability under certain assumptions. Moreover, it can also be used as an initialization strategy for $k$-means clustering. Experiments on real-world large-scale datasets demonstrate the effectiveness of the algorithm when clustering a large number of clusters, and a $k$-means algorithm initialized by the algorithm outperforms many of the classic clustering methods in both speed and accuracy, while scaling well to large datasets such as ImageNet.
△ Less
Submitted 28 February, 2023;
originally announced February 2023.
-
BrainBERT: Self-supervised representation learning for intracranial recordings
Authors:
Christopher Wang,
Vighnesh Subramaniam,
Adam Uri Yaari,
Gabriel Kreiman,
Boris Katz,
Ignacio Cases,
Andrei Barbu
Abstract:
We create a reusable Transformer, BrainBERT, for intracranial recordings bringing modern representation learning approaches to neuroscience. Much like in NLP and speech recognition, this Transformer enables classifying complex concepts, i.e., decoding neural data, with higher accuracy and with much less data by being pretrained in an unsupervised manner on a large corpus of unannotated neural reco…
▽ More
We create a reusable Transformer, BrainBERT, for intracranial recordings bringing modern representation learning approaches to neuroscience. Much like in NLP and speech recognition, this Transformer enables classifying complex concepts, i.e., decoding neural data, with higher accuracy and with much less data by being pretrained in an unsupervised manner on a large corpus of unannotated neural recordings. Our approach generalizes to new subjects with electrodes in new positions and to unrelated tasks showing that the representations robustly disentangle the neural signal. Just like in NLP where one can study language by investigating what a language model learns, this approach opens the door to investigating the brain by what a model of the brain learns. As a first step along this path, we demonstrate a new analysis of the intrinsic dimensionality of the computations in different areas of the brain. To construct these representations, we combine a technique for producing super-resolution spectrograms of neural data with an approach designed for generating contextual representations of audio by masking. In the future, far more concepts will be decodable from neural recordings by using representation learning, potentially unlocking the brain like language models unlocked language.
△ Less
Submitted 28 February, 2023;
originally announced February 2023.
-
Human or Machine? Turing Tests for Vision and Language
Authors:
Mengmi Zhang,
Giorgia Dellaferrera,
Ankur Sikarwar,
Marcelo Armendariz,
Noga Mudrik,
Prachi Agrawal,
Spandan Madan,
Andrei Barbu,
Haochen Yang,
Tanishq Kumar,
Meghna Sadwani,
Stella Dellaferrera,
Michele Pizzochero,
Hanspeter Pfister,
Gabriel Kreiman
Abstract:
As AI algorithms increasingly participate in daily activities that used to be the sole province of humans, we are inevitably called upon to consider how much machines are really like us. To address this question, we turn to the Turing test and systematically benchmark current AIs in their abilities to imitate humans. We establish a methodology to evaluate humans versus machines in Turing-like test…
▽ More
As AI algorithms increasingly participate in daily activities that used to be the sole province of humans, we are inevitably called upon to consider how much machines are really like us. To address this question, we turn to the Turing test and systematically benchmark current AIs in their abilities to imitate humans. We establish a methodology to evaluate humans versus machines in Turing-like tests and systematically evaluate a representative set of selected domains, parameters, and variables. The experiments involved testing 769 human agents, 24 state-of-the-art AI agents, 896 human judges, and 8 AI judges, in 21,570 Turing tests across 6 tasks encompassing vision and language modalities. Surprisingly, the results reveal that current AIs are not far from being able to impersonate human judges across different ages, genders, and educational levels in complex visual and language challenges. In contrast, simple AI judges outperform human judges in distinguishing human answers versus machine answers. The curated large-scale Turing test datasets introduced here and their evaluation metrics provide valuable insights to assess whether an agent is human or not. The proposed formulation to benchmark human imitation ability in current AIs paves a way for the research community to expand Turing tests to other research areas and conditions. All of source code and data are publicly available at https://tinyurl.com/8x8nha7p
△ Less
Submitted 23 November, 2022;
originally announced November 2022.
-
Query The Agent: Improving sample efficiency through epistemic uncertainty estimation
Authors:
Julian Alverio,
Boris Katz,
Andrei Barbu
Abstract:
Curricula for goal-conditioned reinforcement learning agents typically rely on poor estimates of the agent's epistemic uncertainty or fail to consider the agents' epistemic uncertainty altogether, resulting in poor sample efficiency. We propose a novel algorithm, Query The Agent (QTA), which significantly improves sample efficiency by estimating the agent's epistemic uncertainty throughout the sta…
▽ More
Curricula for goal-conditioned reinforcement learning agents typically rely on poor estimates of the agent's epistemic uncertainty or fail to consider the agents' epistemic uncertainty altogether, resulting in poor sample efficiency. We propose a novel algorithm, Query The Agent (QTA), which significantly improves sample efficiency by estimating the agent's epistemic uncertainty throughout the state space and setting goals in highly uncertain areas. Encouraging the agent to collect data in highly uncertain states allows the agent to improve its estimation of the value function rapidly. QTA utilizes a novel technique for estimating epistemic uncertainty, Predictive Uncertainty Networks (PUN), to allow QTA to assess the agent's uncertainty in all previously observed states. We demonstrate that QTA offers decisive sample efficiency improvements over preexisting methods.
△ Less
Submitted 5 October, 2022;
originally announced October 2022.
-
Develo** a Series of AI Challenges for the United States Department of the Air Force
Authors:
Vijay Gadepally,
Gregory Angelides,
Andrei Barbu,
Andrew Bowne,
Laura J. Brattain,
Tamara Broderick,
Armando Cabrera,
Glenn Carl,
Ronisha Carter,
Miriam Cha,
Emilie Cowen,
Jesse Cummings,
Bill Freeman,
James Glass,
Sam Goldberg,
Mark Hamilton,
Thomas Heldt,
Kuan Wei Huang,
Phillip Isola,
Boris Katz,
Jamie Koerner,
Yen-Chen Lin,
David Mayo,
Kyle McAlpin,
Taylor Perron
, et al. (17 additional authors not shown)
Abstract:
Through a series of federal initiatives and orders, the U.S. Government has been making a concerted effort to ensure American leadership in AI. These broad strategy documents have influenced organizations such as the United States Department of the Air Force (DAF). The DAF-MIT AI Accelerator is an initiative between the DAF and MIT to bridge the gap between AI researchers and DAF mission requireme…
▽ More
Through a series of federal initiatives and orders, the U.S. Government has been making a concerted effort to ensure American leadership in AI. These broad strategy documents have influenced organizations such as the United States Department of the Air Force (DAF). The DAF-MIT AI Accelerator is an initiative between the DAF and MIT to bridge the gap between AI researchers and DAF mission requirements. Several projects supported by the DAF-MIT AI Accelerator are develo** public challenge problems that address numerous Federal AI research priorities. These challenges target priorities by making large, AI-ready datasets publicly available, incentivizing open-source solutions, and creating a demand signal for dual use technologies that can stimulate further research. In this article, we describe these public challenges being developed and how their application contributes to scientific advances.
△ Less
Submitted 14 July, 2022;
originally announced July 2022.
-
On the use of Cortical Magnification and Saccades as Biological Proxies for Data Augmentation
Authors:
Binxu Wang,
David Mayo,
Arturo Deza,
Andrei Barbu,
Colin Conwell
Abstract:
Self-supervised learning is a powerful way to learn useful representations from natural data. It has also been suggested as one possible means of building visual representation in humans, but the specific objective and algorithm are unknown. Currently, most self-supervised methods encourage the system to learn an invariant representation of different transformations of the same image in contrast t…
▽ More
Self-supervised learning is a powerful way to learn useful representations from natural data. It has also been suggested as one possible means of building visual representation in humans, but the specific objective and algorithm are unknown. Currently, most self-supervised methods encourage the system to learn an invariant representation of different transformations of the same image in contrast to those of other images. However, such transformations are generally non-biologically plausible, and often consist of contrived perceptual schemes such as random crop** and color jittering. In this paper, we attempt to reverse-engineer these augmentations to be more biologically or perceptually plausible while still conferring the same benefits for encouraging robust representation. Critically, we find that random crop** can be substituted by cortical magnification, and saccade-like sampling of the image could also assist the representation learning. The feasibility of these transformations suggests a potential way that biological visual systems could implement self-supervision. Further, they break the widely accepted spatially-uniform processing assumption used in many computer vision algorithms, suggesting a role for spatially-adaptive computation in humans and machines alike. Our code and demo can be found here.
△ Less
Submitted 14 December, 2021;
originally announced December 2021.
-
Incorporating Rich Social Interactions Into MDPs
Authors:
Ravi Tejwani,
Yen-Ling Kuo,
Tianmin Shu,
Bennett Stankovits,
Dan Gutfreund,
Joshua B. Tenenbaum,
Boris Katz,
Andrei Barbu
Abstract:
Much of what we do as humans is engage socially with other agents, a skill that robots must also eventually possess. We demonstrate that a rich theory of social interactions originating from microsociology and economics can be formalized by extending a nested MDP where agents reason about arbitrary functions of each other's hidden rewards. This extended Social MDP allows us to encode the five basi…
▽ More
Much of what we do as humans is engage socially with other agents, a skill that robots must also eventually possess. We demonstrate that a rich theory of social interactions originating from microsociology and economics can be formalized by extending a nested MDP where agents reason about arbitrary functions of each other's hidden rewards. This extended Social MDP allows us to encode the five basic interactions that underlie microsociology: cooperation, conflict, coercion, competition, and exchange. The result is a robotic agent capable of executing social interactions zero-shot in new environments; like humans it can engage socially in novel ways even without a single example of that social interaction. Moreover, the judgments of these Social MDPs align closely with those of humans when considering which social interaction is taking place in an environment. This method both sheds light on the nature of social interactions, by providing concrete mathematical definitions, and brings rich social interactions into a mathematical framework that has proven to be natural for robotics, MDPs.
△ Less
Submitted 7 February, 2022; v1 submitted 19 October, 2021;
originally announced October 2021.
-
Trajectory Prediction with Linguistic Representations
Authors:
Yen-Ling Kuo,
Xin Huang,
Andrei Barbu,
Stephen G. McGill,
Boris Katz,
John J. Leonard,
Guy Rosman
Abstract:
Language allows humans to build mental models that interpret what is happening around them resulting in more accurate long-term predictions. We present a novel trajectory prediction model that uses linguistic intermediate representations to forecast trajectories, and is trained using trajectory samples with partially-annotated captions. The model learns the meaning of each of the words without dir…
▽ More
Language allows humans to build mental models that interpret what is happening around them resulting in more accurate long-term predictions. We present a novel trajectory prediction model that uses linguistic intermediate representations to forecast trajectories, and is trained using trajectory samples with partially-annotated captions. The model learns the meaning of each of the words without direct per-word supervision. At inference time, it generates a linguistic description of trajectories which captures maneuvers and interactions over an extended time interval. This generated description is used to refine predictions of the trajectories of multiple agents. We train and validate our model on the Argoverse dataset, and demonstrate improved accuracy results in trajectory prediction. In addition, our model is more interpretable: it presents part of its reasoning in plain language as captions, which can aid model development and can aid in building confidence in the model before deploying it.
△ Less
Submitted 9 March, 2022; v1 submitted 19 October, 2021;
originally announced October 2021.
-
Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset
Authors:
Ian Palmer,
Andrew Rouditchenko,
Andrei Barbu,
Boris Katz,
James Glass
Abstract:
Visually-grounded spoken language datasets can enable models to learn cross-modal correspondences with very weak supervision. However, modern audio-visual datasets contain biases that undermine the real-world performance of models trained on that data. We introduce Spoken ObjectNet, which is designed to remove some of these biases and provide a way to better evaluate how effectively models will pe…
▽ More
Visually-grounded spoken language datasets can enable models to learn cross-modal correspondences with very weak supervision. However, modern audio-visual datasets contain biases that undermine the real-world performance of models trained on that data. We introduce Spoken ObjectNet, which is designed to remove some of these biases and provide a way to better evaluate how effectively models will perform in real-world scenarios. This dataset expands upon ObjectNet, which is a bias-controlled image dataset that features similar image classes to those present in ImageNet. We detail our data collection pipeline, which features several methods to improve caption quality, including automated language model checks. Lastly, we show baseline results on image retrieval and audio retrieval tasks. These results show that models trained on other datasets and then evaluated on Spoken ObjectNet tend to perform poorly due to biases in other datasets that the models have learned. We also show evidence that the performance decrease is due to the dataset controls, and not the transfer setting.
△ Less
Submitted 14 October, 2021;
originally announced October 2021.
-
Online Feature Screening for Data Streams with Concept Drift
Authors:
Mingyuan Wang,
Adrian Barbu
Abstract:
Screening feature selection methods are often used as a preprocessing step for reducing the number of variables before training step. Traditional screening methods only focus on dealing with complete high dimensional datasets. Modern datasets not only have higher dimension and larger sample size, but also have properties such as streaming input, sparsity and concept drift. Therefore a considerable…
▽ More
Screening feature selection methods are often used as a preprocessing step for reducing the number of variables before training step. Traditional screening methods only focus on dealing with complete high dimensional datasets. Modern datasets not only have higher dimension and larger sample size, but also have properties such as streaming input, sparsity and concept drift. Therefore a considerable number of online feature selection methods were introduced to handle these kind of problems in recent years. Online screening methods are one of the categories of online feature selection methods. The methods that we proposed in this research are capable of handling all three situations mentioned above. Our research study focuses on classification datasets. Our experiments show proposed methods can generate the same feature importance as their offline version with faster speed and less storage consumption. Furthermore, the results show that online screening methods with integrated model adaptation have a higher true feature detection rate than without model adaptation on data streams with the concept drift property. Among the two large real datasets that potentially have the concept drift property, online screening methods with model adaptation show advantages in either saving computing time and space, reducing model complexity, or improving prediction accuracy.
△ Less
Submitted 6 April, 2021;
originally announced April 2021.
-
The Compact Support Neural Network
Authors:
Adrian Barbu,
Hongyu Mou
Abstract:
Neural networks are popular and useful in many fields, but they have the problem of giving high confidence responses for examples that are away from the training data. This makes the neural networks very confident in their prediction while making gross mistakes, thus limiting their reliability for safety-critical applications such as autonomous driving, space exploration, etc. This paper introduce…
▽ More
Neural networks are popular and useful in many fields, but they have the problem of giving high confidence responses for examples that are away from the training data. This makes the neural networks very confident in their prediction while making gross mistakes, thus limiting their reliability for safety-critical applications such as autonomous driving, space exploration, etc. This paper introduces a novel neuron generalization that has the standard dot-product-based neuron and the {\color{black} radial basis function (RBF)} neuron as two extreme cases of a shape parameter. Using a rectified linear unit (ReLU) as the activation function results in a novel neuron that has compact support, which means its output is zero outside a bounded domain. To address the difficulties in training the proposed neural network, it introduces a novel training method that takes a pretrained standard neural network that is fine-tuned while gradually increasing the shape parameter to the desired value. The theoretical findings of the paper are a bound on the gradient of the proposed neuron and a proof that a neural network with such neurons has the universal approximation property. This means that the network can approximate any continuous and integrable function with an arbitrary degree of accuracy. The experimental findings on standard benchmark datasets show that the proposed approach has smaller test errors than state-of-the-art competing methods and outperforms the competing methods in detecting out-of-distribution samples on two out of three datasets.
△ Less
Submitted 7 December, 2021; v1 submitted 1 April, 2021;
originally announced April 2021.
-
PHASE: PHysically-grounded Abstract Social Events for Machine Social Perception
Authors:
Aviv Netanyahu,
Tianmin Shu,
Boris Katz,
Andrei Barbu,
Joshua B. Tenenbaum
Abstract:
The ability to perceive and reason about social interactions in the context of physical environments is core to human social intelligence and human-machine cooperation. However, no prior dataset or benchmark has systematically evaluated physically grounded perception of complex social interactions that go beyond short actions, such as high-fiving, or simple group activities, such as gathering. In…
▽ More
The ability to perceive and reason about social interactions in the context of physical environments is core to human social intelligence and human-machine cooperation. However, no prior dataset or benchmark has systematically evaluated physically grounded perception of complex social interactions that go beyond short actions, such as high-fiving, or simple group activities, such as gathering. In this work, we create a dataset of physically-grounded abstract social events, PHASE, that resemble a wide range of real-life social interactions by including social concepts such as hel** another agent. PHASE consists of 2D animations of pairs of agents moving in a continuous space generated procedurally using a physics engine and a hierarchical planner. Agents have a limited field of view, and can interact with multiple objects, in an environment that has multiple landmarks and obstacles. Using PHASE, we design a social recognition task and a social prediction task. PHASE is validated with human experiments demonstrating that humans perceive rich interactions in the social events, and that the simulated agents behave similarly to humans. As a baseline model, we introduce a Bayesian inverse planning approach, SIMPLE (SIMulation, Planning and Local Estimation), which outperforms state-of-the-art feed-forward neural networks. We hope that PHASE can serve as a difficult new challenge for develo** new models that can recognize complex social interactions.
△ Less
Submitted 19 March, 2021; v1 submitted 2 March, 2021;
originally announced March 2021.
-
Learning a natural-language to LTL executable semantic parser for grounded robotics
Authors:
Christopher Wang,
Candace Ross,
Yen-Ling Kuo,
Boris Katz,
Andrei Barbu
Abstract:
Children acquire their native language with apparent ease by observing how language is used in context and attempting to use it themselves. They do so without laborious annotations, negative examples, or even direct corrections. We take a step toward robots that can do the same by training a grounded semantic parser, which discovers latent linguistic representations that can be used for the execut…
▽ More
Children acquire their native language with apparent ease by observing how language is used in context and attempting to use it themselves. They do so without laborious annotations, negative examples, or even direct corrections. We take a step toward robots that can do the same by training a grounded semantic parser, which discovers latent linguistic representations that can be used for the execution of natural-language commands. In particular, we focus on the difficult domain of commands with a temporal aspect, whose semantics we capture with Linear Temporal Logic, LTL. Our parser is trained with pairs of sentences and executions as well as an executor. At training time, the parser hypothesizes a meaning representation for the input as a formula in LTL. Three competing pressures allow the parser to discover meaning from language. First, any hypothesized meaning for a sentence must be permissive enough to reflect all the annotated execution trajectories. Second, the executor -- a pretrained end-to-end LTL planner -- must find that the observe trajectories are likely executions of the meaning. Finally, a generator, which reconstructs the original input, encourages the model to find representations that conserve knowledge about the command. Together these ensure that the meaning is neither too general nor too specific. Our model generalizes well, being able to parse and execute both machine-generated and human-generated commands, with near-equal accuracy, despite the fact that the human-generated sentences are much more varied and complex with an open lexicon. The approach presented here is not specific to LTL: it can be applied to any domain where sentence meanings can be hypothesized and an executor can verify these meanings, thus opening the door to many applications for robotic agents.
△ Less
Submitted 16 March, 2021; v1 submitted 7 August, 2020;
originally announced August 2020.
-
Compositional Networks Enable Systematic Generalization for Grounded Language Understanding
Authors:
Yen-Ling Kuo,
Boris Katz,
Andrei Barbu
Abstract:
Humans are remarkably flexible when understanding new sentences that include combinations of concepts they have never encountered before. Recent work has shown that while deep networks can mimic some human language abilities when presented with novel sentences, systematic variation uncovers the limitations in the language-understanding abilities of networks. We demonstrate that these limitations c…
▽ More
Humans are remarkably flexible when understanding new sentences that include combinations of concepts they have never encountered before. Recent work has shown that while deep networks can mimic some human language abilities when presented with novel sentences, systematic variation uncovers the limitations in the language-understanding abilities of networks. We demonstrate that these limitations can be overcome by addressing the generalization challenges in the gSCAN dataset, which explicitly measures how well an agent is able to interpret novel linguistic commands grounded in vision, e.g., novel pairings of adjectives and nouns. The key principle we employ is compositionality: that the compositional structure of networks should reflect the compositional structure of the problem domain they address, while allowing other parameters to be learned end-to-end. We build a general-purpose mechanism that enables agents to generalize their language understanding to compositional domains. Crucially, our network has the same state-of-the-art performance as prior work while generalizing its knowledge when prior work does not. Our network also provides a level of interpretability that enables users to inspect what each part of networks learns. Robust grounded language understanding without dramatic failures and without corner cases is critical to building safe and fair robots; we demonstrate the significant role that compositionality can play in achieving that goal.
△ Less
Submitted 19 October, 2021; v1 submitted 6 August, 2020;
originally announced August 2020.
-
Encoding formulas as deep networks: Reinforcement learning for zero-shot execution of LTL formulas
Authors:
Yen-Ling Kuo,
Boris Katz,
Andrei Barbu
Abstract:
We demonstrate a reinforcement learning agent which uses a compositional recurrent neural network that takes as input an LTL formula and determines satisfying actions. The input LTL formulas have never been seen before, yet the network performs zero-shot generalization to satisfy them. This is a novel form of multi-task learning for RL agents where agents learn from one diverse set of tasks and ge…
▽ More
We demonstrate a reinforcement learning agent which uses a compositional recurrent neural network that takes as input an LTL formula and determines satisfying actions. The input LTL formulas have never been seen before, yet the network performs zero-shot generalization to satisfy them. This is a novel form of multi-task learning for RL agents where agents learn from one diverse set of tasks and generalize to a new set of diverse tasks. The formulation of the network enables this capacity to generalize. We demonstrate this ability in two domains. In a symbolic domain, the agent finds a sequence of letters that is accepted. In a Minecraft-like environment, the agent finds a sequence of actions that conform to the formula. While prior work could learn to execute one formula reliably given examples of that formula, we demonstrate how to encode all formulas reliably. This could form the basis of new multitask agents that discover sub-tasks and execute them without any additional training, as well as the agents which follow more complex linguistic commands. The structures required for this generalization are specific to LTL formulas, which opens up an interesting theoretical question: what structures are required in neural networks for zero-shot generalization to different logics?
△ Less
Submitted 6 August, 2020; v1 submitted 1 June, 2020;
originally announced June 2020.
-
Measuring Social Biases in Grounded Vision and Language Embeddings
Authors:
Candace Ross,
Boris Katz,
Andrei Barbu
Abstract:
We generalize the notion of social biases from language embeddings to grounded vision and language embeddings. Biases are present in grounded embeddings, and indeed seem to be equally or more significant than for ungrounded embeddings. This is despite the fact that vision and language can suffer from different biases, which one might hope could attenuate the biases in both. Multiple ways exist to…
▽ More
We generalize the notion of social biases from language embeddings to grounded vision and language embeddings. Biases are present in grounded embeddings, and indeed seem to be equally or more significant than for ungrounded embeddings. This is despite the fact that vision and language can suffer from different biases, which one might hope could attenuate the biases in both. Multiple ways exist to generalize metrics measuring bias in word embeddings to this new setting. We introduce the space of generalizations (Grounded-WEAT and Grounded-SEAT) and demonstrate that three generalizations answer different yet important questions about how biases, language, and vision interact. These metrics are used on a new dataset, the first for grounded bias, created by augmenting extending standard linguistic bias benchmarks with 10,228 images from COCO, Conceptual Captions, and Google Images. Dataset construction is challenging because vision datasets are themselves very biased. The presence of these biases in systems will begin to have real-world consequences as they are deployed, making carefully measuring bias and then mitigating it critical to building a fair society.
△ Less
Submitted 21 August, 2023; v1 submitted 20 February, 2020;
originally announced February 2020.
-
Deep compositional robotic planners that follow natural language commands
Authors:
Yen-Ling Kuo,
Boris Katz,
Andrei Barbu
Abstract:
We demonstrate how a sampling-based robotic planner can be augmented to learn to understand a sequence of natural language commands in a continuous configuration space to move and manipulate objects. Our approach combines a deep network structured according to the parse of a complex command that includes objects, verbs, spatial relations, and attributes, with a sampling-based planner, RRT. A recur…
▽ More
We demonstrate how a sampling-based robotic planner can be augmented to learn to understand a sequence of natural language commands in a continuous configuration space to move and manipulate objects. Our approach combines a deep network structured according to the parse of a complex command that includes objects, verbs, spatial relations, and attributes, with a sampling-based planner, RRT. A recurrent hierarchical deep network controls how the planner explores the environment, determines when a planned path is likely to achieve a goal, and estimates the confidence of each move to trade off exploitation and exploration between the network and the planner. Planners are designed to have near-optimal behavior when information about the task is missing, while networks learn to exploit observations which are available from the environment, making the two naturally complementary. Combining the two enables generalization to new maps, new kinds of obstacles, and more complex sentences that do not occur in the training set. Little data is required to train the model despite it jointly acquiring a CNN that extracts features from the environment as it learns the meanings of words. The model provides a level of interpretability through the use of attention maps allowing users to see its reasoning steps despite being an end-to-end model. This end-to-end model allows robots to learn to follow natural language commands in challenging continuous environments.
△ Less
Submitted 19 February, 2020; v1 submitted 12 February, 2020;
originally announced February 2020.
-
A study of local optima for learning feature interactions using neural networks
Authors:
Yangzi Guo,
Adrian Barbu
Abstract:
In many fields such as bioinformatics, high energy physics, power distribution, etc., it is desirable to learn non-linear models where a small number of variables are selected and the interaction between them is explicitly modeled to predict the response. In principle, neural networks (NNs) could accomplish this task since they can model non-linear feature interactions very well. However, NNs requ…
▽ More
In many fields such as bioinformatics, high energy physics, power distribution, etc., it is desirable to learn non-linear models where a small number of variables are selected and the interaction between them is explicitly modeled to predict the response. In principle, neural networks (NNs) could accomplish this task since they can model non-linear feature interactions very well. However, NNs require large amounts of training data to have a good generalization. In this paper we study the datastarved regime where a NN is trained on a relatively small amount of training data. For that purpose we study feature selection for NNs, which is known to improve generalization for linear models. As an extreme case of data with feature selection and feature interactions we study the XOR-like data with irrelevant variables. We experimentally observed that the cross-entropy loss function on XOR-like data has many non-equivalent local optima, and the number of local optima grows exponentially with the number of irrelevant variables. To deal with the local minima and for feature selection we propose a node pruning and feature selection algorithm that improves the capability of NNs to find better local minima even when there are irrelevant variables. Finally, we show that the performance of a NN on real datasets can be improved using pruning, obtaining compact networks on a small number of features, with good prediction and interpretability.
△ Less
Submitted 11 February, 2020;
originally announced February 2020.
-
Neural Rule Ensembles: Encoding Sparse Feature Interactions into Neural Networks
Authors:
Gitesh Dawer,
Yangzi Guo,
Sida Liu,
Adrian Barbu
Abstract:
Artificial Neural Networks form the basis of very powerful learning methods. It has been observed that a naive application of fully connected neural networks to data with many irrelevant variables often leads to overfitting. In an attempt to circumvent this issue, a prior knowledge pertaining to what features are relevant and their possible feature interactions can be encoded into these networks.…
▽ More
Artificial Neural Networks form the basis of very powerful learning methods. It has been observed that a naive application of fully connected neural networks to data with many irrelevant variables often leads to overfitting. In an attempt to circumvent this issue, a prior knowledge pertaining to what features are relevant and their possible feature interactions can be encoded into these networks. In this work, we use decision trees to capture such relevant features and their interactions and define a map** to encode extracted relationships into a neural network. This addresses the initialization related concern of fully connected neural networks. At the same time through feature selection it enables learning of compact representations compared to state of the art tree-based approaches. Empirical evaluations and simulation studies show the superiority of such an approach over fully connected neural networks and tree-based approaches
△ Less
Submitted 11 February, 2020;
originally announced February 2020.
-
Network Pruning via Annealing and Direct Sparsity Control
Authors:
Yangzi Guo,
Yiyuan She,
Adrian Barbu
Abstract:
Artificial neural networks (ANNs) especially deep convolutional networks are very popular these days and have been proved to successfully offer quite reliable solutions to many vision problems. However, the use of deep neural networks is widely impeded by their intensive computational and memory cost. In this paper, we propose a novel efficient network pruning method that is suitable for both non-…
▽ More
Artificial neural networks (ANNs) especially deep convolutional networks are very popular these days and have been proved to successfully offer quite reliable solutions to many vision problems. However, the use of deep neural networks is widely impeded by their intensive computational and memory cost. In this paper, we propose a novel efficient network pruning method that is suitable for both non-structured and structured channel-level pruning. Our proposed method tightens a sparsity constraint by gradually removing network parameters or filter channels based on a criterion and a schedule. The attractive fact that the network size keeps drop** throughout the iterations makes it suitable for the pruning of any untrained or pre-trained network. Because our method uses a $L_0$ constraint instead of the $L_1$ penalty, it does not introduce any bias in the training parameters or filter channels. Furthermore, the $L_0$ constraint makes it easy to directly specify the desired sparsity level during the network pruning process. Finally, experimental validation on extensive synthetic and real vision datasets show that the proposed method obtains better or competitive performance compared to other states of art network pruning methods.
△ Less
Submitted 26 July, 2020; v1 submitted 11 February, 2020;
originally announced February 2020.
-
Playing Atari Ball Games with Hierarchical Reinforcement Learning
Authors:
Hua Huang,
Adrian Barbu
Abstract:
Human beings are particularly good at reasoning and inference from just a few examples. When facing new tasks, humans will leverage knowledge and skills learned before, and quickly integrate them with the new task. In addition to learning by experimentation, human also learn socio-culturally through instructions and learning by example. In this way humans can learn much faster compared with most c…
▽ More
Human beings are particularly good at reasoning and inference from just a few examples. When facing new tasks, humans will leverage knowledge and skills learned before, and quickly integrate them with the new task. In addition to learning by experimentation, human also learn socio-culturally through instructions and learning by example. In this way humans can learn much faster compared with most current artificial intelligence algorithms in many tasks. In this paper, we test the idea of speeding up machine learning through social learning. We argue that in solving real-world problems, especially when the task is designed by humans, and/or for humans, there are typically instructions from user manuals and/or human experts which give guidelines on how to better accomplish the tasks. We argue that these instructions have tremendous value in designing a reinforcement learning system which can learn in human fashion, and we test the idea by playing the Atari games Tennis and Pong. We experimentally demonstrate that the instructions provide key information about the task, which can be used to decompose the learning task into sub-systems and construct options for the temporally extended planning, and dramatically accelerate the learning process.
△ Less
Submitted 26 September, 2019;
originally announced September 2019.
-
The Generalization-Stability Tradeoff In Neural Network Pruning
Authors:
Brian R. Bartoldson,
Ari S. Morcos,
Adrian Barbu,
Gordon Erlebacher
Abstract:
Pruning neural network parameters is often viewed as a means to compress models, but pruning has also been motivated by the desire to prevent overfitting. This motivation is particularly relevant given the perhaps surprising observation that a wide variety of pruning approaches increase test accuracy despite sometimes massive reductions in parameter counts. To better understand this phenomenon, we…
▽ More
Pruning neural network parameters is often viewed as a means to compress models, but pruning has also been motivated by the desire to prevent overfitting. This motivation is particularly relevant given the perhaps surprising observation that a wide variety of pruning approaches increase test accuracy despite sometimes massive reductions in parameter counts. To better understand this phenomenon, we analyze the behavior of pruning over the course of training, finding that pruning's benefit to generalization increases with pruning's instability (defined as the drop in test accuracy immediately following pruning). We demonstrate that this "generalization-stability tradeoff" is present across a wide variety of pruning settings and propose a mechanism for its cause: pruning regularizes similarly to noise injection. Supporting this, we find less pruning stability leads to more model flatness and the benefits of pruning do not depend on permanent parameter removal. These results explain the compatibility of pruning-based generalization improvements and the high generalization recently observed in overparameterized networks.
△ Less
Submitted 22 October, 2020; v1 submitted 9 June, 2019;
originally announced June 2019.
-
Temporal Grounding Graphs for Language Understanding with Accrued Visual-Linguistic Context
Authors:
Rohan Paul,
Andrei Barbu,
Sue Felshin,
Boris Katz,
Nicholas Roy
Abstract:
A robot's ability to understand or ground natural language instructions is fundamentally tied to its knowledge about the surrounding world. We present an approach to grounding natural language utterances in the context of factual information gathered through natural-language interactions and past visual observations. A probabilistic model estimates, from a natural language utterance, the objects,r…
▽ More
A robot's ability to understand or ground natural language instructions is fundamentally tied to its knowledge about the surrounding world. We present an approach to grounding natural language utterances in the context of factual information gathered through natural-language interactions and past visual observations. A probabilistic model estimates, from a natural language utterance, the objects,relations, and actions that the utterance refers to, the objectives for future robotic actions it implies, and generates a plan to execute those actions while updating a state representation to include newly acquired knowledge from the visual-linguistic context. Grounding a command necessitates a representation for past observations and interactions; however, maintaining the full context consisting of all possible observed objects, attributes, spatial relations, actions, etc., over time is intractable. Instead, our model, Temporal Grounding Graphs, maintains a learned state representation for a belief over factual groundings, those derived from natural-language interactions, and lazily infers new groundings from visual observations using the context implied by the utterance. This work significantly expands the range of language that a robot can understand by incorporating factual knowledge and observations of its workspace in its inference about the meaning and grounding of natural-language utterances.
△ Less
Submitted 16 November, 2018;
originally announced November 2018.
-
Deep sequential models for sampling-based planning
Authors:
Yen-Ling Kuo,
Andrei Barbu,
Boris Katz
Abstract:
We demonstrate how a sequence model and a sampling-based planner can influence each other to produce efficient plans and how such a model can automatically learn to take advantage of observations of the environment. Sampling-based planners such as RRT generally know nothing of their environments even if they have traversed similar spaces many times. A sequence model, such as an HMM or LSTM, guides…
▽ More
We demonstrate how a sequence model and a sampling-based planner can influence each other to produce efficient plans and how such a model can automatically learn to take advantage of observations of the environment. Sampling-based planners such as RRT generally know nothing of their environments even if they have traversed similar spaces many times. A sequence model, such as an HMM or LSTM, guides the search for good paths. The resulting model, called DeRRT*, observes the state of the planner and the local environment to bias the next move and next planner state. The neural-network-based models avoid manual feature engineering by co-training a convolutional network which processes map features and observations from sensors. We incorporate this sequence model in a manner that combines its likelihood with the existing bias for searching large unexplored Voronoi regions. This leads to more efficient trajectories with fewer rejected samples even in difficult domains such as when esca** bug traps. This model can also be used for dimensionality reduction in multi-agent environments with dynamic obstacles. Instead of planning in a high-dimensional space that includes the configurations of the other agents, we plan in a low-dimensional subspace relying on the sequence model to bias samples using the observed behavior of the other agents. The techniques presented here are general, include both graphical models and deep learning approaches, and can be adapted to a range of planners.
△ Less
Submitted 1 October, 2018;
originally announced October 2018.
-
Are screening methods useful in feature selection? An empirical study
Authors:
Mingyuan Wang,
Adrian Barbu
Abstract:
Filter or screening methods are often used as a preprocessing step for reducing the number of variables used by a learning algorithm in obtaining a classification or regression model. While there are many such filter methods, there is a need for an objective evaluation of these methods. Such an evaluation is needed to compare them with each other and also to answer whether they are at all useful,…
▽ More
Filter or screening methods are often used as a preprocessing step for reducing the number of variables used by a learning algorithm in obtaining a classification or regression model. While there are many such filter methods, there is a need for an objective evaluation of these methods. Such an evaluation is needed to compare them with each other and also to answer whether they are at all useful, or a learning algorithm could do a better job without them. For this purpose, many popular screening methods are partnered in this paper with three regression learners and five classification learners and evaluated on ten real datasets to obtain accuracy criteria such as R-square and area under the ROC curve (AUC). The obtained results are compared through curve plots and comparison tables in order to find out whether screening methods help improve the performance of learning algorithms and how they fare with each other. Our findings revealed that the screening methods were useful in improving the prediction of the best learner on two regression and two classification datasets out of the ten datasets evaluated.
△ Less
Submitted 8 July, 2019; v1 submitted 14 September, 2018;
originally announced September 2018.
-
Enhancing the Regularization Effect of Weight Pruning in Artificial Neural Networks
Authors:
Brian Bartoldson,
Adrian Barbu,
Gordon Erlebacher
Abstract:
Artificial neural networks (ANNs) may not be worth their computational/memory costs when used in mobile phones or embedded devices. Parameter-pruning algorithms combat these costs, with some algorithms capable of removing over 90% of an ANN's weights without harming the ANN's performance. Removing weights from an ANN is a form of regularization, but existing pruning algorithms do not significantly…
▽ More
Artificial neural networks (ANNs) may not be worth their computational/memory costs when used in mobile phones or embedded devices. Parameter-pruning algorithms combat these costs, with some algorithms capable of removing over 90% of an ANN's weights without harming the ANN's performance. Removing weights from an ANN is a form of regularization, but existing pruning algorithms do not significantly improve generalization error. We show that pruning ANNs can improve generalization if pruning targets large weights instead of small weights. Applying our pruning algorithm to an ANN leads to a higher image classification accuracy on CIFAR-10 data than applying the popular regularizer dropout. The pruning couples this higher accuracy with an 85% reduction of the ANN's parameter count.
△ Less
Submitted 4 May, 2018;
originally announced May 2018.
-
Unsupervised Learning of GMM with a Uniform Background Component
Authors:
Sida Liu,
Adrian Barbu
Abstract:
Gaussian Mixture Models are one of the most studied and mature models in unsupervised learning. However, outliers are often present in the data and could influence the cluster estimation. In this paper, we study a new model that assumes that data comes from a mixture of a number of Gaussians as well as a uniform ``background'' component assumed to contain outliers and other non-interesting observa…
▽ More
Gaussian Mixture Models are one of the most studied and mature models in unsupervised learning. However, outliers are often present in the data and could influence the cluster estimation. In this paper, we study a new model that assumes that data comes from a mixture of a number of Gaussians as well as a uniform ``background'' component assumed to contain outliers and other non-interesting observations. We develop a novel method based on robust loss minimization that performs well in clustering such GMM with a uniform background. We give theoretical guarantees for our clustering algorithm to obtain best clustering results with high probability. Besides, we show that the result of our algorithm does not depend on initialization or local optima, and the parameter tuning is an easy task. By numeric simulations, we demonstrate that our algorithm enjoys high accuracy and achieves the best clustering results given a large enough sample size. Finally, experimental comparisons with typical clustering methods on real datasets witness the potential of our algorithm in real applications.
△ Less
Submitted 20 March, 2020; v1 submitted 8 April, 2018;
originally announced April 2018.
-
A Novel Framework for Online Supervised Learning with Feature Selection
Authors:
Lizhe Sun,
Mingyuan Wang,
Siquan Zhu,
Adrian Barbu
Abstract:
Current online learning methods suffer issues such as lower convergence rates and limited capability to select important features compared to their offline counterparts. In this paper, a novel framework for online learning based on running averages is proposed. Many popular offline regularized methods such as Lasso, Elastic Net, Minimax Concave Penalty (MCP), and Feature Selection with Annealing (…
▽ More
Current online learning methods suffer issues such as lower convergence rates and limited capability to select important features compared to their offline counterparts. In this paper, a novel framework for online learning based on running averages is proposed. Many popular offline regularized methods such as Lasso, Elastic Net, Minimax Concave Penalty (MCP), and Feature Selection with Annealing (FSA) have their online versions introduced in this framework. The equivalence between the proposed online methods and their offline counterparts is proved, and then novel theoretical true support recovery and convergence guarantees are provided for some of the methods in this framework. Numerical experiments indicate that the proposed methods enjoy high true support recovery accuracy and a faster convergence rate compared with conventional online and offline algorithms. Finally, applications to large datasets are presented, where again the proposed framework shows competitive results compared to popular online and offline algorithms.
△ Less
Submitted 19 May, 2024; v1 submitted 30 March, 2018;
originally announced March 2018.
-
Random Hinge Forest for Differentiable Learning
Authors:
Nathan Lay,
Adam P. Harrison,
Sharon Schreiber,
Gitesh Dawer,
Adrian Barbu
Abstract:
We propose random hinge forests, a simple, efficient, and novel variant of decision forests. Importantly, random hinge forests can be readily incorporated as a general component within arbitrary computation graphs that are optimized end-to-end with stochastic gradient descent or variants thereof. We derive random hinge forest and ferns, focusing on their sparse and efficient nature, their min-max…
▽ More
We propose random hinge forests, a simple, efficient, and novel variant of decision forests. Importantly, random hinge forests can be readily incorporated as a general component within arbitrary computation graphs that are optimized end-to-end with stochastic gradient descent or variants thereof. We derive random hinge forest and ferns, focusing on their sparse and efficient nature, their min-max margin property, strategies to initialize them for arbitrary network architectures, and the class of optimizers most suitable for optimizing random hinge forest. The performance and versatility of random hinge forests are demonstrated by experiments incorporating a variety of of small and large UCI machine learning data sets and also ones involving the MNIST, Letter, and USPS image datasets. We compare random hinge forests with random forests and the more recent backpropagating deep neural decision forests.
△ Less
Submitted 1 March, 2018; v1 submitted 11 February, 2018;
originally announced February 2018.
-
Generating Compact Tree Ensembles via Annealing
Authors:
Gitesh Dawer,
Yangzi Guo,
Adrian Barbu
Abstract:
Tree ensembles are flexible predictive models that can capture relevant variables and to some extent their interactions in a compact and interpretable manner. Most algorithms for obtaining tree ensembles are based on versions of boosting or Random Forest. Previous work showed that boosting algorithms exhibit a cyclic behavior of selecting the same tree again and again due to the way the loss is op…
▽ More
Tree ensembles are flexible predictive models that can capture relevant variables and to some extent their interactions in a compact and interpretable manner. Most algorithms for obtaining tree ensembles are based on versions of boosting or Random Forest. Previous work showed that boosting algorithms exhibit a cyclic behavior of selecting the same tree again and again due to the way the loss is optimized. At the same time, Random Forest is not based on loss optimization and obtains a more complex and less interpretable model. In this paper we present a novel method for obtaining compact tree ensembles by growing a large pool of trees in parallel with many independent boosting threads and then selecting a small subset and updating their leaf weights by loss optimization. We allow for the trees in the initial pool to have different depths which further helps with generalization. Experiments on real datasets show that the obtained model has usually a smaller loss than boosting, which is also reflected in a lower misclassification error on the test set.
△ Less
Submitted 19 February, 2020; v1 submitted 16 September, 2017;
originally announced September 2017.
-
Parameterized Principal Component Analysis
Authors:
Ajay Gupta,
Adrian Barbu
Abstract:
When modeling multivariate data, one might have an extra parameter of contextual information that could be used to treat some observations as more similar to others. For example, images of faces can vary by age, and one would expect the face of a 40 year old to be more similar to the face of a 30 year old than to a baby face. We introduce a novel manifold approximation method, parameterized princi…
▽ More
When modeling multivariate data, one might have an extra parameter of contextual information that could be used to treat some observations as more similar to others. For example, images of faces can vary by age, and one would expect the face of a 40 year old to be more similar to the face of a 30 year old than to a baby face. We introduce a novel manifold approximation method, parameterized principal component analysis (PPCA) that models data with linear subspaces that change continuously according to the extra parameter of contextual information (e.g. age), instead of ad-hoc atlases. Special care has been taken in the loss function and the optimization method to encourage smoothly changing subspaces across the parameter values. The approach ensures that each observation's projection will share information with observations that have similar parameter values, but not with observations that have large parameter differences. We tested PPCA on artificial data based on known, smooth functions of an added parameter, as well as on three real datasets with different types of parameters. We compared PPCA to PCA, sparse PCA and to independent principal component analysis (IPCA), which groups observations by their parameter values and projects each group using PCA with no sharing of information for different groups. PPCA recovers the known functions with less error and projects the datasets' test set observations with consistently less reconstruction error than IPCA does. In some cases where the manifold is truly nonlinear, PCA outperforms all the other manifold approximation methods compared.
△ Less
Submitted 2 May, 2017; v1 submitted 16 August, 2016;
originally announced August 2016.
-
Anchoring and Agreement in Syntactic Annotations
Authors:
Yevgeni Berzak,
Yan Huang,
Andrei Barbu,
Anna Korhonen,
Boris Katz
Abstract:
We present a study on two key characteristics of human syntactic annotations: anchoring and agreement. Anchoring is a well known cognitive bias in human decision making, where judgments are drawn towards pre-existing values. We study the influence of anchoring on a standard approach to creation of syntactic resources where syntactic annotations are obtained via human editing of tagger and parser o…
▽ More
We present a study on two key characteristics of human syntactic annotations: anchoring and agreement. Anchoring is a well known cognitive bias in human decision making, where judgments are drawn towards pre-existing values. We study the influence of anchoring on a standard approach to creation of syntactic resources where syntactic annotations are obtained via human editing of tagger and parser output. Our experiments demonstrate a clear anchoring effect and reveal unwanted consequences, including overestimation of parsing performance and lower quality of annotations in comparison with human-based annotations. Using sentences from the Penn Treebank WSJ, we also report systematically obtained inter-annotator agreement estimates for English dependency parsing. Our agreement results control for parser bias, and are consequential in that they are on par with state of the art parsing performance for English newswire. We discuss the impact of our findings on strategies for future annotation efforts and parser evaluations.
△ Less
Submitted 21 September, 2016; v1 submitted 14 May, 2016;
originally announced May 2016.
-
Do You See What I Mean? Visual Resolution of Linguistic Ambiguities
Authors:
Yevgeni Berzak,
Andrei Barbu,
Daniel Harari,
Boris Katz,
Shimon Ullman
Abstract:
Understanding language goes hand in hand with the ability to integrate complex contextual information obtained via perception. In this work, we present a novel task for grounded language understanding: disambiguating a sentence given a visual scene which depicts one of the possible interpretations of that sentence. To this end, we introduce a new multimodal corpus containing ambiguous sentences, r…
▽ More
Understanding language goes hand in hand with the ability to integrate complex contextual information obtained via perception. In this work, we present a novel task for grounded language understanding: disambiguating a sentence given a visual scene which depicts one of the possible interpretations of that sentence. To this end, we introduce a new multimodal corpus containing ambiguous sentences, representing a wide range of syntactic, semantic and discourse ambiguities, coupled with videos that visualize the different interpretations for each sentence. We address this task by extending a vision model which determines if a sentence is depicted by a video. We demonstrate how such a model can be adjusted to recognize different interpretations of the same underlying sentence, allowing to disambiguate sentences in a unified fashion across the different ambiguity types.
△ Less
Submitted 26 March, 2016;
originally announced March 2016.
-
RENOIR - A Dataset for Real Low-Light Image Noise Reduction
Authors:
Josue Anaya,
Adrian Barbu
Abstract:
Image denoising algorithms are evaluated using images corrupted by artificial noise, which may lead to incorrect conclusions about their performances on real noise. In this paper we introduce a dataset of color images corrupted by natural noise due to low-light conditions, together with spatially and intensity-aligned low noise images of the same scenes. We also introduce a method for estimating t…
▽ More
Image denoising algorithms are evaluated using images corrupted by artificial noise, which may lead to incorrect conclusions about their performances on real noise. In this paper we introduce a dataset of color images corrupted by natural noise due to low-light conditions, together with spatially and intensity-aligned low noise images of the same scenes. We also introduce a method for estimating the true noise level in our images, since even the low noise images contain small amounts of noise. We evaluate the accuracy of our noise estimation method on real and artificial noise, and investigate the Poisson-Gaussian noise model. Finally, we use our dataset to evaluate six denoising algorithms: Active Random Field, BM3D, Bilevel-MRF, Multi-Layer Perceptron, and two versions of NL-means. We show that while the Multi-Layer Perceptron, Bilevel-MRF, and NL-means with soft threshold outperform BM3D on gray images with synthetic noise, they lag behind on our dataset.
△ Less
Submitted 8 May, 2017; v1 submitted 29 September, 2014;
originally announced September 2014.
-
Video In Sentences Out
Authors:
Andrei Barbu,
Alexander Bridge,
Zachary Burchill,
Dan Coroian,
Sven Dickinson,
Sanja Fidler,
Aaron Michaux,
Sam Mussman,
Siddharth Narayanaswamy,
Dhaval Salvi,
Lara Schmidt,
Jiangnan Shangguan,
Jeffrey Mark Siskind,
Jarrell Waggoner,
Song Wang,
**lian Wei,
Yifan Yin,
Zhiqi Zhang
Abstract:
We present a system that produces sentential descriptions of video: who did what to whom, and where and how they did it. Action class is rendered as a verb, participant objects as noun phrases, properties of those objects as adjectival modifiers in those noun phrases, spatial relations between those participants as prepositional phrases, and characteristics of the event as prepositional-phrase adj…
▽ More
We present a system that produces sentential descriptions of video: who did what to whom, and where and how they did it. Action class is rendered as a verb, participant objects as noun phrases, properties of those objects as adjectival modifiers in those noun phrases, spatial relations between those participants as prepositional phrases, and characteristics of the event as prepositional-phrase adjuncts and adverbial modifiers. Extracting the information needed to render these linguistic entities requires an approach to event recognition that recovers object tracks, the trackto-role assignments, and changing body posture.
△ Less
Submitted 9 August, 2014;
originally announced August 2014.
-
Face Detection with a 3D Model
Authors:
Adrian Barbu,
Nathan Lay,
Gary Gramajo
Abstract:
This paper presents a part-based face detection approach where the spatial relationship between the face parts is represented by a hidden 3D model with six parameters. The computational complexity of the search in the six dimensional pose space is addressed by proposing meaningful 3D pose candidates by image-based regression from detected face keypoint locations. The 3D pose candidates are evaluat…
▽ More
This paper presents a part-based face detection approach where the spatial relationship between the face parts is represented by a hidden 3D model with six parameters. The computational complexity of the search in the six dimensional pose space is addressed by proposing meaningful 3D pose candidates by image-based regression from detected face keypoint locations. The 3D pose candidates are evaluated using a parameter sensitive classifier based on difference features relative to the 3D pose. A compatible subset of candidates is then obtained by non-maximal suppression. Experiments on two standard face detection datasets show that the proposed 3D model based approach obtains results comparable to or better than state of the art.
△ Less
Submitted 3 November, 2015; v1 submitted 14 April, 2014;
originally announced April 2014.
-
Feature Selection with Annealing for Computer Vision and Big Data Learning
Authors:
Adrian Barbu,
Yiyuan She,
Liang**g Ding,
Gary Gramajo
Abstract:
Many computer vision and medical imaging problems are faced with learning from large-scale datasets, with millions of observations and features. In this paper we propose a novel efficient learning scheme that tightens a sparsity constraint by gradually removing variables based on a criterion and a schedule. The attractive fact that the problem size keeps drop** throughout the iterations makes it…
▽ More
Many computer vision and medical imaging problems are faced with learning from large-scale datasets, with millions of observations and features. In this paper we propose a novel efficient learning scheme that tightens a sparsity constraint by gradually removing variables based on a criterion and a schedule. The attractive fact that the problem size keeps drop** throughout the iterations makes it particularly suitable for big data learning. Our approach applies generically to the optimization of any differentiable loss function, and finds applications in regression, classification and ranking. The resultant algorithms build variable screening into estimation and are extremely simple to implement. We provide theoretical guarantees of convergence and selection consistency. In addition, one dimensional piecewise linear response functions are used to account for nonlinearity and a second order prior is imposed on these functions to avoid overfitting. Experiments on real and synthetic data show that the proposed method compares very well with other state of the art methods in regression, classification and ranking while being computationally very efficient and scalable.
△ Less
Submitted 17 March, 2016; v1 submitted 10 October, 2013;
originally announced October 2013.
-
Saying What You're Looking For: Linguistics Meets Video Search
Authors:
Andrei Barbu,
N. Siddharth,
Jeffrey Mark Siskind
Abstract:
We present an approach to searching large video corpora for video clips which depict a natural-language query in the form of a sentence. This approach uses compositional semantics to encode subtle meaning that is lost in other systems, such as the difference between two sentences which have identical words but entirely different meaning: "The person rode the horse} vs. \emph{The horse rode the per…
▽ More
We present an approach to searching large video corpora for video clips which depict a natural-language query in the form of a sentence. This approach uses compositional semantics to encode subtle meaning that is lost in other systems, such as the difference between two sentences which have identical words but entirely different meaning: "The person rode the horse} vs. \emph{The horse rode the person". Given a video-sentence pair and a natural-language parser, along with a grammar that describes the space of sentential queries, we produce a score which indicates how well the video depicts the sentence. We produce such a score for each video clip in a corpus and return a ranked list of clips. Furthermore, this approach addresses two fundamental problems simultaneously: detecting and tracking objects, and recognizing whether those tracks depict the query. Because both tracking and object detection are unreliable, this uses knowledge about the intended sentential query to focus the tracker on the relevant participants and ensures that the resulting tracks are described by the sentential query. While earlier work was limited to single-word queries which correspond to either verbs or nouns, we show how one can search for complex queries which contain multiple phrases, such as prepositional phrases, and modifiers, such as adverbs. We demonstrate this approach by searching for 141 queries involving people and horses interacting with each other in 10 full-length Hollywood movies.
△ Less
Submitted 20 September, 2013;
originally announced September 2013.
-
Seeing What You're Told: Sentence-Guided Activity Recognition In Video
Authors:
N. Siddharth,
Andrei Barbu,
Jeffrey Mark Siskind
Abstract:
We present a system that demonstrates how the compositional structure of events, in concert with the compositional structure of language, can interplay with the underlying focusing mechanisms in video action recognition, thereby providing a medium, not only for top-down and bottom-up integration, but also for multi-modal integration between vision and language. We show how the roles played by part…
▽ More
We present a system that demonstrates how the compositional structure of events, in concert with the compositional structure of language, can interplay with the underlying focusing mechanisms in video action recognition, thereby providing a medium, not only for top-down and bottom-up integration, but also for multi-modal integration between vision and language. We show how the roles played by participants (nouns), their characteristics (adjectives), the actions performed (verbs), the manner of such actions (adverbs), and changing spatial relations between participants (prepositions) in the form of whole sentential descriptions mediated by a grammar, guides the activity-recognition process. Further, the utility and expressiveness of our framework is demonstrated by performing three separate tasks in the domain of multi-activity videos: sentence-guided focus of attention, generation of sentential descriptions of video, and query-based video search, simply by leveraging the framework in different manners.
△ Less
Submitted 28 May, 2014; v1 submitted 19 August, 2013;
originally announced August 2013.
-
Large-Scale Automatic Labeling of Video Events with Verbs Based on Event-Participant Interaction
Authors:
Andrei Barbu,
Alexander Bridge,
Dan Coroian,
Sven Dickinson,
Sam Mussman,
Siddharth Narayanaswamy,
Dhaval Salvi,
Lara Schmidt,
Jiangnan Shangguan,
Jeffrey Mark Siskind,
Jarrell Waggoner,
Song Wang,
**lian Wei,
Yifan Yin,
Zhiqi Zhang
Abstract:
We present an approach to labeling short video clips with English verbs as event descriptions. A key distinguishing aspect of this work is that it labels videos with verbs that describe the spatiotemporal interaction between event participants, humans and objects interacting with each other, abstracting away all object-class information and fine-grained image characteristics, and relying solely on…
▽ More
We present an approach to labeling short video clips with English verbs as event descriptions. A key distinguishing aspect of this work is that it labels videos with verbs that describe the spatiotemporal interaction between event participants, humans and objects interacting with each other, abstracting away all object-class information and fine-grained image characteristics, and relying solely on the coarse-grained motion of the event participants. We apply our approach to a large set of 22 distinct verb classes and a corpus of 2,584 videos, yielding two surprising outcomes. First, a classification accuracy of greater than 70% on a 1-out-of-22 labeling task and greater than 85% on a variety of 1-out-of-10 subsets of this labeling task is independent of the choice of which of two different time-series classifiers we employ. Second, we achieve this level of accuracy using a highly impoverished intermediate representation consisting solely of the bounding boxes of one or two event participants as a function of time. This indicates that successful event recognition depends more on the choice of appropriate features that characterize the linguistic invariants of the event classes than on the particular classifier algorithms.
△ Less
Submitted 16 April, 2012;
originally announced April 2012.
-
Seeing Unseeability to See the Unseeable
Authors:
Siddharth Narayanaswamy,
Andrei Barbu,
Jeffrey Mark Siskind
Abstract:
We present a framework that allows an observer to determine occluded portions of a structure by finding the maximum-likelihood estimate of those occluded portions consistent with visible image evidence and a consistency model. Doing this requires determining which portions of the structure are occluded in the first place. Since each process relies on the other, we determine a solution to both prob…
▽ More
We present a framework that allows an observer to determine occluded portions of a structure by finding the maximum-likelihood estimate of those occluded portions consistent with visible image evidence and a consistency model. Doing this requires determining which portions of the structure are occluded in the first place. Since each process relies on the other, we determine a solution to both problems in tandem. We extend our framework to determine confidence of one's assessment of which portions of an observed structure are occluded, and the estimate of that occluded structure, by determining the sensitivity of one's assessment to potential new observations. We further extend our framework to determine a robotic action whose execution would allow a new observation that would maximally increase one's confidence.
△ Less
Submitted 12 April, 2012;
originally announced April 2012.
-
Video In Sentences Out
Authors:
Andrei Barbu,
Alexander Bridge,
Zachary Burchill,
Dan Coroian,
Sven Dickinson,
Sanja Fidler,
Aaron Michaux,
Sam Mussman,
Siddharth Narayanaswamy,
Dhaval Salvi,
Lara Schmidt,
Jiangnan Shangguan,
Jeffrey Mark Siskind,
Jarrell Waggoner,
Song Wang,
**lian Wei,
Yifan Yin,
Zhiqi Zhang
Abstract:
We present a system that produces sentential descriptions of video: who did what to whom, and where and how they did it. Action class is rendered as a verb, participant objects as noun phrases, properties of those objects as adjectival modifiers in those noun phrases,spatial relations between those participants as prepositional phrases, and characteristics of the event as prepositional-phrase adju…
▽ More
We present a system that produces sentential descriptions of video: who did what to whom, and where and how they did it. Action class is rendered as a verb, participant objects as noun phrases, properties of those objects as adjectival modifiers in those noun phrases,spatial relations between those participants as prepositional phrases, and characteristics of the event as prepositional-phrase adjuncts and adverbial modifiers. Extracting the information needed to render these linguistic entities requires an approach to event recognition that recovers object tracks, the track-to-role assignments, and changing body posture.
△ Less
Submitted 12 April, 2012;
originally announced April 2012.
-
Simultaneous Object Detection, Tracking, and Event Recognition
Authors:
Andrei Barbu,
Aaron Michaux,
Siddharth Narayanaswamy,
Jeffrey Mark Siskind
Abstract:
The common internal structure and algorithmic organization of object detection, detection-based tracking, and event recognition facilitates a general approach to integrating these three components. This supports multidirectional information flow between these components allowing object detection to influence tracking and event recognition and event recognition to influence tracking and object dete…
▽ More
The common internal structure and algorithmic organization of object detection, detection-based tracking, and event recognition facilitates a general approach to integrating these three components. This supports multidirectional information flow between these components allowing object detection to influence tracking and event recognition and event recognition to influence tracking and object detection. The performance of the combination can exceed the performance of the components in isolation. This can be done with linear asymptotic complexity.
△ Less
Submitted 12 April, 2012;
originally announced April 2012.