-
Min-Max Bilevel Multi-objective Optimization with Applications in Machine Learning
Authors:
Alex Gu,
Songtao Lu,
Parikshit Ram,
Lily Weng
Abstract:
We consider a generic min-max multi-objective bilevel optimization problem with applications in robust machine learning such as representation learning and hyperparameter optimization. We design MORBiT, a novel single-loop gradient descent-ascent bilevel optimization algorithm, to solve the generic problem and present a novel analysis showing that MORBiT converges to the first-order stationary point at a rate of $\widetilde{\mathcal{O}}(n^{1/2} K^{-2/5})$ for a class of weakly convex problems with $n$ objectives upon $K$ iterations of the algorithm. Our analysis utilizes novel results to handle the non-smooth min-max multi-objective setup and to obtain a sublinear dependence in the number of objectives $n$. Experimental results on robust representation learning and robust hyperparameter optimization showcase (i) the advantages of considering the min-max multi-objective setup, and (ii) convergence properties of the proposed MORBiT. Our code is at https://github.com/minimario/MORBiT.
Submitted 7 March, 2023; v1 submitted 3 March, 2022;
originally announced March 2022.
-
It's Raw! Audio Generation with State-Space Models
Authors:
Karan Goel,
Albert Gu,
Chris Donahue,
Christopher Ré
Abstract:
Developing architectures suitable for modeling raw audio is a challenging problem due to the high sampling rates of audio waveforms. Standard sequence modeling approaches like RNNs and CNNs have previously been tailored to fit the demands of audio, but the resultant architectures make undesirable computational tradeoffs and struggle to model waveforms effectively. We propose SaShiMi, a new multi-scale architecture for waveform modeling built around the recently introduced S4 model for long sequence modeling. We identify that S4 can be unstable during autoregressive generation, and provide a simple improvement to its parameterization by drawing connections to Hurwitz matrices. SaShiMi yields state-of-the-art performance for unconditional waveform generation in the autoregressive setting. Additionally, SaShiMi improves non-autoregressive generation performance when used as the backbone architecture for a diffusion model. Compared to prior architectures in the autoregressive generation setting, SaShiMi generates piano and speech waveforms which humans find more musical and coherent, respectively, e.g. 2x better mean opinion scores than WaveNet on an unconditional speech generation task. On a music generation task, SaShiMi outperforms WaveNet on density estimation and speed at both training and inference even when using 3x fewer parameters. Code can be found at https://github.com/HazyResearch/state-spaces and samples at https://hazyresearch.stanford.edu/sashimi-examples.
Submitted 19 February, 2022;
originally announced February 2022.
-
GIGA-Lens: Fast Bayesian Inference for Strong Gravitational Lens Modeling
Authors:
A. Gu,
X. Huang,
W. Sheu,
G. Aldering,
A. S. Bolton,
K. Boone,
A. Dey,
A. Filipp,
E. Jullo,
S. Perlmutter,
D. Rubin,
E. F. Schlafly,
D. J. Schlegel,
Y. Shu,
S. H. Suyu
Abstract:
We present GIGA-Lens: a gradient-informed, GPU-accelerated Bayesian framework for modeling strong gravitational lensing systems, implemented in TensorFlow and JAX. The three components, optimization using multi-start gradient descent, posterior covariance estimation with variational inference, and sampling via Hamiltonian Monte Carlo, all take advantage of gradient information through automatic differentiation and massive parallelization on graphics processing units (GPUs). We test our pipeline on a large set of simulated systems and demonstrate in detail its high level of performance. The average time to model a single system on four Nvidia A100 GPUs is 105 seconds. The robustness, speed, and scalability offered by this framework make it possible to model the large number of strong lenses found in current surveys and present a very promising prospect for the modeling of $\mathcal{O}(10^5)$ lensing systems expected to be discovered in the era of the Vera C. Rubin Observatory, Euclid, and the Nancy Grace Roman Space Telescope.
Submitted 15 February, 2022;
originally announced February 2022.
-
The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective
Authors:
Satyapriya Krishna,
Tessa Han,
Alex Gu,
Javin Pombra,
Shahin Jabbari,
Steven Wu,
Himabindu Lakkaraju
Abstract:
As various post hoc explanation methods are increasingly being leveraged to explain complex models in high-stakes settings, it becomes critical to develop a deeper understanding of if and when the explanations output by these methods disagree with each other, and how such disagreements are resolved in practice. However, there is little to no research that provides answers to these critical questions. In this work, we introduce and study the disagreement problem in explainable machine learning. More specifically, we formalize the notion of disagreement between explanations, analyze how often such disagreements occur in practice, and study how practitioners resolve these disagreements. To this end, we first conduct interviews with data scientists to understand what constitutes disagreement between explanations generated by different methods for the same model prediction, and introduce a novel quantitative framework to formalize this understanding. We then leverage this framework to carry out a rigorous empirical analysis with four real-world datasets, six state-of-the-art post hoc explanation methods, and eight different predictive models, to measure the extent of disagreement between the explanations generated by various popular explanation methods. In addition, we carry out an online user study with data scientists to understand how they resolve the aforementioned disagreements. Our results indicate that state-of-the-art explanation methods often disagree in terms of the explanations they output. Our findings also underscore the importance of developing principled evaluation metrics that enable practitioners to effectively compare explanations.
Submitted 8 February, 2022; v1 submitted 3 February, 2022;
originally announced February 2022.
-
Efficiently Modeling Long Sequences with Structured State Spaces
Authors:
Albert Gu,
Karan Goel,
Christopher Ré
Abstract:
A central goal of sequence modeling is designing a single principled model that can address sequence data across a range of modalities and tasks, particularly on long-range dependencies. Although conventional models including RNNs, CNNs, and Transformers have specialized variants for capturing long dependencies, they still struggle to scale to very long sequences of $10000$ or more steps. A promising recent approach proposed modeling sequences by simulating the fundamental state space model (SSM) \( x'(t) = Ax(t) + Bu(t), y(t) = Cx(t) + Du(t) \), and showed that for appropriate choices of the state matrix \( A \), this system could handle long-range dependencies mathematically and empirically. However, this method has prohibitive computation and memory requirements, rendering it infeasible as a general sequence modeling solution. We propose the Structured State Space sequence model (S4) based on a new parameterization for the SSM, and show that it can be computed much more efficiently than prior approaches while preserving their theoretical strengths. Our technique involves conditioning \( A \) with a low-rank correction, allowing it to be diagonalized stably and reducing the SSM to the well-studied computation of a Cauchy kernel. S4 achieves strong empirical results across a diverse range of established benchmarks, including (i) 91\% accuracy on sequential CIFAR-10 with no data augmentation or auxiliary losses, on par with a larger 2-D ResNet, (ii) substantially closing the gap to Transformers on image and language modeling tasks, while performing generation $60\times$ faster, and (iii) SoTA on every task from the Long Range Arena benchmark, including solving the challenging Path-X task of length 16k that all prior work fails on, while being as efficient as all competitors.
Submitted 5 August, 2022; v1 submitted 30 October, 2021;
originally announced November 2021.
-
Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers
Authors:
Albert Gu,
Isys Johnson,
Karan Goel,
Khaled Saab,
Tri Dao,
Atri Rudra,
Christopher Ré
Abstract:
Recurrent neural networks (RNNs), temporal convolutions, and neural differential equations (NDEs) are popular families of deep learning models for time-series data, each with unique strengths and tradeoffs in modeling power and computational efficiency. We introduce a simple sequence model inspired by control systems that generalizes these approaches while addressing their shortcomings. The Linear State-Space Layer (LSSL) maps a sequence $u \mapsto y$ by simply simulating a linear continuous-time state-space representation $\dot{x} = Ax + Bu, y = Cx + Du$. Theoretically, we show that LSSL models are closely related to the three aforementioned families of models and inherit their strengths. For example, they generalize convolutions to continuous-time, explain common RNN heuristics, and share features of NDEs such as time-scale adaptation. We then incorporate and generalize recent theory on continuous-time memorization to introduce a trainable subset of structured matrices $A$ that endow LSSLs with long-range memory. Empirically, stacking LSSL layers into a simple deep neural network obtains state-of-the-art results across time series benchmarks for long dependencies in sequential image classification, real-world healthcare regression tasks, and speech. On a difficult speech classification task with length-16000 sequences, LSSL outperforms prior approaches by 24 accuracy points, and even outperforms baselines that use hand-crafted features on 100x shorter sequences.
Submitted 26 October, 2021;
originally announced October 2021.
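The state-space map $u \mapsto y$ described in the abstract above can be illustrated with a small simulation. This is only a minimal sketch, not the authors' implementation: it assumes a bilinear (Tustin) discretization with a fixed step `dt`, a scalar input and output, and the hypothetical helper names `discretize_bilinear` and `simulate_ssm`. The abstract does not say which discretization LSSL uses.

```python
import numpy as np

def discretize_bilinear(A, B, dt):
    """Bilinear (Tustin) discretization of x' = Ax + Bu with step size dt.

    Assumption for illustration: the abstract does not specify this choice.
    """
    n = A.shape[0]
    I = np.eye(n)
    inv = np.linalg.inv(I - (dt / 2.0) * A)
    A_bar = inv @ (I + (dt / 2.0) * A)
    B_bar = inv @ (dt * B)
    return A_bar, B_bar

def simulate_ssm(A, B, C, D, u, dt):
    """Map a scalar input sequence u to an output sequence y.

    Shapes: A is (n, n), B is (n, 1), C is (1, n), D is a scalar.
    """
    A_bar, B_bar = discretize_bilinear(A, B, dt)
    x = np.zeros((A.shape[0], 1))  # start from a zero state
    ys = []
    for u_k in u:
        x = A_bar @ x + B_bar * u_k          # discrete state update
        ys.append((C @ x + D * u_k).item())  # readout y = Cx + Du
    return np.array(ys)
```

For a stable one-dimensional system such as `A = [[-1.0]]`, `B = [[1.0]]`, `C = [[1.0]]`, `D = 0.0` driven by a constant input of 1, the output rises toward the steady-state value of 1.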
-
Three Operator Splitting with Subgradients, Stochastic Gradients, and Adaptive Learning Rates
Authors:
Alp Yurtsever,
Alex Gu,
Suvrit Sra
Abstract:
Three Operator Splitting (TOS) (Davis & Yin, 2017) can minimize the sum of multiple convex functions effectively when an efficient gradient oracle or proximal operator is available for each term. This requirement often fails in machine learning applications: (i) instead of full gradients only stochastic gradients may be available; and (ii) instead of proximal operators, using subgradients to handle complex penalty functions may be more efficient and realistic. Motivated by these concerns, we analyze three potentially valuable extensions of TOS. The first two permit using subgradients and stochastic gradients, and are shown to ensure a $\mathcal{O}(1/\sqrt{t})$ convergence rate. The third extension AdapTOS endows TOS with adaptive step-sizes. For the important setting of optimizing a convex loss over the intersection of convex sets AdapTOS attains universal convergence rates, i.e., the rate adapts to the unknown smoothness degree of the objective. We compare our proposed methods with competing methods on various applications.
Submitted 18 February, 2022; v1 submitted 7 October, 2021;
originally announced October 2021.
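For orientation, the basic deterministic Three Operator Splitting iteration of Davis & Yin, which the abstract above takes as its starting point, can be sketched as follows. This is not the paper's subgradient, stochastic, or AdapTOS variant. The function name `three_operator_splitting`, the fixed step size, and the toy problem (a quadratic loss over the intersection of two simple convex sets) are assumptions chosen for illustration.

```python
import numpy as np

def three_operator_splitting(grad_f, prox_g, prox_h, z0, step, n_iters=200):
    """Basic TOS for min f(x) + g(x) + h(x), with f smooth and g, h proximable."""
    z = np.asarray(z0, dtype=float).copy()
    for _ in range(n_iters):
        x_g = prox_g(z, step)
        # Reflect through x_g, take a gradient step on f, then apply prox of h.
        x_h = prox_h(2.0 * x_g - z - step * grad_f(x_g), step)
        z = z + x_h - x_g
    return prox_g(z, step)

# Toy problem: minimize 0.5 * ||x - b||^2 subject to x >= 0 and x <= 1.
# The proximal operators of the two indicator functions are projections.
b = np.array([2.0, -1.0, 0.5])
x_star = three_operator_splitting(
    grad_f=lambda x: x - b,
    prox_g=lambda v, _step: np.maximum(v, 0.0),  # projection onto x >= 0
    prox_h=lambda v, _step: np.minimum(v, 1.0),  # projection onto x <= 1
    z0=np.zeros(3),
    step=1.0,
)
```

On this toy problem the minimizer is `b` clipped to the box $[0, 1]$, so `x_star` can be compared against `np.clip(b, 0.0, 1.0)`.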
-
Adaptive shot allocation for fast convergence in variational quantum algorithms
Authors:
Andi Gu,
Angus Lowe,
Pavel A. Dub,
Patrick J. Coles,
Andrew Arrasmith
Abstract:
Variational Quantum Algorithms (VQAs) are a promising approach for practical applications like chemistry and materials science on near-term quantum computers as they typically reduce quantum resource requirements. However, in order to implement VQAs, an efficient classical optimization strategy is required. Here we present a new stochastic gradient descent method using an adaptive number of shots at each step, called the global Coupled Adaptive Number of Shots (gCANS) method, which improves on prior art in both the number of iterations as well as the number of shots required. These improvements reduce both the time and money required to run VQAs on current cloud platforms. We analytically prove that in a convex setting gCANS achieves geometric convergence to the optimum. Further, we numerically investigate the performance of gCANS on some chemical configuration problems. We also consider finding the ground state for an Ising model with different numbers of spins to examine the scaling of the method. We find that for these problems, gCANS compares favorably to all of the other optimizers we consider.
Submitted 23 August, 2021;
originally announced August 2021.
-
HoroPCA: Hyperbolic Dimensionality Reduction via Horospherical Projections
Authors:
Ines Chami,
Albert Gu,
Dat Nguyen,
Christopher Ré
Abstract:
This paper studies Principal Component Analysis (PCA) for data lying in hyperbolic spaces. Given directions, PCA relies on: (1) a parameterization of subspaces spanned by these directions, (2) a method of projection onto subspaces that preserves information in these directions, and (3) an objective to optimize, namely the variance explained by projections. We generalize each of these concepts to the hyperbolic space and propose HoroPCA, a method for hyperbolic dimensionality reduction. By focusing on the core problem of extracting principal directions, HoroPCA theoretically better preserves information in the original data such as distances, compared to previous generalizations of PCA. Empirically, we validate that HoroPCA outperforms existing dimensionality reduction methods, significantly reducing error in distance preservation. As a data whitening method, it improves downstream classification by up to 3.9% compared to methods that don't use whitening. Finally, we show that HoroPCA can be used to visualize hyperbolic data in two dimensions.
Submitted 6 June, 2021;
originally announced June 2021.
-
k-Mixup Regularization for Deep Learning via Optimal Transport
Authors:
Kristjan Greenewald,
Anming Gu,
Mikhail Yurochkin,
Justin Solomon,
Edward Chien
Abstract:
Mixup is a popular regularization technique for training deep neural networks that improves generalization and increases robustness to certain distribution shifts. It perturbs input training data in the direction of other randomly-chosen instances in the training set. To better leverage the structure of the data, we extend mixup in a simple, broadly applicable way to \emph{$k$-mixup}, which perturbs $k$-batches of training points in the direction of other $k$-batches. The perturbation is done with displacement interpolation, i.e. interpolation under the Wasserstein metric. We demonstrate theoretically and in simulations that $k$-mixup preserves cluster and manifold structures, and we extend theory studying the efficacy of standard mixup to the $k$-mixup case. Our empirical results show that training with $k$-mixup further improves generalization and robustness across several network architectures and benchmark datasets of differing modalities. For the wide variety of real datasets considered, the performance gains of $k$-mixup over standard mixup are similar to or larger than the gains of mixup itself over standard ERM after hyperparameter optimization. In several instances, in fact, $k$-mixup achieves gains in settings where standard mixup has negligible to zero improvement over ERM.
Submitted 7 October, 2023; v1 submitted 5 June, 2021;
originally announced June 2021.
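As a point of reference for the abstract above, standard mixup (the baseline that $k$-mixup extends) can be sketched in a few lines. This is a minimal illustration, not the paper's $k$-mixup: it pairs each example with a randomly chosen partner rather than matching $k$-batches under the Wasserstein metric. The function name `mixup_batch` and the Beta($\alpha$, $\alpha$) sampling of the mixing weight are assumptions based on the common formulation of mixup.

```python
import numpy as np

def mixup_batch(x, y, alpha=1.0, rng=None):
    """Standard mixup on one batch: convex combinations of random example pairs.

    x: array of inputs with the batch on axis 0.
    y: one-hot (or soft) labels with the batch on axis 0.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)   # mixing weight in [0, 1]
    perm = rng.permutation(len(x))  # random partner for each example
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return x_mix, y_mix, lam
```

Each mixed label remains a valid probability vector because it is a convex combination of two one-hot vectors.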
-
Deep Transfer Learning for Infectious Disease Case Detection Using Electronic Medical Records
Authors:
Ye Ye,
Andrew Gu
Abstract:
During an infectious disease pandemic, it is critical to share electronic medical records or models (learned from these records) across regions. Applying one region's data or model to another region often raises distribution shift issues that violate the assumptions of traditional machine learning techniques. Transfer learning can be a solution. To explore the potential of deep transfer learning algorithms, we applied two data-based algorithms (domain adversarial neural networks and maximum classifier discrepancy) and model-based transfer learning algorithms to infectious disease detection tasks. We further studied well-defined synthetic scenarios where the data distribution differences between two regions are known. Our experiments show that, in the context of infectious disease classification, transfer learning may be useful when (1) the source and target are similar and the target training data is insufficient and (2) the target training data does not have labels. Model-based transfer learning works well in the first situation, in which case the performance closely matched that of the data-based transfer learning models. Still, further investigation of the domain shift in real-world research data is needed to account for the drop in performance.
Submitted 7 March, 2021;
originally announced March 2021.
-
Reproducibility Report: La-MAML: Look-ahead Meta Learning for Continual Learning
Authors:
Joel Joseph,
Alex Gu
Abstract:
The Continual Learning (CL) problem involves performing well on a sequence of tasks under limited compute. Current algorithms in the domain are either slow, offline, or sensitive to hyper-parameters. La-MAML, an optimization-based meta-learning algorithm, claims to be better than other replay-based, prior-based, and meta-learning-based approaches. According to the MER paper [1], the metrics used to measure performance in the continual learning arena are Retained Accuracy (RA) and Backward Transfer-Interference (BTI). La-MAML claims to perform better on these metrics than the SOTA in the domain. This is the main claim of the paper, which we verify in this report.
Submitted 20 May, 2021; v1 submitted 10 February, 2021;
originally announced February 2021.
-
U-LanD: Uncertainty-Driven Video Landmark Detection
Authors:
Mohammad H. Jafari,
Christina Luong,
Michael Tsang,
Ang Nan Gu,
Nathan Van Woudenberg,
Robert Rohling,
Teresa Tsang,
Purang Abolmaesumi
Abstract:
This paper presents U-LanD, a framework for joint detection of key frames and landmarks in videos. We tackle a particularly challenging problem, where training labels are noisy and highly sparse. U-LanD builds upon a pivotal observation: a deep Bayesian landmark detector trained solely on key video frames has significantly lower predictive uncertainty on those frames than on other frames in videos. We use this observation as an unsupervised signal to automatically recognize key frames on which we detect landmarks. As a test-bed for our framework, we use ultrasound imaging videos of the heart, where sparse and noisy clinical labels are only available for a single frame in each video. Using data from 4,493 patients, we demonstrate that U-LanD outperforms the state-of-the-art non-Bayesian counterpart by an absolute margin of 42% in $R^2$ score, with almost no overhead imposed on the model size. Our approach is generic and can potentially be applied to other challenging data with noisy and sparse training labels.
Submitted 2 February, 2021;
originally announced February 2021.
-
Kaleidoscope: An Efficient, Learnable Representation For All Structured Linear Maps
Authors:
Tri Dao,
Nimit S. Sohoni,
Albert Gu,
Matthew Eichhorn,
Amit Blonder,
Megan Leszczynski,
Atri Rudra,
Christopher Ré
Abstract:
Modern neural network architectures use structured linear transformations, such as low-rank matrices, sparse matrices, permutations, and the Fourier transform, to improve inference speed and reduce memory usage compared to general linear maps. However, choosing which of the myriad structured transformations to use (and its associated parameterization) is a laborious task that requires trading off speed, space, and accuracy. We consider a different approach: we introduce a family of matrices called kaleidoscope matrices (K-matrices) that provably capture any structured matrix with near-optimal space (parameter) and time (arithmetic operation) complexity. We empirically validate that K-matrices can be automatically learned within end-to-end pipelines to replace hand-crafted procedures, in order to improve model quality. For example, replacing channel shuffles in ShuffleNet improves classification accuracy on ImageNet by up to 5%. K-matrices can also simplify hand-engineered pipelines -- we replace filter bank feature computation in speech data preprocessing with a learnable kaleidoscope layer, resulting in only 0.4% loss in accuracy on the TIMIT speech recognition task. In addition, K-matrices can capture latent structure in models: for a challenging permuted image classification task, a K-matrix based representation of permutations is able to learn the right latent structure and improves accuracy of a downstream convolutional model by over 9%. We provide a practically efficient implementation of our approach, and use K-matrices in a Transformer network to attain 36% faster end-to-end inference speed on a language translation task.
Submitted 5 January, 2021; v1 submitted 29 December, 2020;
originally announced December 2020.
-
No Subclass Left Behind: Fine-Grained Robustness in Coarse-Grained Classification Problems
Authors:
Nimit S. Sohoni,
Jared A. Dunnmon,
Geoffrey Angus,
Albert Gu,
Christopher Ré
Abstract:
In real-world classification tasks, each class often comprises multiple finer-grained "subclasses." As the subclass labels are frequently unavailable, models trained using only the coarser-grained class labels often exhibit highly variable performance across different subclasses. This phenomenon, known as hidden stratification, has important consequences for models deployed in safety-critical applications such as medicine. We propose GEORGE, a method to both measure and mitigate hidden stratification even when subclass labels are unknown. We first observe that unlabeled subclasses are often separable in the feature space of deep neural networks, and exploit this fact to estimate subclass labels for the training data via clustering techniques. We then use these approximate subclass labels as a form of noisy supervision in a distributionally robust optimization objective. We theoretically characterize the performance of GEORGE in terms of the worst-case generalization error across any subclass. We empirically validate GEORGE on a mix of real-world and benchmark image classification datasets, and show that our approach boosts worst-case subclass accuracy by up to 22 percentage points compared to standard training techniques, without requiring any prior information about the subclasses.
Submitted 10 April, 2022; v1 submitted 25 November, 2020;
originally announced November 2020.
-
From Trees to Continuous Embeddings and Back: Hyperbolic Hierarchical Clustering
Authors:
Ines Chami,
Albert Gu,
Vaggos Chatziafratis,
Christopher Ré
Abstract:
Similarity-based Hierarchical Clustering (HC) is a classical unsupervised machine learning algorithm that has traditionally been solved with heuristic algorithms like Average-Linkage. Recently, Dasgupta reframed HC as a discrete optimization problem by introducing a global cost function measuring the quality of a given tree. In this work, we provide the first continuous relaxation of Dasgupta's discrete optimization problem with provable quality guarantees. The key idea of our method, HypHC, is showing a direct correspondence from discrete trees to continuous representations (via the hyperbolic embeddings of their leaf nodes) and back (via a decoding algorithm that maps leaf embeddings to a dendrogram), allowing us to search the space of discrete binary trees with continuous optimization. Building on analogies between trees and hyperbolic space, we derive a continuous analogue for the notion of lowest common ancestor, which leads to a continuous relaxation of Dasgupta's discrete objective. We can show that after decoding, the global minimizer of our continuous relaxation yields a discrete tree with a (1 + epsilon)-factor approximation for Dasgupta's optimal tree, where epsilon can be made arbitrarily small and controls optimization challenges. We experimentally evaluate HypHC on a variety of HC benchmarks and find that even approximate solutions found with gradient descent have superior clustering quality than agglomerative heuristics or other gradient based algorithms. Finally, we highlight the flexibility of HypHC using end-to-end training in a downstream classification task.
Submitted 1 October, 2020;
originally announced October 2020.
-
Using Undersampling with Ensemble Learning to Identify Factors Contributing to Preterm Birth
Authors:
Shi Dong,
Zlatan Feric,
Guangyu Li,
Chieh Wu,
April Z. Gu,
Jennifer Dy,
John Meeker,
Ingrid Y. Padilla,
Jose Cordero,
Carmen Velez Vega,
Zaira Rosario,
Akram Alshawabkeh,
David Kaeli
Abstract:
In this paper, we propose Ensemble Learning models to identify factors contributing to preterm birth. Our work leverages a rich dataset collected by a NIEHS P42 Center that is trying to identify the dominant factors responsible for the high rate of premature births in northern Puerto Rico. We investigate analytical models addressing two major challenges present in the dataset: 1) the significant amount of incomplete data in the dataset, and 2) class imbalance in the dataset. First, we leverage and compare two types of missing data imputation methods: 1) mean-based and 2) similarity-based, increasing the completeness of this dataset. Second, we propose a feature selection and evaluation model based on using undersampling with Ensemble Learning to address class imbalance present in the dataset. We leverage and compare multiple Ensemble Feature selection methods, including Complete Linear Aggregation (CLA), Weighted Mean Aggregation (WMA), Feature Occurrence Frequency (OFA), and Classification Accuracy Based Aggregation (CAA). To further address missing data present in each feature, we propose two novel methods: 1) Missing Data Rate and Accuracy Based Aggregation (MAA), and 2) Entropy and Accuracy Based Aggregation (EAA). Both proposed models balance the degree of data variance introduced by the missing data handling during the feature selection process while maintaining model performance. Our results show a 42\% improvement in sensitivity versus fallout over previous state-of-the-art methods.
Submitted 23 September, 2020;
originally announced September 2020.
-
HiPPO: Recurrent Memory with Optimal Polynomial Projections
Authors:
Albert Gu,
Tri Dao,
Stefano Ermon,
Atri Rudra,
Christopher Re
Abstract:
A central problem in learning from sequential data is representing cumulative history in an incremental fashion as more data is processed. We introduce a general framework (HiPPO) for the online compression of continuous signals and discrete time series by projection onto polynomial bases. Given a measure that specifies the importance of each time step in the past, HiPPO produces an optimal solution to a natural online function approximation problem. As special cases, our framework yields a short derivation of the recent Legendre Memory Unit (LMU) from first principles, and generalizes the ubiquitous gating mechanism of recurrent neural networks such as GRUs. This formal framework yields a new memory update mechanism (HiPPO-LegS) that scales through time to remember all history, avoiding priors on the timescale. HiPPO-LegS enjoys the theoretical benefits of timescale robustness, fast updates, and bounded gradients. By incorporating the memory dynamics into recurrent neural networks, HiPPO RNNs can empirically capture complex temporal dependencies. On the benchmark permuted MNIST dataset, HiPPO-LegS sets a new state-of-the-art accuracy of 98.3%. Finally, on a novel trajectory classification task testing robustness to out-of-distribution timescales and missing data, HiPPO-LegS outperforms RNN and neural ODE baselines by 25-40% accuracy.
Submitted 22 October, 2020; v1 submitted 17 August, 2020;
originally announced August 2020.
-
Rotation-Invariant Gait Identification with Quaternion Convolutional Neural Networks
Authors:
Bowen Jing,
Vinay Prabhu,
Angela Gu,
John Whaley
Abstract:
A desirable property of accelerometric gait-based identification systems is robustness to new device orientations presented by users during testing but unseen during the training phase. However, traditional convolutional neural networks (CNNs) used in these systems compensate poorly for such transformations. In this paper, we target this problem by introducing Quaternion CNN, a network architecture which is intrinsically layer-wise equivariant and globally invariant under 3D rotations of an array of input vectors. We show empirically that this network indeed significantly outperforms a traditional CNN in a multi-user rotation-invariant gait classification setting. Lastly, we demonstrate how the kernels learned by this QCNN can also be visualized as basis-independent but origin- and chirality-dependent trajectory fragments in Euclidean space, thus yielding a novel mode of feature visualization and extraction.
Submitted 4 August, 2020;
originally announced August 2020.
-
Model Patching: Closing the Subgroup Performance Gap with Data Augmentation
Authors:
Karan Goel,
Albert Gu,
Yixuan Li,
Christopher Ré
Abstract:
Classifiers in machine learning are often brittle when deployed. Particularly concerning are models with inconsistent performance on specific subgroups of a class, e.g., exhibiting disparities in skin cancer classification in the presence or absence of a spurious bandage. To mitigate these performance differences, we introduce model patching, a two-stage framework for improving robustness that encourages the model to be invariant to subgroup differences, and focus on class information shared by subgroups. Model patching first models subgroup features within a class and learns semantic transformations between them, and then trains a classifier with data augmentations that deliberately manipulate subgroup features. We instantiate model patching with CAMEL, which (1) uses a CycleGAN to learn the intra-class, inter-subgroup augmentations, and (2) balances subgroup performance using a theoretically-motivated subgroup consistency regularizer, accompanied by a new robust objective. We demonstrate CAMEL's effectiveness on 3 benchmark datasets, with reductions in robust error of up to 33% relative to the best baseline. Lastly, CAMEL successfully patches a model that fails due to spurious features on a real-world skin cancer dataset.
Submitted 15 August, 2020;
originally announced August 2020.
-
Weak Pullback Mean Random Attractors for Stochastic Evolution Equations and Applications
Authors:
Anhui Gu
Abstract:
In this paper, we investigate the existence and uniqueness of weak pullback mean random attractors for abstract stochastic evolution equations with general diffusion terms in Bochner spaces. As applications, the existence and uniqueness of weak pullback mean random attractors for some stochastic models such as stochastic reaction-diffusion equations, the stochastic $p$-Laplace equation and stochastic porous media equations are established.
Submitted 15 September, 2020; v1 submitted 21 July, 2020;
originally announced July 2020.
-
Discovering New Strong Gravitational Lenses in the DESI Legacy Imaging Surveys
Authors:
X. Huang,
C. Storfer,
A. Gu,
V. Ravi,
A. Pilon,
W. Sheu,
R. Venguswamy,
S. Banka,
A. Dey,
M. Landriau,
D. Lang,
A. Meisner,
J. Moustakas,
A. D. Myers,
R. Sajith,
E. F. Schlafly,
D. J. Schlegel
Abstract:
We have conducted a search for new strong gravitational lensing systems in the Dark Energy Spectroscopic Instrument Legacy Imaging Surveys' Data Release 8. We use deep residual neural networks, building on previous work presented in Huang et al. (2020). These surveys together cover approximately one third of the sky visible from the northern hemisphere, reaching a z band AB magnitude of ~22.5. We compile a training sample that consists of known lensing systems as well as non-lenses in the Legacy Surveys and the Dark Energy Survey. After applying our trained neural networks to the survey data, we visually inspect and rank images with probabilities above a threshold. Here we present 1210 new strong lens candidates.
Submitted 10 January, 2021; v1 submitted 7 May, 2020;
originally announced May 2020.
-
Rough Path Theory to approximate Random Dynamical Systems
Authors:
Hongjun Gao,
María J. Garrido-Atienza,
Anhui Gu,
Kening Lu,
Björn Schmalfuss
Abstract:
We consider the rough differential equation $dY=f(Y)\,d\boldsymbol{\omega}$, where $\boldsymbol{\omega}=(ω,\bbomega)$ is a rough path defined by a Brownian motion $ω$ on $\mathbb{R}^m$. Under the usual regularity assumption on $f$, namely $f\in C^3_b (\mathbb{R}^d, \mathbb{R}^{d\times m})$, the rough differential equation has a unique solution that defines a random dynamical system $φ_0$. On the other hand, we also consider an ordinary random differential equation $dY_δ=f(Y_δ)\,dω_δ$, where $ω_δ$ is a random process with stationary increments and continuously differentiable paths that approximates $ω$. The latter differential equation generates a random dynamical system $φ_δ$ as well. We show the convergence of the random dynamical system $φ_δ$ to $φ_0$ for $δ\to 0$ in Hölder norm.
Submitted 24 February, 2020;
originally announced February 2020.
-
Improving the Gating Mechanism of Recurrent Neural Networks
Authors:
Albert Gu,
Caglar Gulcehre,
Tom Le Paine,
Matt Hoffman,
Razvan Pascanu
Abstract:
Gating mechanisms are widely used in neural network models, where they allow gradients to backpropagate more easily through depth or time. However, their saturation property introduces problems of its own. For example, in recurrent models these gates need to have outputs near 1 to propagate information over long time-delays, which requires them to operate in their saturation regime and hinders gradient-based learning of the gate mechanism. We address this problem by deriving two synergistic modifications to the standard gating mechanism that are easy to implement, introduce no additional hyperparameters, and improve learnability of the gates when they are close to saturation. We show how these changes are related to and improve on alternative recently proposed gating mechanisms such as chrono initialization and Ordered Neurons. Empirically, our simple gating mechanisms robustly improve the performance of recurrent models on a range of applications, including synthetic memorization tasks, sequential image classification, language modeling, and reinforcement learning, particularly when long-term dependencies are involved.
Submitted 18 June, 2020; v1 submitted 22 October, 2019;
originally announced October 2019.
-
Arithmetic of weighted Catalan numbers
Authors:
Yibo Gao,
Andrew Gu
Abstract:
In this paper, we study arithmetic properties of weighted Catalan numbers. Previously, Postnikov and Sagan found conditions under which the $2$-adic valuations of the weighted Catalan numbers are equal to the $2$-adic valuations of the Catalan numbers. We obtain the same result under weaker conditions by considering a map from a class of functions to $2$-adic integers. These methods are also extended to $q$-weighted Catalan numbers, strengthening a previous result by Konvalinka. Finally, we prove some results on the periodicity of weighted Catalan numbers modulo an integer and apply them to the specific case of the number of combinatorial types of Morse links. Many open questions are mentioned.
Submitted 11 August, 2019;
originally announced August 2019.
-
Sparse Recovery for Orthogonal Polynomial Transforms
Authors:
Anna Gilbert,
Albert Gu,
Christopher Re,
Atri Rudra,
Mary Wootters
Abstract:
In this paper we consider the following sparse recovery problem. We have query access to a vector $\mathbf{x} \in \mathbb{R}^N$ such that $\hat{\mathbf{x}} = \mathbf{F} \mathbf{x}$ is $k$-sparse (or nearly $k$-sparse) for some orthogonal transform $\mathbf{F}$. The goal is to output an approximation (in an $\ell_2$ sense) to $\hat{\mathbf{x}}$ in sublinear time. This problem has been well-studied in the special case that $\mathbf{F}$ is the Discrete Fourier Transform (DFT), and a long line of work has resulted in sparse Fast Fourier Transforms that run in time $O(k \cdot \mathrm{polylog} N)$. However, for transforms $\mathbf{F}$ other than the DFT (or closely related transforms like the Discrete Cosine Transform), the question is much less settled.
In this paper we give sublinear-time algorithms, running in time $\mathrm{poly}(k \log(N))$, for solving the sparse recovery problem for orthogonal transforms $\mathbf{F}$ that arise from orthogonal polynomials. More precisely, our algorithm works for any $\mathbf{F}$ that is an orthogonal polynomial transform derived from Jacobi polynomials. The Jacobi polynomials are a large class of classical orthogonal polynomials (and include Chebyshev and Legendre polynomials as special cases), and show up extensively in applications like numerical analysis and signal processing. One caveat of our work is that we require an assumption on the sparsity structure of the sparse vector, although we note that vectors with random support have this property with high probability.
Our approach is to give a very general reduction from the $k$-sparse sparse recovery problem to the $1$-sparse sparse recovery problem that holds for any flat orthogonal polynomial transform; then we solve this one-sparse recovery problem for transforms derived from Jacobi polynomials.
Submitted 18 July, 2019;
originally announced July 2019.
-
Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations
Authors:
Tri Dao,
Albert Gu,
Matthew Eichhorn,
Atri Rudra,
Christopher Ré
Abstract:
Fast linear transforms are ubiquitous in machine learning, including the discrete Fourier transform, discrete cosine transform, and other structured transformations such as convolutions. All of these transforms can be represented by dense matrix-vector multiplication, yet each has a specialized and highly efficient (subquadratic) algorithm. We ask to what extent hand-crafting these algorithms and implementations is necessary, what structural priors they encode, and how much knowledge is required to automatically learn a fast algorithm for a provided structured transform. Motivated by a characterization of fast matrix-vector multiplication as products of sparse matrices, we introduce a parameterization of divide-and-conquer methods that is capable of representing a large class of transforms. This generic formulation can automatically learn an efficient algorithm for many important transforms; for example, it recovers the $O(N \log N)$ Cooley-Tukey FFT algorithm to machine precision, for dimensions $N$ up to $1024$. Furthermore, our method can be incorporated as a lightweight replacement of generic matrices in machine learning pipelines to learn efficient and compressible transformations. On a standard task of compressing a single hidden-layer network, our method exceeds the classification accuracy of unconstrained matrices on CIFAR-10 by 3.9 points -- the first time a structured approach has done so -- with 4X faster inference speed and 40X fewer parameters.
Submitted 28 December, 2020; v1 submitted 14 March, 2019;
originally announced March 2019.
-
Elliptical flow coalescence to identify the $f_{0}$(980) content
Authors:
An Gu,
Terrence Edmonds,
Jie Zhao,
Fuqiang Wang
Abstract:
We use a simple coalescence model to generate $f_{0}$(980) particles for three configurations: a ${s\bar{s}}$ meson, a ${u\bar{u}s\bar{s}}$ tetraquark, and a ${K^{+}K^{-}}$ molecule. The phase-space information of the coalescing constituents is taken from a multi-phase transport (AMPT) simulation of heavy-ion collisions. It is shown that the number of constituent quarks scaling of the elliptic flow anisotropy can be used to discern ${s\bar{s}}$ from ${u\bar{u}s\bar{s}}$ and ${K^{+}K^{-}}$ configurations.
Submitted 19 February, 2019;
originally announced February 2019.
-
Learning Compressed Transforms with Low Displacement Rank
Authors:
Anna T. Thomas,
Albert Gu,
Tri Dao,
Atri Rudra,
Christopher Ré
Abstract:
The low displacement rank (LDR) framework for structured matrices represents a matrix through two displacement operators and a low-rank residual. Existing use of LDR matrices in deep learning has applied fixed displacement operators encoding forms of shift invariance akin to convolutions. We introduce a class of LDR matrices with more general displacement operators, and explicitly learn over both the operators and the low-rank component. This class generalizes several previous constructions while preserving compression and efficient computation. We prove bounds on the VC dimension of multi-layer neural networks with structured weight matrices and show empirically that our compact parameterization can reduce the sample complexity of learning. When replacing weight layers in fully-connected, convolutional, and recurrent neural networks for image classification and language modeling tasks, our new classes exceed the accuracy of existing compression approaches, and on some tasks also outperform general unstructured layers while using more than 20x fewer parameters.
Submitted 1 January, 2019; v1 submitted 4 October, 2018;
originally announced October 2018.
-
Representation Tradeoffs for Hyperbolic Embeddings
Authors:
Christopher De Sa,
Albert Gu,
Christopher Ré,
Frederic Sala
Abstract:
Hyperbolic embeddings offer excellent quality with few dimensions when embedding hierarchical data structures like synonym or type hierarchies. Given a tree, we give a combinatorial construction that embeds the tree in hyperbolic space with arbitrarily low distortion without using optimization. On WordNet, our combinatorial embedding obtains a mean-average-precision of 0.989 with only two dimensions, while Nickel et al.'s recent construction obtains 0.87 using 200 dimensions. We provide upper and lower bounds that allow us to characterize the precision-dimensionality tradeoff inherent in any hyperbolic embedding. To embed general metric spaces, we propose a hyperbolic generalization of multidimensional scaling (h-MDS). We show how to perform exact recovery of hyperbolic points from distances, provide a perturbation analysis, and give a recovery result that allows us to reduce dimensionality. The h-MDS approach offers consistently low distortion even with few dimensions across several datasets. Finally, we extract lessons from the algorithms and theory above to design a PyTorch-based implementation that can handle incomplete information and is scalable.
Submitted 24 April, 2018; v1 submitted 9 April, 2018;
originally announced April 2018.
-
A Kernel Theory of Modern Data Augmentation
Authors:
Tri Dao,
Albert Gu,
Alexander J. Ratner,
Virginia Smith,
Christopher De Sa,
Christopher Ré
Abstract:
Data augmentation, a technique in which a training set is expanded with class-preserving transformations, is ubiquitous in modern machine learning pipelines. In this paper, we seek to establish a theoretical framework for understanding data augmentation. We approach this from two directions: First, we provide a general model of augmentation as a Markov process, and show that kernels appear naturally with respect to this model, even when we do not employ kernel classification. Next, we analyze more directly the effect of augmentation on kernel classifiers, showing that data augmentation can be approximated by first-order feature averaging and second-order variance regularization components. These frameworks both serve to illustrate the ways in which data augmentation affects the downstream learning model, and the resulting analyses provide novel connections between prior work in invariant kernels, tangent propagation, and robust optimization. Finally, we provide several proof-of-concept applications showing that our theory can be useful for accelerating machine learning workflows, such as reducing the amount of computation needed to train using augmented data, and predicting the utility of a transformation prior to training.
Submitted 20 March, 2019; v1 submitted 16 March, 2018;
originally announced March 2018.
-
Locally Adaptive Learning Loss for Semantic Image Segmentation
Authors:
Jinjiang Guo,
Pengyuan Ren,
Aiguo Gu,
Jian Xu,
Weixin Wu
Abstract:
We propose a novel locally adaptive learning estimator for enhancing the inter- and intra-class discriminative capabilities of deep neural networks, which can be used as an improved loss layer for semantic image segmentation tasks. Most loss layers compute a pixel-wise cost between feature maps and ground truths, ignoring spatial layouts and interactions between neighboring pixels of the same object category, and thus networks are not sufficiently sensitive to intra-class connections. Stride by stride, our method first applies an adaptive pooling filter over the predicted feature maps, merging the predicted distributions over a small group of neighboring pixels of the same category, and then computes the cost between the merged distribution vector and their category label. This design involves groups of neighboring predictions from the same category in estimating prediction correctness with respect to their category, and hence trains networks to be more sensitive to regional connections between adjacent pixels based on their categories. In experiments on the Pascal VOC 2012 segmentation dataset, the consistently improved results show that our proposed approach achieves better segmentation masks than previous counterparts.
Submitted 15 April, 2018; v1 submitted 23 February, 2018;
originally announced February 2018.
-
A Two Pronged Progress in Structured Dense Matrix Multiplication
Authors:
Christopher De Sa,
Albert Gu,
Rohan Puttagunta,
Christopher Ré,
Atri Rudra
Abstract:
Matrix-vector multiplication is one of the most fundamental computing primitives. Given a matrix $A\in\mathbb{F}^{N\times N}$ and a vector $b$, it is known that in the worst case $Θ(N^2)$ operations over $\mathbb{F}$ are needed to compute $Ab$. A broad question is to identify classes of structured dense matrices that can be represented with $O(N)$ parameters, and for which matrix-vector multiplication can be performed sub-quadratically. One such class of structured matrices is the orthogonal polynomial transforms, whose rows correspond to a family of orthogonal polynomials. Other well known classes include the Toeplitz, Hankel, Vandermonde, Cauchy matrices and their extensions that are all special cases of a low displacement rank property. In this paper, we make progress on two fronts:
1. We introduce the notion of recurrence width of matrices. For matrices with constant recurrence width, we design algorithms to compute $Ab$ and $A^Tb$ with a near-linear number of operations. This notion of width is finer than all the above classes of structured matrices and thus we can compute multiplication for all of them using the same core algorithm.
2. We additionally adapt this algorithm to an algorithm for a much more general class of matrices with displacement structure: those with low displacement rank with respect to quasiseparable matrices. This class includes Toeplitz-plus-Hankel-like matrices, Discrete Cosine/Sine Transforms, and more, and captures all previously known matrices with displacement structure that we are aware of under a unified parametrization and algorithm.
Our work unifies, generalizes, and simplifies existing state-of-the-art results in structured matrix-vector multiplication. Finally, we show how applications in areas such as multipoint evaluations of multivariate polynomials can be reduced to problems involving low recurrence width matrices.
Submitted 17 November, 2017; v1 submitted 4 November, 2016;
originally announced November 2016.
-
Regularity of pullback attractors and equilibrium for non-autonomous stochastic FitzHugh-Nagumo system on unbounded domains
Authors:
Wenqiang Zhao,
Anhui Gu
Abstract:
A theory of bi-spatial random attractors developed recently by Li et al. is extended to study the stochastic FitzHugh-Nagumo system driven by a non-autonomous term as well as a general multiplicative noise. By using the notions of uniform absorption and uniformly pullback asymptotic compactness, it is shown that every generated random cocycle has a pullback attractor in $L^l(\mathbb{R}^N)\times L^2(\mathbb{R}^N)$ with $l\in(2,p]$, and the family of obtained attractors is upper semi-continuous at any intensity of noise. Moreover, if some additional conditions are added, then the system possesses a unique equilibrium and is attracted by a single point.
Submitted 27 April, 2015; v1 submitted 2 December, 2014;
originally announced December 2014.
-
A random attractor for stochastic porous media equations on infinite lattices
Authors:
Anhui Gu,
Yangrong Li,
Jia Li
Abstract:
The paper is devoted to studying the existence of a random attractor for stochastic porous media equations on infinite lattices under some conditions.
Submitted 22 September, 2014; v1 submitted 21 September, 2014;
originally announced September 2014.
-
Random Attractors of Stochastic Lattice Dynamical Systems Driven by Fractional Brownian Motions and its Erratum
Authors:
Anhui Gu
Abstract:
This paper considers stochastic lattice dynamical systems (SLDS) driven by fractional Brownian motions with Hurst parameter greater than $1/2$. Under the usual dissipativity conditions, these SLDS are shown to generate a random dynamical system for which the existence and uniqueness of a random attractor are established. Furthermore, the random attractor is in fact a singleton-set random attractor. Next, we give an erratum because of the misused theory.
Submitted 27 August, 2014; v1 submitted 26 August, 2014;
originally announced August 2014.
-
Sector-Based Factor Models for Asset Returns
Authors:
Angela Gu,
Patrick Zeng
Abstract:
Factor analysis is a statistical technique employed to evaluate how observed variables correlate through common factors and unique variables. While it is often used to analyze price movement in the unstable stock market, it does not always yield easily interpretable results. In this study, we develop improved factor models by explicitly incorporating sector information on our studied stocks. We add eleven sectors of stocks as defined by the IBES, represented by respective sector-specific factors, to non-specific market factors to revise the factor model. We then develop an expectation maximization (EM) algorithm to compute our revised model with 15 years' worth of S&P 500 stocks' daily close prices. Our results in most sectors show that nearly all of these factor components have the same sign, consistent with the intuitive idea that stocks in the same sector tend to rise and fall in coordination over time. Results obtained by the classic factor model, in contrast, had a homogeneous blend of positive and negative components. We conclude that results produced by our sector-based factor model are more interpretable than those produced by the classic non-sector-based model for at least some stock sectors.
Submitted 11 August, 2014;
originally announced August 2014.
-
Sufficient Criteria for Existence of Pullback Attractors for Stochastic Lattice Dynamical Systems with Deterministic Non-autonomous Terms
Authors:
Anhui Gu,
Yangrong Li
Abstract:
We consider the pullback attractors for non-autonomous dynamical systems generated by stochastic lattice differential equations with non-autonomous deterministic terms. We first establish a sufficient condition for the existence of pullback attractors of lattice dynamical systems with both non-autonomous deterministic and random forcing terms. As an application of the abstract theory, we prove the existence of a unique pullback attractor for first-order lattice dynamical systems with both deterministic non-autonomous forcing terms and multiplicative white noise. Our results recover many existing ones on the existence of pullback attractors for lattice dynamical systems with autonomous terms or white noise.
Submitted 2 April, 2014;
originally announced April 2014.
-
Random Attractor For Stochastic Lattice FitzHugh-Nagumo System Driven By $α$-stable Lévy Noises
Authors:
Anhui Gu,
Yangrong Li,
Jia Li
Abstract:
The present paper is devoted to the existence of a random attractor for a stochastic lattice FitzHugh-Nagumo system driven by $α$-stable Lévy noises under some dissipative conditions.
Submitted 23 February, 2014; v1 submitted 9 December, 2013;
originally announced December 2013.
-
Synchronization of Coupled Stochastic Systems Driven by Non-Gaussian Lévy Noises
Authors:
Anhui Gu,
Yangrong Li
Abstract:
We consider the synchronization of the solutions to coupled stochastic systems of $N$ stochastic ordinary differential equations (SODEs) driven by non-Gaussian Lévy noises ($N\in \mathbb{N}$). We discuss the synchronization between two solutions and among different components of solutions under certain dissipative and integrability conditions. Our results generalize the existing work of Liu et al. (2010) and Shen et al. (2010).
Submitted 9 December, 2013;
originally announced December 2013.
-
Singleton sets random attractor for stochastic FitzHugh-Nagumo lattice equations driven by fractional Brownian motions
Authors:
Anhui Gu,
Yangrong Li
Abstract:
The paper is devoted to the study of the dynamical behavior of the solutions of stochastic FitzHugh-Nagumo lattice equations driven by fractional Brownian motions with Hurst parameter greater than $1/2$. Under some usual dissipativity conditions, the system considered here exhibits dynamics different from those of the same system perturbed by Brownian motion. In our case, the random dynamical system has a unique random equilibrium, which constitutes a singleton sets random attractor.
Submitted 9 April, 2014; v1 submitted 26 October, 2013;
originally announced October 2013.
-
The Power of Deferral: Maintaining a Constant-Competitive Steiner Tree Online
Authors:
Albert Gu,
Anupam Gupta,
Amit Kumar
Abstract:
In the online Steiner tree problem, a sequence of points is revealed one-by-one: when a point arrives, we only have time to add a single edge connecting this point to the previous ones, and we want to minimize the total length of edges added. For two decades, we have known that the greedy algorithm maintains a tree whose cost is O(log n) times the Steiner tree cost, and this is best possible. But suppose, in addition to the new edge we add, we can change a single edge from the previous set of edges: can we do much better? Can we maintain a tree that is constant-competitive?
We answer this question in the affirmative. We give a primal-dual algorithm, and a novel dual-based analysis, that makes only a single swap per step (in addition to adding the edge connecting the new point to the previous ones), and such that the tree's cost is only a constant times the optimal cost.
Previous results for this problem gave an algorithm that performed an amortized constant number of swaps: for each n, the number of swaps in the first n steps was O(n). We also give a simpler tight analysis for this amortized case.
Submitted 21 October, 2013; v1 submitted 14 July, 2013;
originally announced July 2013.
-
An Optimal Differentiable Sphere Theorem for Complete Manifolds
Authors:
Hong-Wei Xu,
Juan-Ru Gu
Abstract:
A new differentiable sphere theorem is obtained from the viewpoint of submanifold geometry. An important scalar is defined by the scalar curvature and the mean curvature of an oriented complete submanifold $M^n$ in a space form $F^{n+p}(c)$ with $c\ge0$. Making use of the Hamilton-Brendle-Schoen convergence result for Ricci flow and the Lawson-Simons-Xin formula for the nonexistence of stable currents, we prove that if the infimum of this scalar is positive, then $M$ is diffeomorphic to $S^n$. We then introduce an intrinsic invariant $I(M)$ for an oriented complete Riemannian $n$-manifold $M$ via the scalar, and prove that if $I(M)>0$, then $M$ is diffeomorphic to $S^n$. It should be emphasized that our differentiable sphere theorem is optimal for arbitrary $n(\ge2)$.
Submitted 14 May, 2010;
originally announced May 2010.