Search | arXiv e-print repository

Building a stable classifier with the inflated argmax

Authors: Jake A. Soloff, Rina Foygel Barber, Rebecca Willett

Abstract: We propose a new framework for algorithmic stability in the context of multiclass classification. In practice, classification algorithms often operate by first assigning a continuous score (for instance, an estimated probability) to each possible label, then taking the maximizer -- i.e., selecting the class that has the highest score. A drawback of this type of approach is that it is inherently unstable, meaning that it is very sensitive to slight perturbations of the training data, since taking the maximizer is discontinuous. Motivated by this challenge, we propose a pipeline for constructing stable classifiers from data, using bagging (i.e., resampling and averaging) to produce stable continuous scores, and then using a stable relaxation of argmax, which we call the "inflated argmax," to convert these scores to a set of candidate labels. The resulting stability guarantee places no distributional assumptions on the data, does not depend on the number of classes or dimensionality of the covariates, and holds for any base classifier. Using a common benchmark data set, we demonstrate that the inflated argmax provides necessary protection against unstable classifiers, without loss of accuracy.

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.13214 [pdf, other]

Multi-Frequency Progressive Refinement for Learned Inverse Scattering

Authors: Owen Melia, Olivia Tsang, Vasileios Charisopoulos, Yuehaw Khoo, Jeremy Hoskins, Rebecca Willett

Abstract: Interpreting scattered acoustic and electromagnetic wave patterns is a computational task that enables remote imaging in a number of important applications, including medical imaging, geophysical exploration, sonar and radar detection, and nondestructive testing of materials. However, accurately and stably recovering an inhomogeneous medium from far-field scattered wave measurements is a computationally difficult problem, due to the nonlinear and non-local nature of the forward scattering process. We design a neural network, called Multi-Frequency Inverse Scattering Network (MFISNet), and a training method to approximate the inverse map from far-field scattered wave measurements at multiple frequencies. We consider three variants of MFISNet, with the strongest performing variant inspired by the recursive linearization method -- a commonly used technique for stably inverting scattered wavefield data -- that progressively refines the estimate with higher frequency content.

Submitted 21 May, 2024; originally announced May 2024.

Comments: 25 pages, 8 figures

arXiv:2405.13180 [pdf, other]

Data Assimilation with Machine Learning Surrogate Models: A Case Study with FourCastNet

Authors: Melissa Adrian, Daniel Sanz-Alonso, Rebecca Willett

Abstract: Modern data-driven surrogate models for weather forecasting provide accurate short-term predictions but inaccurate and nonphysical long-term forecasts. This paper investigates online weather prediction using machine learning surrogates supplemented with partial and noisy observations. We empirically demonstrate and theoretically justify that, despite the long-time instability of the surrogates and the sparsity of the observations, filtering estimates can remain accurate in the long-time horizon. As a case study, we integrate FourCastNet, a state-of-the-art weather surrogate model, within a variational data assimilation framework using partial, noisy ERA5 data. Our results show that filtering estimates remain accurate over a year-long assimilation window and provide effective initial conditions for forecasting tasks, including extreme event prediction.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.09511 [pdf, other]

Stability via resampling: statistical problems beyond the real line

Authors: Jake A. Soloff, Rina Foygel Barber, Rebecca Willett

Abstract: Model averaging techniques based on resampling methods (such as bootstrapping or subsampling) have been utilized across many areas of statistics, often with the explicit goal of promoting stability in the resulting output. We provide a general, finite-sample theoretical result guaranteeing the stability of bagging when applied to algorithms that return outputs in a general space, so that the output is not necessarily real-valued -- for example, an algorithm that estimates a vector of weights or a density function. We empirically assess the stability of bagging on synthetic and real-world data for a range of problem settings, including causal inference, nonparametric regression, and Bayesian model selection.

Submitted 24 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

arXiv:2404.10947 [pdf, other]

Residual Connections Harm Abstract Feature Learning in Masked Autoencoders

Authors: Xiao Zhang, Ruoxi Jiang, William Gao, Rebecca Willett, Michael Maire

Abstract: We demonstrate that adding a weighting factor to decay the strength of identity shortcuts within residual networks substantially improves semantic feature learning in the state-of-the-art self-supervised masked autoencoding (MAE) paradigm. Our modification to the identity shortcuts within a VIT-B/16 backbone of an MAE boosts linear probing accuracy on ImageNet from 67.8% to 72.7%. This significant gap suggests that, while residual connection structure serves an essential role in facilitating gradient propagation, it may have a harmful side effect of reducing capacity for abstract learning by virtue of injecting an echo of shallower representations into deeper layers. We ameliorate this downside via a fixed formula for monotonically decreasing the contribution of identity connections as layer depth increases. Our design promotes the gradual development of feature abstractions, without impacting network trainability. Analyzing the representations learned by our modified residual networks, we find correlation between low effective feature rank and downstream task performance.

Submitted 20 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

arXiv:2403.05583 [pdf, other]

A Cross-Modal Approach to Silent Speech with LLM-Enhanced Recognition

Authors: Tyler Benster, Guy Wilson, Reshef Elisha, Francis R Willett, Shaul Druckmann

Abstract: Silent Speech Interfaces (SSIs) offer a noninvasive alternative to brain-computer interfaces for soundless verbal communication. We introduce Multimodal Orofacial Neural Audio (MONA), a system that leverages cross-modal alignment through novel loss functions--cross-contrast (crossCon) and supervised temporal contrast (supTcon)--to train a multimodal model with a shared latent representation. This architecture enables the use of audio-only datasets like LibriSpeech to improve silent speech recognition. Additionally, our introduction of Large Language Model (LLM) Integrated Scoring Adjustment (LISA) significantly improves recognition accuracy. Together, MONA LISA reduces the state-of-the-art word error rate (WER) from 28.8% to 12.2% in the Gaddy (2020) benchmark dataset for silent speech on an open vocabulary. For vocal EMG recordings, our method improves the state-of-the-art from 23.3% to 3.7% WER. In the Brain-to-Text 2024 competition, LISA performs best, improving the top WER from 9.8% to 8.9%. To the best of our knowledge, this work represents the first instance where noninvasive silent speech recognition on an open vocabulary has cleared the threshold of 15% WER, demonstrating that SSIs can be a viable alternative to automatic speech recognition (ASR). Our work not only narrows the performance gap between silent and vocalized speech but also opens new possibilities in human-computer interaction, demonstrating the potential of cross-modal approaches in noisy and data-limited regimes.

Submitted 2 March, 2024; originally announced March 2024.

arXiv:2402.08808 [pdf, other]

Depth Separation in Norm-Bounded Infinite-Width Neural Networks

Authors: Suzanna Parkinson, Greg Ongie, Rebecca Willett, Ohad Shamir, Nathan Srebro

Abstract: We study depth separation in infinite-width neural networks, where complexity is controlled by the overall squared $\ell_2$-norm of the weights (sum of squares of all weights in the network). Whereas previous depth separation results focused on separation in terms of width, such results do not give insight into whether depth determines if it is possible to learn a network that generalizes well even when the network width is unbounded. Here, we study separation in terms of the sample complexity required for learnability. Specifically, we show that there are functions that are learnable with sample complexity polynomial in the input dimension by norm-controlled depth-3 ReLU networks, yet are not learnable with sub-exponential sample complexity by norm-controlled depth-2 ReLU networks (with any value for the norm). We also show that a similar statement in the reverse direction is not possible: any function learnable with polynomial sample complexity by a norm-controlled depth-2 ReLU network with infinite width is also learnable with polynomial sample complexity by a norm-controlled depth-3 ReLU network.

Submitted 13 February, 2024; originally announced February 2024.

arXiv:2402.06837 [pdf, ps, other]

The rational HK-conjecture: transformation groupoids and a revised version

Authors: Robin J. Deeley, Rufus Willett

Abstract: We prove the rational HK-conjecture for a large class of transformation groupoids in the case when the relevant action has torsion-free stabilizers. A revised version of the rational HK-conjecture in the case of (possibly) torsion stabilizers is introduced and proved for a large class of transformation groupoids. In particular, this revised version holds for Scarparo's counterexamples to the original rational HK-conjecture. The key tools used are the Baum-Connes conjecture and a Chern character defined by Raven.

Submitted 9 February, 2024; originally announced February 2024.

Comments: 43 pages

MSC Class: 46L80; 22A22

arXiv:2311.03611 [pdf, other]

Plug-and-Play Stability for Intracortical Brain-Computer Interfaces: A One-Year Demonstration of Seamless Brain-to-Text Communication

Authors: Chaofei Fan, Nick Hahn, Foram Kamdar, Donald Avansino, Guy H. Wilson, Leigh Hochberg, Krishna V. Shenoy, Jaimie M. Henderson, Francis R. Willett

Abstract: Intracortical brain-computer interfaces (iBCIs) have shown promise for restoring rapid communication to people with neurological disorders such as amyotrophic lateral sclerosis (ALS). However, to maintain high performance over time, iBCIs typically need frequent recalibration to combat changes in the neural recordings that accrue over days. This requires iBCI users to stop using the iBCI and engage in supervised data collection, making the iBCI system hard to use. In this paper, we propose a method that enables self-recalibration of communication iBCIs without interrupting the user. Our method leverages large language models (LMs) to automatically correct errors in iBCI outputs. The self-recalibration process uses these corrected outputs ("pseudo-labels") to continually update the iBCI decoder online. Over a period of more than one year (403 days), we evaluated our Continual Online Recalibration with Pseudo-labels (CORP) framework with one clinical trial participant. CORP achieved a stable decoding accuracy of 93.84% in an online handwriting iBCI task, significantly outperforming other baseline methods. Notably, this is the longest-running iBCI stability demonstration involving a human participant. Our results provide the first evidence for long-term stabilization of a plug-and-play, high-performance communication iBCI, addressing a major barrier for the clinical translation of iBCIs.

Submitted 6 November, 2023; originally announced November 2023.

arXiv:2308.06271 [pdf, other]

Rotation-Invariant Random Features Provide a Strong Baseline for Machine Learning on 3D Point Clouds

Authors: Owen Melia, Eric Jonas, Rebecca Willett

Abstract: Rotational invariance is a popular inductive bias used by many fields in machine learning, such as computer vision and machine learning for quantum chemistry. Rotation-invariant machine learning methods set the state of the art for many tasks, including molecular property prediction and 3D shape classification. These methods generally either rely on task-specific rotation-invariant features, or they use general-purpose deep neural networks which are complicated to design and train. However, it is unclear whether the success of these methods is primarily due to the rotation invariance or the deep neural networks. To address this question, we suggest a simple and general-purpose method for learning rotation-invariant functions of three-dimensional point cloud data using a random features approach. Specifically, we extend the random features method of Rahimi & Recht 2007 by deriving a version that is invariant to three-dimensional rotations and showing that it is fast to evaluate on point cloud data. We show through experiments that our method matches or outperforms the performance of general-purpose rotation-invariant neural networks on standard molecular property prediction benchmark datasets QM7 and QM9. We also show that our method is general-purpose and provides a rotation-invariant baseline on the ModelNet40 shape classification task. Finally, we show that our method has an order of magnitude smaller prediction latency than competing kernel methods.

Submitted 27 July, 2023; originally announced August 2023.
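
For readers unfamiliar with the random features method that the abstract above builds on, the following is a minimal sketch of plain random Fourier features for a Gaussian kernel. It is not the rotation-invariant variant derived in the paper; the names `make_rff`, `featurize`, `n_features`, and `sigma` are hypothetical and chosen only for illustration.

```python
import math
import random


def make_rff(dim, n_features=2000, sigma=1.0, seed=0):
    """Return a feature map whose inner products approximate a Gaussian kernel.

    Illustrative only: this is the plain (not rotation-invariant) construction.
    """
    rng = random.Random(seed)
    # Frequencies drawn from N(0, I / sigma^2); phases uniform on [0, 2*pi).
    ws = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
          for _ in range(n_features)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def featurize(x):
        return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(ws, bs)]

    return featurize


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


featurize = make_rff(dim=3)
x, y = [0.1, 0.2, 0.3], [0.4, 0.0, 0.5]
approx = dot(featurize(x), featurize(y))
# Exact Gaussian kernel exp(-||x - y||^2 / (2 * sigma^2)) with sigma = 1.
exact = math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / 2.0)
```

With a few thousand features, `approx` should land close to `exact`; the approximation error shrinks roughly like one over the square root of `n_features`.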

arXiv:2307.11529 [pdf, ps, other]

Coarse equivalence versus bijective coarse equivalence of expander graphs

Authors: Florent Baudier, Bruno de Mendonça Braga, Ilijas Farah, Alessandro Vignati, Rufus Willett

Abstract: We provide a characterization of when a coarse equivalence between coarse disjoint unions of expander graphs is close to a bijective coarse equivalence. We use this to show that if the uniform Roe algebras of coarse disjoint unions of expander graphs are isomorphic, then the metric spaces must be bijectively coarsely equivalent.

Submitted 21 July, 2023; originally announced July 2023.

arXiv:2306.08693 [pdf, other]

Integrating Uncertainty Awareness into Conformalized Quantile Regression

Authors: Raphael Rossellini, Rina Foygel Barber, Rebecca Willett

Abstract: Conformalized Quantile Regression (CQR) is a recently proposed method for constructing prediction intervals for a response $Y$ given covariates $X$, without making distributional assumptions. However, existing constructions of CQR can be ineffective for problems where the quantile regressors perform better in certain parts of the feature space than others. The reason is that the prediction intervals of CQR do not distinguish between two forms of uncertainty: first, the variability of the conditional distribution of $Y$ given $X$ (i.e., aleatoric uncertainty), and second, our uncertainty in estimating this conditional distribution (i.e., epistemic uncertainty). This can lead to intervals that are overly narrow in regions where epistemic uncertainty is high. To address this, we propose a new variant of the CQR methodology, Uncertainty-Aware CQR (UACQR), that explicitly separates these two sources of uncertainty to adjust quantile regressors differentially across the feature space. Compared to CQR, our methods enjoy the same distribution-free theoretical coverage guarantees, while demonstrating in our experiments stronger conditional coverage properties in simulated settings and real-world data sets alike.

Submitted 12 March, 2024; v1 submitted 14 June, 2023; originally announced June 2023.

Journal ref: PMLR 238:1540-1548, 2024
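
As background for the abstract above, here is a minimal sketch of the standard split-conformal CQR calibration step that UACQR modifies. It is not the UACQR method itself. The function name `cqr_interval` and its arguments are hypothetical, and the lower and upper quantile predictions are assumed to come from some already-fitted quantile regressor.

```python
import math


def cqr_interval(cal_lo, cal_hi, cal_y, test_lo, test_hi, alpha=0.1):
    """Widen (or shrink) a predicted quantile band by a calibrated margin.

    cal_lo / cal_hi: lower and upper quantile predictions on calibration data.
    cal_y: observed responses on the same calibration points.
    test_lo / test_hi: quantile predictions at a new covariate value.
    """
    n = len(cal_y)
    # Conformity score: how far each response falls outside its band
    # (negative when the response lies strictly inside).
    scores = sorted(max(lo - y, y - hi)
                    for lo, hi, y in zip(cal_lo, cal_hi, cal_y))
    # Rank of the finite-sample-corrected (1 - alpha) empirical quantile.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: only the trivial interval.
        return (-math.inf, math.inf)
    margin = scores[k - 1]
    return (test_lo - margin, test_hi + margin)


interval = cqr_interval(
    cal_lo=[0.0, 0.0, 0.0],
    cal_hi=[1.0, 1.0, 1.0],
    cal_y=[0.5, 1.5, -1.0],
    test_lo=2.0,
    test_hi=3.0,
    alpha=0.5,
)
```

The same margin is added at every test point, which is why this baseline cannot adapt to regions where the quantile regressors are less reliable, the limitation the abstract describes.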

arXiv:2306.06342 [pdf, other]

Distribution-free inference with hierarchical data

Authors: Yonghoon Lee, Rina Foygel Barber, Rebecca Willett

Abstract: This paper studies distribution-free inference in settings where the data set has a hierarchical structure -- for example, groups of observations, or repeated measurements. In such settings, standard notions of exchangeability may not hold. To address this challenge, a hierarchical form of exchangeability is derived, facilitating extensions of distribution-free methods, including conformal prediction and jackknife+. While the standard theoretical guarantee obtained by the conformal prediction framework is a marginal predictive coverage guarantee, in the special case of independent repeated measurements, it is possible to achieve a stronger form of coverage -- the "second-moment coverage" property -- to provide better control of conditional miscoverage rates, and distribution-free prediction sets that achieve this property are constructed. Simulations illustrate that this guarantee indeed leads to uniformly small conditional miscoverage rates. Empirically, this stronger guarantee comes at the cost of a larger width of the prediction set in scenarios where the fitted model is poorly calibrated, but this cost is very mild in cases where the fitted model is accurate.

Submitted 2 March, 2024; v1 submitted 10 June, 2023; originally announced June 2023.

arXiv:2306.01187 [pdf, other]

Training neural operators to preserve invariant measures of chaotic attractors

Authors: Ruoxi Jiang, Peter Y. Lu, Elena Orlova, Rebecca Willett

Abstract: Chaotic systems make long-horizon forecasts difficult because small perturbations in initial conditions cause trajectories to diverge at an exponential rate. In this setting, neural operators trained to minimize squared error losses, while capable of accurate short-term forecasts, often fail to reproduce statistical or structural properties of the dynamics over longer time horizons and can yield degenerate results. In this paper, we propose an alternative framework designed to preserve invariant measures of chaotic attractors that characterize the time-invariant statistical properties of the dynamics. Specifically, in the multi-environment setting (where each sample trajectory is governed by slightly different dynamics), we consider two novel approaches to training with noisy data. First, we propose a loss based on the optimal transport distance between the observed dynamics and the neural operator outputs. This approach requires expert knowledge of the underlying physics to determine what statistical features should be included in the optimal transport loss. Second, we show that a contrastive learning framework, which does not require any specialized prior knowledge, can preserve statistical properties of the dynamics nearly as well as the optimal transport approach. On a variety of chaotic systems, our method is shown empirically to preserve invariant measures of chaotic attractors.

Submitted 16 April, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

Comments: Accepted at NeurIPS 2023

arXiv:2305.19685 [pdf, other]

Deep Stochastic Mechanics

Authors: Elena Orlova, Aleksei Ustimenko, Ruoxi Jiang, Peter Y. Lu, Rebecca Willett

Abstract: This paper introduces a novel deep-learning-based approach for numerical simulation of a time-evolving Schrödinger equation inspired by stochastic mechanics and generative diffusion models. Unlike existing approaches, which exhibit computational complexity that scales exponentially in the problem dimension, our method allows us to adapt to the latent low-dimensional structure of the wave function by sampling from the Markovian diffusion. Depending on the latent dimension, our method may have far lower computational complexity in higher dimensions. Moreover, we propose novel equations for stochastic quantum mechanics, resulting in quadratic computational complexity with respect to the number of dimensions. Numerical simulations verify our theoretical findings and show a significant advantage of our method compared to other deep-learning-based approaches used for quantum mechanics.

Submitted 4 June, 2024; v1 submitted 31 May, 2023; originally announced May 2023.

arXiv:2305.15598 [pdf, other]

ReLU Neural Networks with Linear Layers are Biased Towards Single- and Multi-Index Models

Authors: Suzanna Parkinson, Greg Ongie, Rebecca Willett

Abstract: Neural networks often operate in the overparameterized regime, in which there are far more parameters than training samples, allowing the training data to be fit perfectly. That is, training the network effectively learns an interpolating function, and properties of the interpolant affect predictions the network will make on new samples. This manuscript explores properties of such functions learned by neural networks of depth greater than two layers. Our framework considers a family of networks of varying depths that all have the same capacity but different representation costs. The representation cost of a function induced by a neural network architecture is the minimum sum of squared weights needed for the network to represent the function; it reflects the function space bias associated with the architecture. Our results show that adding additional linear layers to the input side of a shallow ReLU network yields a representation cost favoring functions with low mixed variation - that is, it has limited variation in directions orthogonal to a low-dimensional subspace and can be well approximated by a single- or multi-index model. Such functions may be represented by the composition of a function with low two-layer representation cost and a low-rank linear operator. Our experiments confirm this behavior in standard network training regimes. They additionally show that linear layers can improve generalization and the learned network is well-aligned with the true latent low-dimensional linear subspace when data is generated using a multi-index model.

Submitted 26 June, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

arXiv:2301.12600 [pdf, other]

Bagging Provides Assumption-free Stability

Authors: Jake A. Soloff, Rina Foygel Barber, Rebecca Willett

Abstract: Bagging is an important technique for stabilizing machine learning models. In this paper, we derive a finite-sample guarantee on the stability of bagging for any model. Our result places no assumptions on the distribution of the data, on the properties of the base algorithm, or on the dimensionality of the covariates. Our guarantee applies to many variants of bagging and is optimal up to a constant. Empirical results validate our findings, showing that bagging successfully stabilizes even highly unstable base algorithms.

Submitted 25 April, 2024; v1 submitted 29 January, 2023; originally announced January 2023.
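
To make the resample-and-average idea in the abstract above concrete, the following is a minimal sketch of one bagging variant (subsampling without replacement) wrapped around an arbitrary base algorithm. It is not the authors' code. The names `bagged_scores`, `base_scores`, and `nearest_neighbor_scores`, the bag size of half the data, and the toy data are all assumptions made for illustration.

```python
import random


def bagged_scores(train, x, base_scores, n_bags=100, bag_size=None, seed=0):
    """Average the score vectors of models fit on random subsamples of `train`."""
    rng = random.Random(seed)
    m = bag_size if bag_size is not None else max(1, len(train) // 2)
    total = None
    for _ in range(n_bags):
        bag = rng.sample(train, m)       # subsample without replacement
        scores = base_scores(bag, x)     # one score per class
        if total is None:
            total = list(scores)
        else:
            total = [t + s for t, s in zip(total, scores)]
    return [t / n_bags for t in total]


def nearest_neighbor_scores(bag, x, n_classes=2):
    """Hypothetical unstable base algorithm: one-hot vote of the closest point."""
    _, label = min(bag, key=lambda pair: abs(pair[0] - x))
    return [1.0 if c == label else 0.0 for c in range(n_classes)]


# Toy one-dimensional data: (feature, label) pairs.
train = [(0.0, 0), (0.2, 0), (0.4, 0), (0.6, 1), (0.8, 1), (1.0, 1)]
scores = bagged_scores(train, 0.5, nearest_neighbor_scores)
```

The base algorithm returns a hard one-hot vote that can flip when a single training point is removed, while the bagged output is an average of many such votes and so changes more gradually.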

arXiv:2301.11961 [pdf, other]

Reduced-Order Autodifferentiable Ensemble Kalman Filters

Authors: Yuming Chen, Daniel Sanz-Alonso, Rebecca Willett

Abstract: This paper introduces a computational framework to reconstruct and forecast a partially observed state that evolves according to an unknown or expensive-to-simulate dynamical system. Our reduced-order autodifferentiable ensemble Kalman filters (ROAD-EnKFs) learn a latent low-dimensional surrogate model for the dynamics and a decoder that maps from the latent space to the state space. The learned dynamics and decoder are then used within an ensemble Kalman filter to reconstruct and forecast the state. Numerical experiments show that if the state dynamics exhibit a hidden low-dimensional structure, ROAD-EnKFs achieve higher accuracy at lower computational cost compared to existing methods. If such structure is not expressed in the latent state dynamics, ROAD-EnKFs achieve similar accuracy at lower cost, making them a promising approach for surrogate state reconstruction and forecasting.

Submitted 27 January, 2023; originally announced January 2023.

arXiv:2212.14312 [pdf, ps, other]

Embeddings of von Neumann algebras into uniform Roe algebras and quasi-local algebras

Authors: Florent P. Baudier, Bruno de Mendonça Braga, Ilijas Farah, Alessandro Vignati, Rufus Willett

Abstract: We study which von Neumann algebras can be embedded into uniform Roe algebras and quasi-local algebras associated to a uniformly locally finite metric space $X$. Under weak assumptions, these $\mathrm{C}^*$-algebras contain embedded copies of $\prod_{k}\mathrm{M}_{n_k}(\mathbb C)$ for any \emph{bounded} countable (possibly finite) collection $(n_k)_k$ of natural numbers; we aim to show that they cannot contain any other von Neumann algebras. One of our main results shows that $L_\infty[0,1]$ does not embed into any of those algebras, even by a not-necessarily-normal $*$-homomorphism. In particular, it follows from the structure theory of von Neumann algebras that any von Neumann algebra which embeds into such algebra must be of the form $\prod_{k}\mathrm{M}_{n_k}(\mathbb C)$ for some countable (possibly finite) collection $(n_k)_k$ of natural numbers. Under additional assumptions, we also show that the sequence $(n_k)_k$ has to be bounded: in other words, the only embedded von Neumann algebras are the ``obvious'' ones.

Submitted 17 February, 2023; v1 submitted 29 December, 2022; originally announced December 2022.

arXiv:2211.15856 [pdf, other]

Beyond Ensemble Averages: Leveraging Climate Model Ensembles for Subseasonal Forecasting

Authors: Elena Orlova, Haokun Liu, Raphael Rossellini, Benjamin A. Cash, Rebecca Willett

Abstract: Producing high-quality forecasts of key climate variables, such as temperature and precipitation, on subseasonal time scales has long been a gap in operational forecasting. This study explores an application of machine learning (ML) models as post-processing tools for subseasonal forecasting. Lagged numerical ensemble forecasts (i.e., an ensemble where the members have different initialization dates) and observational data, including relative humidity, pressure at sea level, and geopotential height, are incorporated into various ML methods to predict monthly average precipitation and two-meter temperature two weeks in advance for the continental United States. For regression, quantile regression, and tercile classification tasks, we consider using linear models, random forests, convolutional neural networks, and stacked models (a multi-model approach based on the prediction of the individual ML models). Unlike previous ML approaches that often use ensemble mean alone, we leverage information embedded in the ensemble forecasts to enhance prediction accuracy. Additionally, we investigate extreme event predictions that are crucial for planning and mitigation efforts. Considering ensemble members as a collection of spatial forecasts, we explore different approaches to using spatial information. Trade-offs between different approaches may be mitigated with model stacking. Our proposed models outperform standard baselines such as climatological forecasts and ensemble means. In addition, we investigate feature importance, trade-offs between using the full ensemble or only the ensemble mean, and different modes of accounting for spatial variability.

Submitted 3 June, 2024; v1 submitted 28 November, 2022; originally announced November 2022.

arXiv:2211.01554 [pdf, other]

Embed and Emulate: Learning to estimate parameters of dynamical systems with uncertainty quantification

Authors: Ruoxi Jiang, Rebecca Willett

Abstract: This paper explores learning emulators for parameter estimation with uncertainty estimation of high-dimensional dynamical systems. We assume access to a computationally complex simulator that inputs a candidate parameter and outputs a corresponding multichannel time series. Our task is to accurately estimate a range of likely values of the underlying parameters. Standard iterative approaches necessitate running the simulator many times, which is computationally prohibitive. This paper describes a novel framework for learning feature embeddings of observed dynamics jointly with an emulator that can replace high-cost simulators for parameter estimation. Leveraging a contrastive learning approach, our method exploits intrinsic data properties within and across parameter and trajectory domains. On a coupled 396-dimensional multiscale Lorenz 96 system, our method significantly outperforms a typical parameter estimation method based on predefined metrics and a classical numerical simulator, while using only 1.19% of the baseline's computation time. Ablation studies highlight the potential of explicitly designing learned emulators for parameter estimation by leveraging contrastive learning.

Submitted 2 November, 2022; originally announced November 2022.

Comments: Accepted at NeurIPS 2022

arXiv:2209.15585 [pdf, other]

Cloud Classification with Unsupervised Deep Learning

Authors: Takuya Kurihana, Ian Foster, Rebecca Willett, Sydney Jenkins, Kathryn Koenig, Ruby Werman, Ricardo Barros Lourenco, Casper Neo, Elisabeth Moyer

Abstract: We present a framework for cloud characterization that leverages modern unsupervised deep learning technologies. While previous neural network-based cloud classification models have used supervised learning methods, unsupervised learning allows us to avoid restricting the model to artificial categories based on historical cloud classification schemes and enables the discovery of novel, more detailed classifications. Our framework learns cloud features directly from radiance data produced by NASA's Moderate Resolution Imaging Spectroradiometer (MODIS) satellite instrument, deriving cloud characteristics from millions of images without relying on pre-defined cloud types during the training process. We present preliminary results showing that our method extracts physically relevant information from radiance data and produces meaningful cloud classes.

Submitted 30 September, 2022; originally announced September 2022.

Comments: 5 pages, 6 figures, Proceedings for Climate Informatics Workshop 2019 Paris

arXiv:2205.04704 [pdf, other]

$C^*$-algebras with finite complexity

Authors: Arturo Jaime, Rufus Willett

Abstract: Complexity rank for $C^*$-algebras was introduced by the second author and Yu for applications towards the UCT: very roughly, this rank is at most $n$ if you can repeatedly cut the $C^*$-algebra in half at most $n$ times, and end up with something finite dimensional. In this paper, we study complexity rank, and also a weak complexity rank that we introduce; having weak complexity rank at most one can be thought of as 'two-colored local finite-dimensionality'. We first show that for separable, unital, and simple $C^*$-algebras, weak complexity rank one is equivalent to the conjunction of nuclear dimension one and real rank zero. In particular, this shows that the UCT for all nuclear $C^*$-algebras is equivalent to equality of the weak complexity rank and the complexity ranks for Kirchberg algebras with zero $K$-theory groups. However, we also show using a $K$-theoretic obstruction (torsion in $K_1$) that weak complexity rank one and complexity rank one are not the same in general. We then use the Kirchberg-Phillips classification theorem to compute the complexity rank of all UCT Kirchberg algebras: it is always one or two, with the rank one case occurring if and only if the $K_1$-group is torsion free.

Submitted 11 October, 2022; v1 submitted 10 May, 2022; originally announced May 2022.

Comments: The second version incorporates various referee suggestions, including a substantial improvement to the results of Section 3.2. This should be the final version, to appear in the Münster Journal of Mathematics

MSC Class: 47L85 (Primary); 46L80; 46L35 (Secondary)

arXiv:2203.08339 [pdf, other]

NURD: Negative-Unlabeled Learning for Online Datacenter Straggler Prediction

Authors: Yi Ding, Avinash Rao, Hyebin Song, Rebecca Willett, Henry Hoffmann

Abstract: Datacenters execute large computational jobs, which are composed of smaller tasks. A job completes when all its tasks finish, so stragglers -- rare, yet extremely slow tasks -- are a major impediment to datacenter performance. Accurately predicting stragglers would enable proactive intervention, allowing datacenter operators to mitigate stragglers before they delay a job. While much prior work applies machine learning to predict computer system performance, these approaches rely on complete labels -- i.e., sufficient examples of all possible behaviors, including straggling and non-straggling -- or strong assumptions about the underlying latency distributions -- e.g., whether Gaussian or not. Within a running job, however, none of this information is available until stragglers have revealed themselves, by which point they have already delayed the job. To predict stragglers accurately and early without labeled positive examples or assumptions on latency distributions, this paper presents NURD, a novel Negative-Unlabeled learning approach with Reweighting and Distribution-compensation that only trains on negative and unlabeled streaming data. The key idea is to train a predictor using finished tasks of non-stragglers to predict latency for unlabeled running tasks, and then reweight each unlabeled task's prediction based on a weighting function of its feature space. We evaluate NURD on two production traces from Google and Alibaba, and find that compared to the best baseline approach, NURD produces 2--11 percentage point increases in the F1 score in terms of prediction accuracy, and 2.0--8.8 percentage point improvements in job completion time.

Submitted 13 August, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

arXiv:2202.00856 [pdf, other]

The Role of Linear Layers in Nonlinear Interpolating Networks

Authors: Greg Ongie, Rebecca Willett

Abstract: This paper explores the implicit bias of overparameterized neural networks of depth greater than two layers. Our framework considers a family of networks of varying depth that all have the same capacity but different implicitly defined representation costs. The representation cost of a function induced by a neural network architecture is the minimum sum of squared weights needed for the network to represent the function; it reflects the function space bias associated with the architecture. Our results show that adding linear layers to a ReLU network yields a representation cost that reflects a complex interplay between the alignment and sparsity of ReLU units. Specifically, using a neural network to fit training data with minimum representation cost yields an interpolating function that is constant in directions perpendicular to a low-dimensional subspace on which a parsimonious interpolant exists.

Submitted 1 February, 2022; originally announced February 2022.

arXiv:2110.07435 [pdf, other]

Adaptive Differentially Private Empirical Risk Minimization

Authors: Xiaoxia Wu, Lingxiao Wang, Irina Cristali, Quanquan Gu, Rebecca Willett

Abstract: We propose an adaptive (stochastic) gradient perturbation method for differentially private empirical risk minimization. At each iteration, the random noise added to the gradient is optimally adapted to the stepsize; we name this process adaptive differentially private (ADP) learning. Given the same privacy budget, we prove that the ADP method considerably improves the utility guarantee compared to the standard differentially private method in which vanilla random noise is added. Our method is particularly useful for gradient-based algorithms with time-varying learning rates, including variants of AdaGrad (Duchi et al., 2011). We provide extensive numerical experiments to demonstrate the effectiveness of the proposed adaptive differentially private algorithm.

Submitted 24 October, 2021; v1 submitted 14 October, 2021; originally announced October 2021.

arXiv:2107.07687 [pdf, other]

Auto-differentiable Ensemble Kalman Filters

Authors: Yuming Chen, Daniel Sanz-Alonso, Rebecca Willett

Abstract: Data assimilation is concerned with sequentially estimating a temporally-evolving state. This task, which arises in a wide range of scientific and engineering applications, is particularly challenging when the state is high-dimensional and the state-space dynamics are unknown. This paper introduces a machine learning framework for learning dynamical systems in data assimilation. Our auto-differentiable ensemble Kalman filters (AD-EnKFs) blend ensemble Kalman filters for state recovery with machine learning tools for learning the dynamics. In doing so, AD-EnKFs leverage the ability of ensemble Kalman filters to scale to high-dimensional states and the power of automatic differentiation to train high-dimensional surrogate models for the dynamics. Numerical results using the Lorenz-96 model show that AD-EnKFs outperform existing methods that use expectation-maximization or particle filters to merge data assimilation and machine learning. In addition, AD-EnKFs are easy to implement and require minimal tuning.

Submitted 19 July, 2021; v1 submitted 15 July, 2021; originally announced July 2021.

arXiv:2106.12034 [pdf, other]

Pure Exploration in Kernel and Neural Bandits

Authors: Yinglun Zhu, Dongruo Zhou, Ruoxi Jiang, Quanquan Gu, Rebecca Willett, Robert Nowak

Abstract: We study pure exploration in bandits, where the dimension of the feature representation can be much larger than the number of arms. To overcome the curse of dimensionality, we propose to adaptively embed the feature representation of each arm into a lower-dimensional space and carefully deal with the induced model misspecification. Our approach is conceptually very different from existing works that can either only handle low-dimensional linear bandits or passively deal with model misspecification. We showcase the application of our approach to two pure exploration settings that were previously under-studied: (1) the reward function belongs to a possibly infinite-dimensional Reproducing Kernel Hilbert Space, and (2) the reward function is nonlinear and can be approximated by neural networks. Our main results provide sample complexity guarantees that only depend on the effective dimension of the feature spaces in the kernel or neural representations. Extensive experiments conducted on both synthetic and real-world datasets demonstrate the efficacy of our methods.

Submitted 17 March, 2022; v1 submitted 22 June, 2021; originally announced June 2021.

arXiv:2106.11391 [pdf, ps, other]

doi 10.1007/s00222-022-01140-x

Uniform Roe algebras of uniformly locally finite metric spaces are rigid

Authors: Florent P. Baudier, Bruno de Mendonça Braga, Ilijas Farah, Ana Khukhro, Alessandro Vignati, Rufus Willett

Abstract: We show that if $X$ and $Y$ are uniformly locally finite metric spaces whose uniform Roe algebras, $C^*_u(X)$ and $C^*_u(Y)$, are isomorphic as $C^*$-algebras, then $X$ and $Y$ are coarsely equivalent metric spaces. Moreover, we show that coarse equivalence between $X$ and $Y$ is equivalent to Morita equivalence between $C^*_u(X)$ and $C^*_u(Y)$. As an application, we obtain that if $Γ$ and $Λ$ are finitely generated groups, then the crossed products $\ell_\infty(Γ)\rtimes_rΓ$ and $\ell_\infty(Λ)\rtimes_rΛ$ are isomorphic if and only if $Γ$ and $Λ$ are bi-Lipschitz equivalent.

Submitted 8 June, 2022; v1 submitted 21 June, 2021; originally announced June 2021.

Comments: 26 pages, second version with revisions

arXiv:2104.10766 [pdf, other]

The UCT for $C^*$-algebras with finite complexity

Authors: Rufus Willett, Guoliang Yu

Abstract: A $C^*$-algebra satisfies the Universal Coefficient Theorem (UCT) of Rosenberg and Schochet if it is equivalent in Kasparov's $KK$-theory to a commutative $C^*$-algebra. This paper is motivated by the problem of establishing the range of validity of the UCT, and in particular, whether the UCT holds for all nuclear $C^*$-algebras. We introduce the idea of a $C^*$-algebra that "decomposes" over a class $\mathcal{C}$ of $C^*$-algebras. Roughly, this means that locally, there are approximately central elements that approximately cut the $C^*$-algebra into two $C^*$-subalgebras from $\mathcal{C}$ that have well-behaved intersection. We show that if a $C^*$-algebra decomposes over the class of nuclear, UCT $C^*$-algebras, then it satisfies the UCT. The argument is based on controlled $KK$-theory, as introduced by the authors in earlier work. Nuclearity is used via Kasparov's Hilbert module version of Voiculescu's theorem, and Haagerup's theorem that nuclear $C^*$-algebras are amenable. We say that a $C^*$-algebra has finite complexity if it is in the smallest class of $C^*$-algebras containing the finite-dimensional $C^*$-algebras, and closed under decomposability; our main result implies that all $C^*$-algebras in this class satisfy the UCT. The class of $C^*$-algebras with finite complexity is large, and comes with an ordinal-number invariant measuring the complexity level. We conjecture that a $C^*$-algebra of finite nuclear dimension and real rank zero has finite complexity; this (and several other related conjectures) would imply the UCT for all separable nuclear $C^*$-algebras. We also give new local formulations of the UCT, and some other necessary and sufficient conditions for the UCT to hold for all nuclear $C^*$-algebras.

Submitted 12 July, 2023; v1 submitted 21 April, 2021; originally announced April 2021.

Comments: Version 4 contains various small corrections and clarifications, and adds a new subsection 7.1 to clarify the proof of the main theorem, and to what extent the ingredients for this are 'local' in nature. Version 4 is the final version, to appear in Memoirs of the EMS

MSC Class: 19K35; 46L80; 46L85

arXiv:2104.05885 [pdf, ps, other]

doi 10.1112/plms.12510

Dynamic asymptotic dimension and Matui's HK conjecture

Authors: Christian Bönicke, Clément Dell'Aiera, James Gabe, Rufus Willett

Abstract: We prove that the homology groups of a principal ample groupoid vanish in dimensions greater than the dynamic asymptotic dimension of the groupoid (as a side-effect of our methods, we also give a new model of groupoid homology in terms of the Tor groups of homological algebra, which might be of independent interest). As a consequence, the K-theory of the $C^*$-algebras associated with groupoids of finite dynamic asymptotic dimension can be computed from the homology of the underlying groupoid. In particular, principal ample groupoids with dynamic asymptotic dimension at most two and finitely generated second homology satisfy Matui's HK-conjecture. We also construct explicit maps from the groupoid homology groups to the K-theory groups of their $C^*$-algebras in degrees zero and one, and investigate their properties.

Submitted 26 November, 2022; v1 submitted 12 April, 2021; originally announced April 2021.

Comments: Version 3 adds a new description of the comparison map in dimension one based on the ABC spectral sequence, and shows that it is the same as the description based on mapping cones. This is the final version, to appear in Proceedings of the LMS

MSC Class: 22A22; 37B99; 46L80; 46L85

Journal ref: Proc. London Math. Soc. 126 (2023) 1182-1253

arXiv:2103.13555 [pdf, other]

Prediction in the presence of response-dependent missing labels

Authors: Hyebin Song, Garvesh Raskutti, Rebecca Willett

Abstract: In a variety of settings, limitations of sensing technologies or other sampling mechanisms result in missing labels, where the likelihood of a missing label in the training set is an unknown function of the data. For example, satellites used to detect forest fires cannot sense fires below a certain size threshold. In such cases, training datasets consist of positive and pseudo-negative observations where pseudo-negative observations can be either true negatives or undetected positives with small magnitudes. We develop a new methodology and non-convex algorithm P(ositive) U(nlabeled) - O(ccurrence) M(agnitude) M(ixture) which jointly estimates the occurrence and detection likelihood of positive samples, utilizing prior knowledge of the detection mechanism. Our approach uses ideas from positive-unlabeled (PU)-learning and zero-inflated models that jointly estimate the magnitude and occurrence of events. We provide conditions under which our model is identifiable and prove that even though our approach leads to a non-convex objective, any local minimizer has optimal statistical error (up to a log term) and projected gradient descent has geometric convergence rates. We demonstrate on both synthetic data and a California wildfire dataset that our method outperforms existing state-of-the-art approaches.

Submitted 24 March, 2021; originally announced March 2021.

arXiv:2103.04885 [pdf, other]

doi 10.1109/TGRS.2021.3098008

Data-driven Cloud Clustering via a Rotationally Invariant Autoencoder

Authors: Takuya Kurihana, Elisabeth Moyer, Rebecca Willett, Davis Gilton, Ian Foster

Abstract: Advanced satellite-borne remote sensing instruments produce high-resolution multi-spectral data for much of the globe at a daily cadence. These datasets open up the possibility of improved understanding of cloud dynamics and feedback, which remain the biggest source of uncertainty in global climate model projections. As a step towards answering these questions, we describe an automated rotation-invariant cloud clustering (RICC) method that leverages deep learning autoencoder technology to organize cloud imagery within large datasets in an unsupervised fashion, free from assumptions about predefined classes. We describe both the design and implementation of this method and its evaluation, which uses a sequence of testing protocols to determine whether the resulting clusters: (1) are physically reasonable (i.e., embody scientifically relevant distinctions); (2) capture information on spatial distributions, such as textures; (3) are cohesive and separable in latent space; and (4) are rotationally invariant (i.e., insensitive to the orientation of an image). Results obtained when these evaluation protocols are applied to RICC outputs suggest that the resultant novel cloud clusters capture meaningful aspects of cloud physics, are appropriately spatially coherent, and are invariant to orientations of input images. Our results support the possibility of using an unsupervised data-driven approach for automated clustering and pattern discovery in cloud imagery.

Submitted 28 October, 2021; v1 submitted 8 March, 2021; originally announced March 2021.

Comments: 25 pages. Accepted by IEEE Transactions on Geoscience and Remote Sensing (TGRS)

Journal ref: IEEE Transactions on Geoscience and Remote Sensing, 2021

arXiv:2102.07944 [pdf, other]

Deep Equilibrium Architectures for Inverse Problems in Imaging

Authors: Davis Gilton, Gregory Ongie, Rebecca Willett

Abstract: Recent efforts on solving inverse problems in imaging via deep neural networks use architectures inspired by a fixed number of iterations of an optimization method. The number of iterations is typically quite small due to difficulties in training networks corresponding to more iterations; the resulting solvers cannot be run for more iterations at test time without incurring significant errors. This paper describes an alternative approach corresponding to an infinite number of iterations, yielding a consistent improvement in reconstruction accuracy above state-of-the-art alternatives and where the computational budget can be selected at test time to optimize context-dependent trade-offs between accuracy and computation. The proposed approach leverages ideas from Deep Equilibrium Models, where the fixed-point iteration is constructed to incorporate a known forward model and insights from classical optimization-based reconstruction methods.

Submitted 2 June, 2021; v1 submitted 15 February, 2021; originally announced February 2021.

arXiv:2012.00460 [pdf, ps, other]

Functional Linear Regression with Mixed Predictors

Authors: Daren Wang, Zifeng Zhao, Yi Yu, Rebecca Willett

Abstract: We study a functional linear regression model that deals with functional responses and allows for both functional covariates and high-dimensional vector covariates. The proposed model is flexible and nests several functional regression models in the literature as special cases. Based on the theory of reproducing kernel Hilbert spaces (RKHS), we propose a penalized least squares estimator that can accommodate functional variables observed on discrete sample points. Besides a conventional smoothness penalty, a group Lasso-type penalty is further imposed to induce sparsity in the high-dimensional vector predictors. We derive finite sample theoretical guarantees and show that the excess prediction risk of our estimator is minimax optimal. Furthermore, our analysis reveals an interesting phase transition phenomenon that the optimal excess risk is determined jointly by the smoothness and the sparsity of the functional regression coefficients. A novel efficient optimization algorithm based on iterative coordinate descent is devised to handle the smoothness and group penalties simultaneously. Simulation studies and real data applications illustrate the promising performance of the proposed approach compared to the state-of-the-art methods in the literature.

Submitted 23 August, 2022; v1 submitted 1 December, 2020; originally announced December 2020.

arXiv:2012.00139 [pdf, other]

Model Adaptation for Inverse Problems in Imaging

Authors: Davis Gilton, Gregory Ongie, Rebecca Willett

Abstract: Deep neural networks have been applied successfully to a wide variety of inverse problems arising in computational imaging. These networks are typically trained using a forward model that describes the measurement process to be inverted, which is often incorporated directly into the network itself. However, these approaches are sensitive to changes in the forward model: if at test time the forward model varies (even slightly) from the one the network was trained for, the reconstruction performance can degrade substantially. Given a network trained to solve an initial inverse problem with a known forward model, we propose two novel procedures that adapt the network to a change in the forward model, even without full knowledge of the change. Our approaches do not require access to more labeled data (i.e., ground truth images). We show these simple model adaptation approaches achieve empirical success in a variety of inverse problems, including deblurring, super-resolution, and undersampled image reconstruction in magnetic resonance imaging.

Submitted 12 April, 2021; v1 submitted 30 November, 2020; originally announced December 2020.

arXiv:2011.13993 [pdf, ps, other]

Functional Autoregressive Processes in Reproducing Kernel Hilbert Spaces

Authors: Daren Wang, Zifeng Zhao, Rebecca Willett, Chun Yip Yau

Abstract: We study the estimation and prediction of functional autoregressive (FAR) processes, a statistical tool for modeling functional time series data. Due to the infinite-dimensional nature of FAR processes, the existing literature addresses its inference via dimension reduction and theoretical results therein require the (unrealistic) assumption of fully observed functional time series. We propose an alternative inference framework based on Reproducing Kernel Hilbert Spaces (RKHS). Specifically, a nuclear norm regularization method is proposed for estimating the transition operators of the FAR process directly from discrete samples of the functional time series. We derive a representer theorem for the FAR process, which enables infinite-dimensional inference without dimension reduction. Sharp theoretical guarantees are established under the (more realistic) assumption that we only have finite discrete samples of the FAR process. Extensive numerical experiments and a real data application of energy consumption prediction are further conducted to illustrate the promising performance of the proposed approach compared to the state-of-the-art methods in the literature.

Submitted 27 November, 2020; originally announced November 2020.

arXiv:2011.10906 [pdf, other]

Controlled KK-theory I: a Milnor exact sequence

Authors: Rufus Willett, Guoliang Yu

Abstract: We introduce controlled $KK$-theory groups associated to a pair $A$, $B$ of separable $C^*$-algebras. Roughly, these consist of elements of the usual $K$-theory group $K_0(B)$ that approximately commute with elements of $A$. Our main results show that these groups are related to Kasparov's $KK$-groups by a Milnor exact sequence, in such a way that Rørdam's $KL$-group is identified with an inverse limit of our controlled $KK$-groups. In the case that the $C^*$-algebras involved satisfy the UCT, our Milnor exact sequence agrees with the Milnor sequence associated to a $KK$-filtration in the sense of Schochet, although our results are independent of the UCT. Applications to the UCT will be pursued in subsequent work.

Submitted 21 April, 2021; v1 submitted 21 November, 2020; originally announced November 2020.

Comments: Version 2 has extra material necessary for the follow-up paper, plus some additional clarifications

MSC Class: 19K35; 46L80; 46L85

arXiv:2010.10410 [pdf, other]

Localizing Changes in High-Dimensional Regression Models

Authors: Alessandro Rinaldo, Daren Wang, Qin Wen, Rebecca Willett, Yi Yu

Abstract: This paper addresses the problem of localizing change points in high-dimensional linear regression models with piecewise constant regression coefficients. We develop a dynamic programming approach to estimate the locations of the change points whose performance improves upon the current state-of-the-art, even as the dimensionality, the sparsity of the regression coefficients, the temporal spacing between two consecutive change points, and the magnitude of the difference of two consecutive regression coefficient vectors are allowed to vary with the sample size. Furthermore, we devise a computationally-efficient refinement procedure that provably reduces the localization error of preliminary estimates of the change points. We demonstrate minimax lower bounds on the localization error that nearly match the upper bound on the localization error of our methodology and show that the signal-to-noise condition we impose is essentially the weakest possible based on information-theoretic arguments. Extensive numerical results support our theoretical findings, and experiments on real air quality data reveal change points supported by historical information not used by the algorithm.

Submitted 20 October, 2020; originally announced October 2020.

arXiv:2006.03572 [pdf, other]

Detecting Abrupt Changes in High-Dimensional Self-Exciting Poisson Processes

Authors: Daren Wang, Yi Yu, Rebecca Willett

Abstract: High-dimensional self-exciting point processes have been widely used in many application areas to model discrete event data in which past and current events affect the likelihood of future events. In this paper, we are concerned with detecting abrupt changes of the coefficient matrices in discrete-time high-dimensional self-exciting Poisson processes, which have yet to be studied in the existing literature due to both theoretical and computational challenges rooted in the non-stationary and high-dimensional nature of the underlying process. We propose a penalized dynamic programming approach which is supported by a theoretical rate analysis and numerical evidence.

Submitted 5 June, 2020; originally announced June 2020.

arXiv:2005.06001 [pdf, other]

Deep Learning Techniques for Inverse Problems in Imaging

Authors: Gregory Ongie, Ajil Jalal, Christopher A. Metzler, Richard G. Baraniuk, Alexandros G. Dimakis, Rebecca Willett

Abstract: Recent work in machine learning shows that deep neural networks can be used to solve a wide variety of inverse problems arising in computational imaging. We explore the central prevailing themes of this emerging area and present a taxonomy that can be used to categorize different problems and reconstruction methods. Our taxonomy is organized along two central axes: (1) whether or not a forward model is known and to what extent it is used in training and testing, and (2) whether or not the learning is supervised or unsupervised, i.e., whether or not the training relies on access to matched ground truth image and measurement pairs. We also discuss the trade-offs associated with these different reconstruction approaches, caveats and common failure modes, plus open problems and avenues for future work.

Submitted 12 May, 2020; originally announced May 2020.

arXiv:2005.03184 [pdf, ps, other]

The UCT problem for nuclear $C^\ast$-algebras

Authors: Nathanial P. Brown, Sarah L. Browne, Rufus Willett, Jianchao Wu

Abstract: In recent years, a large class of nuclear $C^\ast$-algebras have been classified, modulo an assumption on the Universal Coefficient Theorem (UCT). We think this assumption is redundant and propose a strategy for proving it. Indeed, following the original proof of the classification theorem, we propose bridging the gap between reduction theorems and examples. While many such bridges are possible, various approximate ideal structures appear quite promising.

Submitted 16 November, 2021; v1 submitted 6 May, 2020; originally announced May 2020.

Comments: 13 pages; to appear in the Rocky Mountain Journal of Mathematics

arXiv:2003.12633 [pdf, other]

Detection and Description of Change in Visual Streams

Authors: Davis Gilton, Ruotian Luo, Rebecca Willett, Greg Shakhnarovich

Abstract: This paper presents a framework for the analysis of changes in visual streams: ordered sequences of images, possibly separated by significant time gaps. We propose a new approach to incorporating unlabeled data into training to generate natural language descriptions of change. We also develop a framework for estimating the time of change in a visual stream. We use learned representations for change evidence and consistency of perceived change, and combine these in a regularized graph cut based change detector. Experimental evaluation on visual stream datasets, which we release as part of our contribution, shows that representation learning driven by natural language descriptions significantly improves change detection accuracy, compared to methods that do not rely on language.

Submitted 9 April, 2020; v1 submitted 27 March, 2020; originally announced March 2020.

arXiv:2003.07429 [pdf, other]

Context-dependent self-exciting point processes: models, methods, and risk bounds in high dimensions

Authors: Lili Zheng, Garvesh Raskutti, Rebecca Willett, Benjamin Mark

Abstract: High-dimensional autoregressive point processes model how current events trigger or inhibit future events, such as how activity by one member of a social network can affect the future activity of his or her neighbors. While past work has focused on estimating the underlying network structure based solely on the times at which events occur on each node of the network, this paper examines the more nuanced problem of estimating context-dependent networks that reflect how features associated with an event (such as the content of a social media post) modulate the strength of influences among nodes. Specifically, we leverage ideas from compositional time series and regularization methods in machine learning to conduct network estimation for high-dimensional marked point processes. Two models and corresponding estimators are considered in detail: an autoregressive multinomial model suited to categorical marks and a logistic-normal model suited to marks with mixed membership in different categories. Importantly, the logistic-normal model leads to a convex negative log-likelihood objective and captures dependence across categories. We provide theoretical guarantees for both estimators, which we validate by simulations and a synthetic data-generating model. We further validate our methods through two real data examples and demonstrate the advantages and disadvantages of both approaches.

Submitted 16 March, 2020; originally announced March 2020.

arXiv:2003.03469 [pdf, ps, other]

Amenability and weak containment for actions of locally compact groups on $C^*$-algebras

Authors: Alcides Buss, Siegfried Echterhoff, Rufus Willett

Abstract: In this work we introduce and study a new notion of amenability for actions of locally compact groups on $C^*$-algebras. Our definition extends the definition of amenability for actions of discrete groups due to Claire Anantharaman-Delaroche. We show that our definition has several characterizations and permanence properties analogous to those known in the discrete case. For example, for actions on commutative $C^*$-algebras, we show that our notion of amenability is equivalent to measurewise amenability. Combined with a recent result of Alex Bearden and Jason Crann, this also settles a long standing open problem about the equivalence of topological amenability and measurewise amenability for a second countable $G$-space $X$. We use our new notion of amenability to study when the maximal and reduced crossed products agree. One of our main results generalizes a theorem of Matsumura: we show that for an action of an exact locally compact group $G$ on a locally compact space $X$ the full and reduced crossed products $C_0(X)\rtimes_\max G$ and $C_0(X)\rtimes_{\operatorname{red}} G$ coincide if and only if the action of $G$ on $X$ is amenable. We also show that the analogue of this theorem does not hold for actions on noncommutative $C^*$-algebras. Finally, we study amenability as it relates to more detailed structure in the case of $C^*$-algebras that fibre over an appropriate $G$-space $X$, and the interaction of amenability with various regularity properties such as nuclearity, exactness, and the (L)LP, and the equivariant versions of injectivity and the WEP.

Submitted 3 May, 2022; v1 submitted 6 March, 2020; originally announced March 2020.

Comments: Some small changes (mostly in the introduction) and an adjustment to the AMS book style. This is the final version which will appear in the Memoirs of the AMS

MSC Class: 46L55; 43A35

arXiv:2002.11255 [pdf, other]

An Optimal Statistical and Computational Framework for Generalized Tensor Estimation

Authors: Rungang Han, Rebecca Willett, Anru R. Zhang

Abstract: This paper describes a flexible framework for generalized low-rank tensor estimation problems that includes many important instances arising from applications in computational imaging, genomics, and network analysis. The proposed estimator consists of finding a low-rank tensor fit to the data under generalized parametric models. To overcome the difficulty of non-convexity in these problems, we introduce a unified approach of projected gradient descent that adapts to the underlying low-rank structure. Under mild conditions on the loss function, we establish both an upper bound on statistical error and the linear rate of computational convergence through a general deterministic analysis. Then we further consider a suite of generalized tensor estimation problems, including sub-Gaussian tensor PCA, tensor regression, and Poisson and binomial tensor PCA. We prove that the proposed algorithm achieves the minimax optimal rate of convergence in estimation error. Finally, we demonstrate the superiority of the proposed framework via extensive experiments on both simulated and real data.

Submitted 4 February, 2021; v1 submitted 25 February, 2020; originally announced February 2020.

arXiv:2002.01532 [pdf, ps, other]

Bounded Derivations on Uniform Roe Algebras

Authors: Matthew Lorentz, Rufus Willett

Abstract: We show that if $C_u^*(X)$ is a uniform Roe algebra associated to a bounded geometry metric space X, then all bounded derivations on $C^*_u(X)$ are inner.

Submitted 4 February, 2020; originally announced February 2020.

Comments: 9 pages

MSC Class: 46L85; 46L57

arXiv:1910.01635 [pdf, other]

A Function Space View of Bounded Norm Infinite Width ReLU Nets: The Multivariate Case

Authors: Greg Ongie, Rebecca Willett, Daniel Soudry, Nathan Srebro

Abstract: A key element of understanding the efficacy of overparameterized neural networks is characterizing how they represent functions as the number of weights in the network approaches infinity. In this paper, we characterize the norm required to realize a function $f:\mathbb{R}^d\rightarrow\mathbb{R}$ as a single hidden-layer ReLU network with an unbounded number of units (infinite width), but where the Euclidean norm of the weights is bounded, including precisely characterizing which functions can be realized with finite norm. This was settled for univariate functions in Savarese et al. (2019), where it was shown that the required norm is determined by the L1-norm of the second derivative of the function. We extend the characterization to multivariate functions (i.e., networks with d input units), relating the required norm to the L1-norm of the Radon transform of a (d+1)/2-power Laplacian of the function. This characterization allows us to show that all functions in Sobolev spaces $W^{s,1}(\mathbb{R})$, $s\geq d+1$, can be represented with bounded norm, to calculate the required norm for several specific functions, and to obtain a depth separation result. These results have important implications for understanding generalization performance and the distinction between neural networks and more traditional kernel learning.

Submitted 3 October, 2019; originally announced October 2019.

arXiv:1909.06359 [pdf, other]

Localizing Changes in High-Dimensional Vector Autoregressive Processes

Authors: Daren Wang, Yi Yu, Alessandro Rinaldo, Rebecca Willett

Abstract: Autoregressive models capture stochastic processes in which past realizations determine the generative distribution of new data; they arise naturally in a variety of industrial, biomedical, and financial settings. A key challenge when working with such data is to determine when the underlying generative model has changed, as this can offer insights into distinct operating regimes of the underlying system. This paper describes a novel dynamic programming approach to localizing changes in high-dimensional autoregressive processes and associated error rates that improve upon the prior state of the art. When the model parameters are piecewise constant over time and the corresponding process is piecewise stable, the proposed dynamic programming algorithm consistently localizes change points even as the dimensionality, the sparsity of the coefficient matrices, the temporal spacing between two consecutive change points, and the magnitude of the difference of two consecutive coefficient matrices are allowed to vary with the sample size. Furthermore, the accuracy of initial, coarse change point localization estimates can be boosted via a computationally-efficient refinement algorithm that provably improves the localization error rate. Finally, comprehensive simulation experiments and a real data analysis are provided to show the numerical superiority of our proposed methods.

Submitted 29 July, 2020; v1 submitted 12 September, 2019; originally announced September 2019.

Comments: 53 pages; 4 figures

arXiv:1908.09241 [pdf, ps, other]

Approximate ideal structures and K-theory

Authors: Rufus Willett

Abstract: We introduce a notion of approximate ideal structure for a $C^*$-algebra, and use it as a tool to study $K$-theory groups. The notion is motivated by the classical Mayer-Vietoris sequence, by the theory of nuclear dimension as introduced by Winter and Zacharias, and by the theory of dynamical complexity introduced by Guentner, Yu, and the author. A major inspiration for our methods comes from recent work of Oyono-Oyono and Yu in the setting of controlled $K$-theory of filtered C*-algebras; we do not, however, use that language in this paper. We give two main applications. The first is a vanishing result for $K$-theory that is relevant to the Baum-Connes conjecture. The second is a permanence result for the Künneth formula in $C^*$-algebra $K$-theory: roughly, this says that if $A$ can be decomposed into a pair of subalgebras $(C,D)$ such that $C$, $D$, and $C\cap D$ all satisfy the Künneth formula, then $A$ itself satisfies the Künneth formula.

Submitted 8 May, 2020; v1 submitted 24 August, 2019; originally announced August 2019.

Comments: 65 pages

MSC Class: 46L80; 46L85

Showing 1–50 of 144 results for author: Willett, R