-
The Widening Gap: The Benefits and Harms of Generative AI for Novice Programmers
Authors:
James Prather,
Brent Reeves,
Juho Leinonen,
Stephen MacNeil,
Arisoa S. Randrianasolo,
Brett Becker,
Bailey Kimmel,
Jared Wright,
Ben Briggs
Abstract:
Novice programmers often struggle through programming problem solving due to a lack of metacognitive awareness and strategies. Previous research has shown that novices can encounter multiple metacognitive difficulties while programming. Novices are typically unaware of how these difficulties are hindering their progress. Meanwhile, many novices are now programming with generative AI (GenAI), which can provide complete solutions to most introductory programming problems, code suggestions, hints for next steps when stuck, and explanations of cryptic error messages. Its impact on novice metacognition has only begun to be explored. Here we replicate a previous study that examined novice programming problem solving behavior and extend it by incorporating GenAI tools. Through 21 lab sessions consisting of participant observation, interview, and eye tracking, we explore how novices are coding with GenAI tools. Although 20 of 21 students completed the assigned programming problem, our findings show an unfortunate divide in the use of GenAI tools between students who accelerated and students who struggled. Students who accelerated were able to use GenAI to create code they already intended to make and were able to ignore unhelpful or incorrect inline code suggestions. But for students who struggled, our findings indicate that previously known metacognitive difficulties persist, and that GenAI can compound them and even introduce new metacognitive difficulties. Furthermore, struggling students often expressed cognitive dissonance about their problem solving ability, thought they performed better than they did, and finished with an illusion of competence. Based on our observations from both groups, we propose ways to scaffold the novice GenAI experience and make suggestions for future work.
Submitted 27 May, 2024;
originally announced May 2024.
-
Direct Zernike Coefficient Prediction from Point Spread Functions and Extended Images using Deep Learning
Authors:
Yong En Kok,
Alexander Bentley,
Andrew Parkes,
Amanda J. Wright,
Michael G. Somekh,
Michael Pound
Abstract:
Optical imaging quality can be severely degraded by system- and sample-induced aberrations. Existing adaptive optics systems typically rely on iterative search algorithms to correct for aberrations and improve images. This study demonstrates the application of convolutional neural networks to characterise the optical aberration by directly predicting the Zernike coefficients from two to three phase-diverse optical images. We evaluated our network on 600,000 simulated Point Spread Function (PSF) datasets randomly generated within the range of -1 to 1 radians using the first 25 Zernike coefficients. The results show that using only three phase-diverse images captured above, below and at the focal plane with an amplitude of 1 achieves a low RMSE of 0.10 radians on the simulated PSF dataset. Furthermore, this approach directly predicts Zernike modes from simulated extended 2D samples, while maintaining a comparable RMSE of 0.15 radians. We demonstrate that this approach is effective with only a single prediction step and can also be iterated a small number of times. This simple and straightforward technique provides a rapid and accurate method for predicting the aberration correction using three or fewer phase-diverse images, paving the way for evaluation on real-world datasets.
Submitted 24 April, 2024; v1 submitted 23 April, 2024;
originally announced April 2024.
-
Robust Second-Order Nonconvex Optimization and Its Application to Low Rank Matrix Sensing
Authors:
Shuyao Li,
Yu Cheng,
Ilias Diakonikolas,
Jelena Diakonikolas,
Rong Ge,
Stephen J. Wright
Abstract:
Finding an approximate second-order stationary point (SOSP) is a well-studied and fundamental problem in stochastic nonconvex optimization with many applications in machine learning. However, this problem is poorly understood in the presence of outliers, limiting the use of existing nonconvex algorithms in adversarial settings.
In this paper, we study the problem of finding SOSPs in the strong contamination model, where a constant fraction of datapoints are arbitrarily corrupted. We introduce a general framework for efficiently finding an approximate SOSP with \emph{dimension-independent} accuracy guarantees, using $\widetilde{O}({D^2}/ε)$ samples where $D$ is the ambient dimension and $ε$ is the fraction of corrupted datapoints.
As a concrete application of our framework, we apply it to the problem of low rank matrix sensing, developing efficient and provably robust algorithms that can tolerate corruptions in both the sensing matrices and the measurements. In addition, we establish a Statistical Query lower bound providing evidence that the quadratic dependence on $D$ in the sample complexity is necessary for computationally efficient algorithms.
Submitted 11 March, 2024;
originally announced March 2024.
-
How to Make the Gradients Small Privately: Improved Rates for Differentially Private Non-Convex Optimization
Authors:
Andrew Lowy,
Jonathan Ullman,
Stephen J. Wright
Abstract:
We provide a simple and flexible framework for designing differentially private algorithms to find approximate stationary points of non-convex loss functions. Our framework is based on using a private approximate risk minimizer to "warm start" another private algorithm for finding stationary points. We use this framework to obtain improved, and sometimes optimal, rates for several classes of non-convex loss functions. First, we obtain improved rates for finding stationary points of smooth non-convex empirical loss functions. Second, we specialize to quasar-convex functions, which generalize star-convex functions and arise in learning dynamical systems and training some neural nets. We achieve the optimal rate for this class. Third, we give an optimal algorithm for finding stationary points of functions satisfying the Kurdyka-Lojasiewicz (KL) condition. For example, over-parameterized neural networks often satisfy this condition. Fourth, we provide new state-of-the-art rates for stationary points of non-convex population loss functions. Fifth, we obtain improved rates for non-convex generalized linear models. A modification of our algorithm achieves nearly the same rates for second-order stationary points of functions with Lipschitz Hessian, improving over the previous state-of-the-art for each of the above problems.
Submitted 16 February, 2024;
originally announced February 2024.
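Differentially private optimizers of this kind are usually assembled from noisy gradient steps. The sketch below shows one generic building block, a clipped gradient step privatized with the Gaussian mechanism (as in DP-SGD). It is not the paper's warm-start framework, and the function names and parameters are illustrative assumptions; privacy accounting (choosing `noise_multiplier` for a target epsilon and delta) is omitted.

```python
import math
import random


def clip(vec, max_norm):
    """Scale vec down so that its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= max_norm or norm == 0.0:
        return list(vec)
    return [v * max_norm / norm for v in vec]


def noisy_gradient_step(x, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One clipped, Gaussian-noised gradient step (hypothetical helper).

    Each per-example gradient is clipped to clip_norm, the clipped gradients
    are summed, Gaussian noise with standard deviation
    noise_multiplier * clip_norm is added to every coordinate of the sum,
    and the result is averaged over the batch before the descent step.
    """
    n = len(per_example_grads)
    dim = len(x)
    total = [0.0] * dim
    for g in per_example_grads:
        for i, gi in enumerate(clip(g, clip_norm)):
            total[i] += gi
    noisy_mean = [
        (total[i] + rng.gauss(0.0, noise_multiplier * clip_norm)) / n
        for i in range(dim)
    ]
    return [x[i] - lr * noisy_mean[i] for i in range(dim)]
```

Clipping bounds each example's influence on the update, which is what lets the added Gaussian noise be calibrated; with `noise_multiplier=0` the step reduces to ordinary clipped gradient descent.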
-
Extending the Reach of First-Order Algorithms for Nonconvex Min-Max Problems with Cohypomonotonicity
Authors:
Ahmet Alacaoglu,
Donghwan Kim,
Stephen J. Wright
Abstract:
We focus on constrained, $L$-smooth, nonconvex-nonconcave min-max problems either satisfying $ρ$-cohypomonotonicity or admitting a solution to the $ρ$-weakly Minty Variational Inequality (MVI), where larger values of the parameter $ρ>0$ correspond to a greater degree of nonconvexity. These problem classes include examples in two-player reinforcement learning, interaction-dominant min-max problems, and certain synthetic test problems on which classical min-max algorithms fail. It has been conjectured that first-order methods can tolerate values of $ρ$ no larger than $\frac{1}{L}$, but existing results in the literature have stagnated at the tighter requirement $ρ< \frac{1}{2L}$. With a simple argument, we obtain optimal or best-known complexity guarantees with cohypomonotonicity or weak MVI conditions for $ρ< \frac{1}{L}$. The algorithms we analyze are inexact variants of Halpern and Krasnosel'skiĭ-Mann (KM) iterations. We also provide algorithms and complexity guarantees in the stochastic case with the same range on $ρ$. Our main insight for the improvements in the convergence analyses is to harness the recently proposed "conic nonexpansiveness" property of operators. As byproducts, we provide a refined analysis for inexact Halpern iteration and propose a stochastic KM iteration with a multilevel Monte Carlo estimator.
Submitted 7 February, 2024;
originally announced February 2024.
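For orientation, here are the exact textbook forms of the two fixed-point iterations named in the abstract, applied to a nonexpansive operator T. Halpern anchors every step to the starting point with weight 1/(k+2); KM averages the current point with its image. The toy operator (a 90-degree rotation, whose only fixed point is the origin) is an assumption chosen for illustration; inexactness, stochastic estimates, and conic nonexpansiveness from the paper are not modeled.

```python
import math


def rotate90(x):
    """A nonexpansive (in fact isometric) linear map whose only fixed point is 0."""
    return (-x[1], x[0])


def halpern(T, x0, iters):
    """Halpern iteration: x_{k+1} = b_k * x0 + (1 - b_k) * T(x_k), b_k = 1/(k+2)."""
    x = x0
    for k in range(iters):
        beta = 1.0 / (k + 2)
        tx = T(x)
        x = tuple(beta * a + (1 - beta) * t for a, t in zip(x0, tx))
    return x


def krasnoselskii_mann(T, x0, iters, alpha=0.5):
    """KM iteration: x_{k+1} = (1 - alpha) * x_k + alpha * T(x_k)."""
    x = x0
    for _ in range(iters):
        tx = T(x)
        x = tuple((1 - alpha) * a + alpha * t for a, t in zip(x, tx))
    return x


def residual(T, x):
    """Fixed-point residual ||x - T(x)||."""
    return math.dist(x, T(x))
```

For a general nonexpansive T, Halpern's residual is known to decay like O(1/k); the KM iterates converge linearly in this particular example only because the averaged rotation happens to be a contraction.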
-
Small jet engine reservoir computing digital twin
Authors:
C. J. Wright,
N. Biederman,
B. Gyovai,
D. J. Gauthier,
J. P. Wilhelm
Abstract:
Machine learning was applied to create a digital twin of a numerical simulation of a single-scroll jet engine. A similar model based on the insights gained from this numerical study was used to create a digital twin of a JetCat P100-RX jet engine using only experimental data. Engine data was collected from a custom sensor system measuring parameters such as thrust, exhaust gas temperature, shaft speed, weather conditions, etc. Data was gathered while the engine was placed under different test conditions by controlling shaft speed. The machine learning model was generated (trained) using a next-generation reservoir computer, a best-in-class machine learning algorithm for dynamical systems. Once the model was trained, it was used to predict behavior it had never seen with an accuracy of better than 1.8% when compared to the testing data.
Submitted 15 December, 2023;
originally announced December 2023.
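A next-generation reservoir computer, in the form introduced by Gauthier et al. (2021), replaces a random recurrent reservoir with an explicit feature vector: a constant, k time-delayed copies of the input (spaced s steps apart), and the unique quadratic monomials of those delayed terms. A linear readout fitted by ridge regression then maps the features to the next state. The sketch below builds only the feature vector; the delay count, spacing, and which engine signals form the input are assumptions, since the abstract does not state them.

```python
from itertools import combinations_with_replacement


def ngrc_features(history, k=2, s=1):
    """Build an NG-RC feature vector from past input vectors.

    history: list of input vectors (lists of floats), most recent last.
    k: number of time-delayed copies; s: spacing between delays.
    Returns [1.0, linear terms..., unique quadratic monomials...].
    """
    needed = (k - 1) * s + 1
    if len(history) < needed:
        raise ValueError(f"need at least {needed} past samples")
    linear = []
    for j in range(k):
        # Delayed copy of the input j*s steps before the most recent sample.
        linear.extend(history[-1 - j * s])
    # Upper triangle of the outer product of the linear part.
    quadratic = [a * b for a, b in combinations_with_replacement(linear, 2)]
    return [1.0] + linear + quadratic
```

For a d-dimensional input the vector has 1 + dk + dk(dk+1)/2 entries, so with two delays of a two-signal input (say, shaft speed and exhaust gas temperature) it has 15 features; training reduces to a single linear least-squares fit, which is one reason NG-RC needs little data.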
-
DenseNet and Support Vector Machine classifications of major depressive disorder using vertex-wise cortical features
Authors:
Vladimir Belov,
Tracy Erwin-Grabner,
Ling-Li Zeng,
Christopher R. K. Ching,
Andre Aleman,
Alyssa R. Amod,
Zeynep Basgoze,
Francesco Benedetti,
Bianca Besteher,
Katharina Brosch,
Robin Bülow,
Romain Colle,
Colm G. Connolly,
Emmanuelle Corruble,
Baptiste Couvy-Duchesne,
Kathryn Cullen,
Udo Dannlowski,
Christopher G. Davey,
Annemiek Dols,
Jan Ernsting,
Jennifer W. Evans,
Lukas Fisch,
Paola Fuentes-Claramonte,
Ali Saffet Gonul,
Ian H. Gotlib
, et al. (63 additional authors not shown)
Abstract:
Major depressive disorder (MDD) is a complex psychiatric disorder that affects the lives of hundreds of millions of individuals around the globe. Even today, researchers debate whether morphological alterations in the brain are linked to MDD, likely due to the heterogeneity of this disorder. The application of deep learning tools to neuroimaging data, capable of capturing complex non-linear patterns, has the potential to provide diagnostic and predictive biomarkers for MDD. However, previous attempts to demarcate MDD patients and healthy controls (HC) based on segmented cortical features via linear machine learning approaches have reported low accuracies. In this study, we used globally representative data from the ENIGMA-MDD working group containing an extensive sample of people with MDD (N=2,772) and HC (N=4,240), which allows a comprehensive analysis with generalizable results. Based on the hypothesis that integration of vertex-wise cortical features can improve classification performance, we evaluated the classification of a DenseNet and a Support Vector Machine (SVM), with the expectation that the former would outperform the latter. As we analyzed a multi-site sample, we additionally applied the ComBat harmonization tool to remove potential nuisance effects of site. We found that both classifiers exhibited close to chance performance (balanced accuracy DenseNet: 51%; SVM: 53%), when estimated on unseen sites. Slightly higher classification performance (balanced accuracy DenseNet: 58%; SVM: 55%) was found when the cross-validation folds contained subjects from all sites, indicating site effect. In conclusion, the integration of vertex-wise morphometric features and the use of the non-linear classifier did not make MDD and HC distinguishable. Our results support the notion that MDD classification on this combination of features and classifiers is infeasible.
Submitted 18 November, 2023;
originally announced November 2023.
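The results are reported as balanced accuracy, the mean of sensitivity and specificity, which keeps chance level at 50% even though the classes are imbalanced (2,772 MDD vs 4,240 HC). A classifier that labels everyone as HC would reach a plain accuracy of about 60% (4,240 / 7,012) but a balanced accuracy of exactly 50%. A minimal sketch, assuming binary labels with 1 for MDD and 0 for HC:

```python
def balanced_accuracy(y_true, y_pred, positive=1):
    """Mean of sensitivity (recall on the positive class) and specificity.

    Assumes both classes occur in y_true; otherwise a denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2
```

On this scale, the reported 51% (DenseNet) and 53% (SVM) on unseen sites sit only slightly above the 50% that a constant prediction would obtain.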
-
Complexity of Single Loop Algorithms for Nonlinear Programming with Stochastic Objective and Constraints
Authors:
Ahmet Alacaoglu,
Stephen J. Wright
Abstract:
We analyze the complexity of single-loop quadratic penalty and augmented Lagrangian algorithms for solving nonconvex optimization problems with functional equality constraints. We consider three cases, in all of which the objective is stochastic and smooth, that is, an expectation over an unknown distribution that is accessed by sampling. The nature of the equality constraints differs among the three cases: deterministic and linear in the first case, deterministic, smooth and nonlinear in the second case, and stochastic, smooth and nonlinear in the third case. Variance reduction techniques are used to improve the complexity. To find a point that satisfies $\varepsilon$-approximate first-order conditions, we require $\widetilde{O}(\varepsilon^{-3})$ complexity in the first case, $\widetilde{O}(\varepsilon^{-4})$ in the second case, and $\widetilde{O}(\varepsilon^{-5})$ in the third case. For the first and third cases, they are the first algorithms of "single loop" type (that also use $O(1)$ samples at each iteration) that still achieve the best-known complexity guarantees.
Submitted 1 November, 2023;
originally announced November 2023.
-
Comparing Optimization Targets for Contrast-Consistent Search
Authors:
Hugo Fry,
Seamus Fallows,
Ian Fan,
Jamie Wright,
Nandi Schoots
Abstract:
We investigate the optimization target of Contrast-Consistent Search (CCS), which aims to recover the internal representations of truth of a large language model. We present a new loss function that we call the Midpoint-Displacement (MD) loss function. We demonstrate that for a certain hyper-parameter value this MD loss function leads to a prober with very similar weights to CCS. We further show that this hyper-parameter is not optimal and that with a better hyper-parameter the MD loss function attains a higher test accuracy than CCS.
Submitted 1 November, 2023;
originally announced November 2023.
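As a reference point, the CCS objective as introduced by Burns et al. (2022) scores a linear probe on contrast pairs (a statement phrased as true, x+, and as false, x-). It combines a consistency term, requiring p(x+) to equal 1 - p(x-), with a confidence term that penalizes the degenerate answer p(x+) = p(x-) = 0.5. The MD loss is not defined in the abstract, so only the CCS baseline is sketched; the feature vectors stand in for hypothetical model activations.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def probe(theta, bias, features):
    """Linear probe with sigmoid output: p(x) = sigmoid(theta . phi(x) + bias)."""
    return sigmoid(sum(t * f for t, f in zip(theta, features)) + bias)


def ccs_loss(p_pos, p_neg):
    """CCS loss for one contrast pair.

    consistency: the two answers should be complementary probabilities.
    confidence: push the smaller probability toward 0 to rule out 0.5/0.5.
    """
    consistency = (p_pos - (1.0 - p_neg)) ** 2
    confidence = min(p_pos, p_neg) ** 2
    return consistency + confidence
```

The training objective averages `ccs_loss` over all contrast pairs and minimizes it over `theta` and `bias` without any truth labels; a confident, consistent pair such as (0.9, 0.1) scores 0.01, while the uninformative (0.5, 0.5) scores 0.25.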
-
A randomized algorithm for nonconvex minimization with inexact evaluations and complexity guarantees
Authors:
Shuyao Li,
Stephen J. Wright
Abstract:
We consider minimization of a smooth nonconvex function with inexact oracle access to gradient and Hessian (without assuming access to the function value) to achieve approximate second-order optimality. A novel feature of our method is that if an approximate direction of negative curvature is chosen as the step, we choose its sense to be positive or negative with equal probability. We allow gradients to be inexact in a relative sense and relax the coupling between inexactness thresholds for the first- and second-order optimality conditions. Our convergence analysis includes both an expectation bound based on martingale analysis and a high-probability bound based on concentration inequalities. We apply our algorithm to empirical risk minimization problems and obtain improved gradient sample complexity over existing works.
Submitted 26 March, 2024; v1 submitted 28 October, 2023;
originally announced October 2023.
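The random-sign idea can be illustrated with a two-dimensional sketch that uses only gradient and Hessian oracles. The method takes a gradient step when the gradient is large; otherwise, if the Hessian has an eigenvalue below -eps_h, it steps along the corresponding eigenvector with a sign drawn uniformly at random, so no function value is needed to choose the direction. The step lengths (1/L_g for gradient steps, 2|λ|/L_H for curvature steps), the exact 2x2 eigen-solver, and all names are illustrative assumptions, not the paper's algorithm or its inexactness conditions.

```python
import math
import random


def min_eig_2x2(h):
    """Smallest eigenvalue and a unit eigenvector of a symmetric 2x2 matrix."""
    a, b, c = h[0][0], h[0][1], h[1][1]
    lam = (a + c) / 2 - math.sqrt(((a - c) / 2) ** 2 + b * b)
    if b != 0.0:
        v = (b, lam - a)  # satisfies (a - lam) * v0 + b * v1 = 0
    else:
        v = (1.0, 0.0) if a <= c else (0.0, 1.0)
    norm = math.hypot(*v)
    return lam, (v[0] / norm, v[1] / norm)


def step(x, grad, hess, lipschitz_grad, lipschitz_hess, eps_g, eps_h, rng):
    """One gradient or negative-curvature step; None if x looks second-order stationary."""
    g = grad(x)
    if math.hypot(*g) > eps_g:
        return (x[0] - g[0] / lipschitz_grad, x[1] - g[1] / lipschitz_grad)
    lam, v = min_eig_2x2(hess(x))
    if lam < -eps_h:
        # The sense of the curvature direction is chosen with equal probability.
        sign = rng.choice((-1.0, 1.0))
        alpha = 2.0 * abs(lam) / lipschitz_hess
        return (x[0] + sign * alpha * v[0], x[1] + sign * alpha * v[1])
    return None
```

At the saddle of f(x, y) = x² - y² the gradient vanishes, and either sign of the eigenvector (0, 1) moves to a point with lower f, which is why the sign can be randomized instead of chosen by comparing function values.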
-
Guarantees for Self-Play in Multiplayer Games via Polymatrix Decomposability
Authors:
Revan MacQueen,
James R. Wright
Abstract:
Self-play is a technique for machine learning in multi-agent systems where a learning algorithm learns by interacting with copies of itself. Self-play is useful for generating large quantities of data for learning, but has the drawback that the agents the learner will face post-training may have dramatically different behavior than the learner came to expect by interacting with itself. For the special case of two-player constant-sum games, self-play that reaches Nash equilibrium is guaranteed to produce strategies that perform well against any post-training opponent; however, no such guarantee exists for multiplayer games. We show that in games that approximately decompose into a set of two-player constant-sum games (called constant-sum polymatrix games) where global $ε$-Nash equilibria are boundedly far from Nash equilibria in each subgame (called subgame stability), any no-external-regret algorithm that learns by self-play will produce a strategy with bounded vulnerability. For the first time, our results identify a structural property of multiplayer games that enables performance guarantees for the strategies produced by a broad class of self-play algorithms. We demonstrate our findings through experiments on Leduc poker.
Submitted 29 November, 2023; v1 submitted 17 October, 2023;
originally announced October 2023.
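The two-player constant-sum baseline that the abstract starts from can be seen in a small example. Regret matching (Hart and Mas-Colell, 2000) is a no-external-regret algorithm: each player plays actions in proportion to positive cumulative regret. When two copies play each other in a zero-sum game, their time-averaged strategies approach a Nash equilibrium. The sketch uses rock-paper-scissors with full-information expected payoffs and a nonzero initial regret so the dynamics do not start at equilibrium; it is an illustration of the classical case, not of the paper's polymatrix setting or its Leduc poker experiments.

```python
# Row player's payoff in rock-paper-scissors; the column player gets the negative.
PAYOFF = [
    [0.0, -1.0, 1.0],   # rock vs (rock, paper, scissors)
    [1.0, 0.0, -1.0],   # paper
    [-1.0, 1.0, 0.0],   # scissors
]


def regret_matching_strategy(cum_regret):
    """Play in proportion to positive cumulative regret (uniform if none is positive)."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    n = len(cum_regret)
    return [p / total for p in positive] if total > 0 else [1.0 / n] * n


def self_play(iters, initial_regret=(1.0, 0.0, 0.0)):
    """Both seats run regret matching against each other; return average strategies."""
    regrets = [list(initial_regret), [0.0, 0.0, 0.0]]
    avg = [[0.0] * 3, [0.0] * 3]
    for _ in range(iters):
        s = [regret_matching_strategy(r) for r in regrets]
        # Expected payoff of each pure action against the opponent's current mix.
        u0 = [sum(PAYOFF[a][b] * s[1][b] for b in range(3)) for a in range(3)]
        u1 = [sum(-PAYOFF[a][b] * s[0][a] for a in range(3)) for b in range(3)]
        for player, u in ((0, u0), (1, u1)):
            value = sum(p * x for p, x in zip(s[player], u))
            for a in range(3):
                regrets[player][a] += u[a] - value
                avg[player][a] += s[player][a] / iters
    return avg
```

Both average strategies drift toward the uniform equilibrium (1/3, 1/3, 1/3). The abstract's point is that in general multiplayer games this kind of guarantee can fail, and subgame stability in constant-sum polymatrix games is a condition under which a bounded-vulnerability version survives.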
-
TpopT: Efficient Trainable Template Optimization on Low-Dimensional Manifolds
Authors:
Jingkai Yan,
Shiyu Wang,
Xinyu Rain Wei,
Jimmy Wang,
Zsuzsanna Márka,
Szabolcs Márka,
John Wright
Abstract:
In scientific and engineering scenarios, a recurring task is the detection of low-dimensional families of signals or patterns. A classic family of approaches, exemplified by template matching, aims to cover the search space with a dense template bank. While simple and highly interpretable, it suffers from poor computational efficiency due to unfavorable scaling in the signal space dimensionality. In this work, we study TpopT (TemPlate OPTimization) as an alternative scalable framework for detecting low-dimensional families of signals which maintains high interpretability. We provide a theoretical analysis of the convergence of Riemannian gradient descent for TpopT, and prove that it has a superior dimension scaling to covering. We also propose a practical TpopT framework for nonparametric signal sets, which incorporates techniques of embedding and kernel interpolation, and is further configurable into a trainable network architecture by unrolled optimization. The proposed trainable TpopT exhibits significantly improved efficiency-accuracy tradeoffs for gravitational wave detection, where matched filtering is currently a method of choice. We further illustrate the general applicability of this approach with experiments on handwritten digit data.
Submitted 15 October, 2023;
originally announced October 2023.
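The template-matching baseline that TpopT is compared against is easy to state: correlate the observed signal with every member of a dense template bank and report the best-scoring parameter. The sketch below uses a toy one-parameter family of sinusoids; the family, bank spacing, and function names are assumptions for illustration, and TpopT itself (Riemannian gradient descent over the signal manifold, unrolled into a trainable network) is not shown.

```python
import math


def normalized_correlation(signal, template):
    """Inner product of the signal with the unit-normalized template."""
    norm = math.sqrt(sum(t * t for t in template))
    return sum(s * t for s, t in zip(signal, template)) / norm


def template_match(signal, bank):
    """Return (best_param, best_score) over a bank of (param, template) pairs."""
    best_param, best_score = None, -math.inf
    for param, template in bank:
        score = normalized_correlation(signal, template)
        if score > best_score:
            best_param, best_score = param, score
    return best_param, best_score


def sinusoid(freq, n=64):
    """Toy one-parameter signal family: sin(2*pi*freq*t) sampled at n points."""
    return [math.sin(2 * math.pi * freq * i / n) for i in range(n)]
```

Detection cost is one correlation per template, and covering a d-dimensional signal family at resolution δ needs roughly (1/δ)^d templates, which is the unfavorable scaling the abstract refers to.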
-
A one-query lower bound for unitary synthesis and breaking quantum cryptography
Authors:
Alex Lombardi,
Fermi Ma,
John Wright
Abstract:
The Unitary Synthesis Problem (Aaronson-Kuperberg 2007) asks whether any $n$-qubit unitary $U$ can be implemented by an efficient quantum algorithm $A$ augmented with an oracle that computes an arbitrary Boolean function $f$. In other words, can the task of implementing any unitary be efficiently reduced to the task of implementing any Boolean function?
In this work, we prove a one-query lower bound for unitary synthesis. We show that there exist unitaries $U$ such that no quantum polynomial-time oracle algorithm $A^f$ can implement $U$, even approximately, if it only makes one (quantum) query to $f$. Our approach also has implications for quantum cryptography: we prove (relative to a random oracle) the existence of quantum cryptographic primitives that remain secure against all one-query adversaries $A^{f}$. Since such one-query algorithms can decide any language, solve any classical search problem, and even prepare any quantum state, our result suggests that implementing random unitaries and breaking quantum cryptography may be harder than all of these tasks.
To prove this result, we formulate unitary synthesis as an efficient challenger-adversary game, which enables proving lower bounds by analyzing the maximum success probability of an adversary $A^f$. Our main technical insight is to identify a natural spectral relaxation of the one-query optimization problem, which we bound using tools from random matrix theory.
We view our framework as a potential avenue to rule out polynomial-query unitary synthesis, and we state conjectures in this direction.
Submitted 13 October, 2023;
originally announced October 2023.
-
Accelerating optimization over the space of probability measures
Authors:
Shi Chen,
Qin Li,
Oliver Tse,
Stephen J. Wright
Abstract:
The acceleration of gradient-based optimization methods is a subject of significant practical and theoretical importance, particularly within machine learning applications. While much attention has been directed towards optimizing within Euclidean space, the need to optimize over spaces of probability measures in machine learning motivates exploration of accelerated gradient methods in this context too. To this end, we introduce a Hamiltonian-flow approach analogous to momentum-based approaches in Euclidean space. We demonstrate that, in the continuous-time setting, algorithms based on this approach can achieve convergence rates of arbitrarily high order. We complement our findings with numerical examples.
Submitted 18 June, 2024; v1 submitted 6 October, 2023;
originally announced October 2023.
-
How to Evaluate Behavioral Models
Authors:
Greg d'Eon,
Sophie Greenwood,
Kevin Leyton-Brown,
James R. Wright
Abstract:
Researchers building behavioral models, such as behavioral game theorists, use experimental data to evaluate predictive models of human behavior. However, there is little agreement about which loss function should be used in evaluations, with error rate, negative log-likelihood, cross-entropy, Brier score, and squared L2 error all being common choices. We attempt to offer a principled answer to the question of which loss functions should be used for this task, formalizing axioms that we argue loss functions should satisfy. We construct a family of loss functions, which we dub "diagonal bounded Bregman divergences", that satisfy all of these axioms. These rule out many loss functions used in practice, but notably include squared L2 error; we thus recommend its use for evaluating behavioral models.
Submitted 22 February, 2024; v1 submitted 7 June, 2023;
originally announced June 2023.
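A minimal sketch of the recommended squared L2 error, assuming it compares a model's predicted distribution over actions with the empirical distribution of the actions people chose in the experiment (the abstract does not spell out the exact comparison, and the helper names are hypothetical):

```python
def empirical_distribution(actions, num_actions):
    """Fraction of observations choosing each action 0..num_actions-1."""
    counts = [0] * num_actions
    for a in actions:
        counts[a] += 1
    return [c / len(actions) for c in counts]


def squared_l2_error(predicted, target):
    """Squared Euclidean distance between two distributions over the same actions."""
    return sum((p - q) ** 2 for p, q in zip(predicted, target))
```

Scoring each observation separately against a one-hot vector and averaging gives the same ranking of models: for a fixed prediction p and empirical distribution q, the per-observation average equals ||p - q||² + (1 - ||q||²), and the second term does not depend on the model.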
-
Correcting auto-differentiation in neural-ODE training
Authors:
Yewei Xu,
Shi Chen,
Qin Li,
Stephen J. Wright
Abstract:
Does the use of auto-differentiation yield reasonable updates to deep neural networks that represent neural ODEs? Through mathematical analysis and numerical evidence, we find that when the neural network employs high-order forms to approximate the underlying ODE flows (such as the Linear Multistep Method (LMM)), brute-force computation using auto-differentiation often produces non-converging artificial oscillations. In the case of Leapfrog, we propose a straightforward post-processing technique that effectively eliminates these oscillations, rectifies the gradient computation and thus respects the updates of the underlying flow.
Submitted 3 June, 2023;
originally announced June 2023.
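For reference, the leapfrog scheme discussed in the abstract is the two-step linear multistep method x_{n+1} = x_{n-1} + 2h f(x_n). The sketch below integrates a scalar ODE x' = f(x), generating the second starting value with one forward Euler step (a common choice, assumed here). It shows only the forward integrator, not the auto-differentiated gradients or the paper's post-processing fix.

```python
import math


def leapfrog(f, x0, h, steps):
    """Integrate x' = f(x) with the two-step leapfrog (explicit midpoint) method.

    The second starting value comes from one forward Euler step.
    Returns the iterates [x_0, x_1, ..., x_steps] (requires steps >= 1).
    """
    xs = [x0, x0 + h * f(x0)]
    for n in range(1, steps):
        xs.append(xs[n - 1] + 2.0 * h * f(xs[n]))
    return xs
```

Being a two-step method, leapfrog carries a second, parasitic root: for the decaying problem x' = -x with h = 0.1, that root is about -1.105, so a sign-alternating mode eventually swamps the true solution e^{-t}. This weak instability belongs to the integrator itself; the abstract's concern is the related but distinct appearance of oscillations in the gradients computed through it.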
-
Differentially Private Optimization for Smooth Nonconvex ERM
Authors:
Changyu Gao,
Stephen J. Wright
Abstract:
We develop simple differentially private optimization algorithms that move along directions of (expected) descent to find an approximate second-order solution for nonconvex ERM. We use line search, mini-batching, and a two-phase strategy to improve the speed and practicality of the algorithm. Numerical experiments demonstrate the effectiveness of these approaches.
Submitted 9 June, 2023; v1 submitted 9 February, 2023;
originally announced February 2023.
-
Enhancing Neural Network Differential Equation Solvers
Authors:
Matthew J. H. Wright
Abstract:
We motivate the use of neural networks for the construction of numerical solutions to differential equations. We prove that there exists a feed-forward neural network that can arbitrarily minimise an objective function that is zero at the solution of Poisson's equation, allowing us to guarantee that neural network solution estimates can get arbitrarily close to the exact solutions. We also show how these estimates can be appreciably enhanced through various strategies, in particular through the construction of error correction networks, for which we propose a general method. We conclude by providing numerical experiments that attest to the validity of all such strategies for variants of Poisson's equation.
Submitted 28 December, 2022;
originally announced January 2023.
-
Multi-output multilevel best linear unbiased estimators via semidefinite programming
Authors:
M. Croci,
K. E. Willcox,
S. J. Wright
Abstract:
Multifidelity forward uncertainty quantification (UQ) problems often involve multiple quantities of interest and heterogeneous models (e.g., different grids, equations, dimensions, physics, surrogate and reduced-order models). While computational efficiency is key in this context, multi-output strategies in multilevel/multifidelity methods are either sub-optimal or non-existent. In this paper we extend multilevel best linear unbiased estimators (MLBLUE) to multi-output forward UQ problems, and we present new semidefinite programming formulations for their optimal setup. Not only do these formulations yield the optimal number of samples required, but they also yield the optimal selection of low-fidelity models to use. While existing MLBLUE approaches are single-output only and require a non-trivial nonlinear optimization procedure, the new multi-output formulations can be solved reliably and efficiently. We demonstrate the efficacy of the new methods and formulations in practical UQ problems with model heterogeneity.
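The "best linear unbiased" idea can be seen in its simplest instance. Several unbiased estimates of one scalar with known covariance C are combined by generalized least squares, using weights C⁻¹1 / (1ᵀC⁻¹1). This sketch covers only that building block. It does not reproduce MLBLUE's multi-output setup or the semidefinite programs that choose sample sizes and models.

```python
import numpy as np

def blue_combine(estimates, cov):
    """Best linear unbiased combination of unbiased estimates of one scalar.

    Minimizes Var(w @ estimates) subject to sum(w) == 1. The solution is
    w = C^{-1} 1 / (1^T C^{-1} 1), and the resulting variance is 1 / (1^T C^{-1} 1).
    """
    estimates = np.asarray(estimates, dtype=float)
    ones = np.ones(len(estimates))
    cinv_ones = np.linalg.solve(cov, ones)
    denom = ones @ cinv_ones
    weights = cinv_ones / denom
    return weights @ estimates, weights, 1.0 / denom
```

For two independent estimates with variances 1 and 4, the weights come out as 0.8 and 0.2. The combined variance of 0.8 is lower than either input's variance.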
Submitted 15 May, 2023; v1 submitted 18 January, 2023;
originally announced January 2023.
-
Science Platforms for Heliophysics Data Analysis
Authors:
Monica G. Bobra,
Will T. Barnes,
Thomas Y. Chen,
Mark C. M. Cheung,
Laura A. Hayes,
Jack Ireland,
Miho Janvier,
Michael S. F. Kirk,
James P. Mason,
Stuart J. Mumford,
Paul J. Wright
Abstract:
We recommend that NASA maintain and fund science platforms that enable interactive and scalable data analysis in order to maximize the scientific return of data collected from space-based instruments.
Submitted 2 January, 2023;
originally announced January 2023.
-
Understanding the Impact of Input Entropy on FPU, CPU, and GPU Power
Authors:
Sridutt Bhalachandra,
Brian Austin,
Samuel Williams,
Nicholas J. Wright
Abstract:
Power is increasingly becoming a limiting resource in high-performance, GPU-accelerated computing systems. Understanding the range and sources of power variation is essential in setting realistic bounds on rack and system peak power, and in developing techniques that minimize energy. While variations arising during manufacturing and from other factors, such as the algorithm, have been previously studied, this work shows that the program inputs can also severely impact the power consumed, not only on GPUs but also on CPUs. Power variations of up to 67% were observed on an NVIDIA Ampere A100 GPU for the same algorithm (DGEMM benchmark) and input size with different matrix values. Our investigation shows that the values used as matrix elements, their position, and their uniqueness strongly influence power consumption. The implications of this result for supercomputer performance and energy efficiency are further discussed.
Submitted 17 December, 2022;
originally announced December 2022.
-
Cyclic Block Coordinate Descent With Variance Reduction for Composite Nonconvex Optimization
Authors:
Xufeng Cai,
Chaobing Song,
Stephen J. Wright,
Jelena Diakonikolas
Abstract:
Nonconvex optimization is central in solving many machine learning problems, in which block-wise structure is commonly encountered. In this work, we propose cyclic block coordinate methods for nonconvex optimization problems with non-asymptotic gradient norm guarantees. Our convergence analysis is based on a gradient Lipschitz condition with respect to a Mahalanobis norm, inspired by recent progress on cyclic block coordinate methods. In deterministic settings, our convergence guarantee matches the guarantee of (full-gradient) gradient descent, but with the gradient Lipschitz constant being defined w.r.t. a Mahalanobis norm. In stochastic settings, we use recursive variance reduction to decrease the per-iteration cost and match the arithmetic operation complexity of current optimal stochastic full-gradient methods, with a unified analysis for both finite-sum and infinite-sum cases. We prove a faster linear convergence result when a Polyak-Łojasiewicz (PŁ) condition holds. To our knowledge, this work is the first to provide non-asymptotic convergence guarantees -- variance-reduced or not -- for a cyclic block coordinate method in general composite (smooth + nonsmooth) nonconvex settings. Our experimental results demonstrate the efficacy of the proposed cyclic scheme in training deep neural nets.
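The next example sketches a plain cyclic proximal block coordinate method on a composite objective, least squares plus an ℓ1 term, with per-block step sizes 1/L_j. It shows only the fixed cyclic update order and the proximal step for the nonsmooth part. The recursive variance reduction and Mahalanobis-norm analysis from the paper are not included, and the function names are illustrative.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def cyclic_prox_bcd(A, b, blocks, lam=0.0, epochs=300):
    """Cyclic proximal block coordinate descent for 0.5*||Ax - b||^2 + lam*||x||_1.

    Blocks are visited in the same fixed order every epoch. Each block takes a
    proximal-gradient step with step size 1 / L_j, where L_j = ||A[:, block]||_2^2.
    Returns the final iterate and the objective value after each epoch.
    """
    x = np.zeros(A.shape[1])
    r = A @ x - b                                   # running residual Ax - b
    history = []
    for _ in range(epochs):
        for blk in blocks:
            A_j = A[:, blk]
            L_j = np.linalg.norm(A_j, 2) ** 2       # Lipschitz constant of the block gradient
            grad_j = A_j.T @ r
            new = soft_threshold(x[blk] - grad_j / L_j, lam / L_j)
            r += A_j @ (new - x[blk])               # update the residual before overwriting x
            x[blk] = new
        history.append(0.5 * r @ r + lam * np.abs(x).sum())
    return x, history
```

Keeping the residual up to date makes each block update cost proportional to the block's column count, not to the full matrix.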
Submitted 27 January, 2023; v1 submitted 9 December, 2022;
originally announced December 2022.
-
A DPU Solution for Container Overlay Networks
Authors:
Anton Njavro,
James Tau,
Taylor Groves,
Nicholas J. Wright,
Richard West
Abstract:
There is an increasing demand to incorporate hybrid environments as part of workflows across edge, cloud, and HPC systems. In such a converging environment of cloud and HPC, containers are starting to play a more prominent role, bringing their networking infrastructure along with them. However, the current body of work shows that container overlay networks, which are often used to connect containers across physical hosts, are ill-suited for the HPC environment. They tend to impose significant overhead and noise, resulting in degraded performance and disturbance to co-processes on the same host. This paper focuses on utilizing a novel class of hardware, the Data Processing Unit (DPU), to offload the networking stack of overlay networks away from the host onto the DPU. We intend to show that such ancillary offload is possible and that it will result in decreased overhead on host nodes, which in turn will improve the performance of running processes.
Submitted 18 November, 2022;
originally announced November 2022.
-
Non-strategic Econometrics (for Initial Play)
Authors:
Daniel Chui,
Jason Hartline,
James R. Wright
Abstract:
Modelling agent preferences has applications in a range of fields including economics and, increasingly, artificial intelligence. These preferences are not always known and thus may need to be estimated from observed behavior, in which case a model is required to map agent preferences to behavior, also known as structural estimation. Traditional models are based on the assumption that agents are perfectly rational: that is, they perfectly optimize and behave in accordance with their own interests. Work in the field of behavioral game theory has shown, however, that human agents often make decisions that are imperfectly rational, and the field has developed models that relax the perfect rationality assumption. We apply models developed for predicting behavior towards estimating preferences and show that they outperform both traditional and commonly used benchmark models on data collected from human subjects. In fact, Nash equilibrium and its relaxation, quantal response equilibrium (QRE), can induce an inaccurate estimate of agent preferences when compared against ground truth.
A key finding is that modelling non-strategic behavior, conventionally considered uniform noise, is important for estimating preferences. To this end, we introduce quantal-linear4, a rich non-strategic model. We also propose an augmentation to the popular quantal response equilibrium with a non-strategic component. We call this augmented model QRE+L0 and find an improvement in estimating values over the standard QRE.
QRE+L0 allows for alternative models of non-strategic behavior in addition to quantal-linear4.
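The logit quantal response underlying QRE takes only a few lines. Each action is played with probability proportional to exp(λ·u), where u is the action's expected utility and λ ≥ 0 controls how rational play is: λ = 0 gives uniform play, and a large λ approaches a best response. The mixture with a non-strategic level-0 distribution is a hypothetical illustration of augmenting QRE with a non-strategic component. It is not the paper's definition of QRE+L0 or of quantal-linear4, and `eps` is an assumed parameter.

```python
import numpy as np

def logit_response(expected_utils, lam):
    """Logit quantal response: P(a) is proportional to exp(lam * u(a))."""
    z = lam * np.asarray(expected_utils, dtype=float)
    z -= z.max()                        # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def mixed_with_level0(expected_utils, lam, eps, level0=None):
    """Hypothetical mixture: with probability eps, play a non-strategic level-0
    distribution (uniform by default); otherwise play the logit response."""
    n = len(expected_utils)
    if level0 is None:
        level0 = np.full(n, 1.0 / n)
    return eps * np.asarray(level0, dtype=float) + (1.0 - eps) * logit_response(expected_utils, lam)
```

In structural estimation, λ, eps, and the utility parameters would be fitted by maximizing the likelihood of the observed actions.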
Submitted 28 February, 2023; v1 submitted 12 August, 2022;
originally announced August 2022.
-
Boosting the Efficiency of Parametric Detection with Hierarchical Neural Networks
Authors:
Jingkai Yan,
Robert Colgan,
John Wright,
Zsuzsa Márka,
Imre Bartos,
Szabolcs Márka
Abstract:
Gravitational wave astronomy is a vibrant field that leverages both classic and modern data processing techniques for the understanding of the universe. Various approaches have been proposed for improving the efficiency of the detection scheme, with hierarchical matched filtering being an important strategy. Meanwhile, deep learning methods have recently demonstrated both consistency with matched filtering methods and remarkable statistical performance. In this work, we propose Hierarchical Detection Network (HDN), a novel approach to efficient detection that combines ideas from hierarchical matching and deep learning. The network is trained using a novel loss function, which encodes simultaneously the goals of statistical accuracy and efficiency. We discuss the source of complexity reduction of the proposed model, and describe a general recipe for initialization with each layer specializing in different regions. We demonstrate the performance of HDN with experiments using open LIGO data and synthetic injections, and observe with two-layer models a $79\%$ efficiency gain compared with matched filtering at an equal error rate of $0.2\%$. Furthermore, we show how training a three-layer HDN initialized using a two-layer model can further boost both accuracy and efficiency, highlighting the power of multiple simple layers in efficient detection.
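The efficiency argument behind hierarchical detection can be illustrated with a generic two-stage cascade. A cheap first stage screens every segment, and only segments that pass reach an expensive second stage. The scoring functions, thresholds, and unit costs below are hypothetical, and this is not the HDN architecture or its loss.

```python
import numpy as np

def cascade_detect(segments, cheap_score, expensive_score, t1, t2, cost1=1.0, cost2=10.0):
    """Two-stage detection cascade (a generic sketch, not the HDN architecture).

    Every segment is scored by the cheap first stage. Only segments whose
    cheap score exceeds t1 are passed to the expensive second stage.
    Returns boolean detections and the total compute cost.
    """
    s1 = np.array([cheap_score(s) for s in segments])
    passed = s1 > t1
    detections = np.zeros(len(segments), dtype=bool)
    for i in np.flatnonzero(passed):
        detections[i] = expensive_score(segments[i]) > t2
    cost = cost1 * len(segments) + cost2 * passed.sum()
    return detections, cost
```

When most segments are rejected early, the total cost approaches the cost of the cheap stage alone. Running the expensive stage on every segment would cost `cost2 * len(segments)`.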
Submitted 23 July, 2022;
originally announced July 2022.
-
Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games: Corrections
Authors:
Dustin Morrill,
Ryan D'Orazio,
Marc Lanctot,
James R. Wright,
Michael Bowling,
Amy R. Greenwald
Abstract:
Hindsight rationality is an approach to playing general-sum games that prescribes no-regret learning dynamics for individual agents with respect to a set of deviations, and further describes jointly rational behavior among multiple agents with mediated equilibria. To develop hindsight rational learning in sequential decision-making settings, we formalize behavioral deviations as a general class of deviations that respect the structure of extensive-form games. Integrating the idea of time selection into counterfactual regret minimization (CFR), we introduce the extensive-form regret minimization (EFR) algorithm that achieves hindsight rationality for any given set of behavioral deviations with computation that scales closely with the complexity of the set. We identify behavioral deviation subsets, the partial sequence deviation types, that subsume previously studied types and lead to efficient EFR instances in games with moderate lengths. In addition, we present a thorough empirical analysis of EFR instantiated with different deviation types in benchmark games, where we find that stronger types typically induce better performance.
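Counterfactual regret minimization, which EFR builds on, updates strategies by regret matching. The sketch below runs regret matching in self-play on rock-paper-scissors. Because this is a one-shot game, it models no extensive-form structure, time selection, or deviation types. The asymmetric initial regret is an arbitrary choice that keeps the dynamics from starting at equilibrium.

```python
import numpy as np

# Row player's payoffs for (rock, paper, scissors); the game is zero-sum.
PAYOFF = np.array([[0.0, -1.0, 1.0],
                   [1.0, 0.0, -1.0],
                   [-1.0, 1.0, 0.0]])

def regret_matching_strategy(regrets):
    """Play actions in proportion to positive cumulative regret (uniform if none is positive)."""
    pos = np.maximum(regrets, 0.0)
    total = pos.sum()
    return pos / total if total > 0 else np.full(len(regrets), 1.0 / len(regrets))

def self_play(iterations=20000, initial_regret=(1.0, 0.0, 0.0)):
    r1 = np.array(initial_regret, dtype=float)   # asymmetric start for player 1
    r2 = np.zeros(3)
    avg1, avg2 = np.zeros(3), np.zeros(3)
    for _ in range(iterations):
        p = regret_matching_strategy(r1)
        q = regret_matching_strategy(r2)
        u1 = PAYOFF @ q           # player 1's payoff for each pure action
        u2 = -(p @ PAYOFF)        # player 2's payoff for each pure action
        r1 += u1 - p @ u1         # accumulate instantaneous regrets
        r2 += u2 - q @ u2
        avg1 += p
        avg2 += q
    return avg1 / iterations, avg2 / iterations

def nash_conv(p, q):
    """Sum of both players' best-response gains; zero exactly at a Nash equilibrium."""
    return (PAYOFF @ q).max() + (-(p @ PAYOFF)).max()
```

In a two-player zero-sum game, the NashConv of the average profile equals the sum of both players' average external regrets. Regret matching drives those regrets to zero at rate O(1/√T).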
Submitted 1 June, 2022; v1 submitted 24 May, 2022;
originally announced May 2022.
-
The Case for Technosignatures: Why They May Be Abundant, Long-lived, Highly Detectable, and Unambiguous
Authors:
Jason T. Wright,
Jacob Haqq-Misra,
Adam Frank,
Ravi Kopparapu,
Manasvi Lingam,
Sofia Z. Sheikh
Abstract:
The intuition suggested by the Drake equation implies that technology should be less prevalent than biology in the galaxy. However, it has been appreciated for decades in the SETI community that technosignatures could be more abundant, longer-lived, more detectable, and less ambiguous than biosignatures. We collect the arguments for and against technosignatures' ubiquity and discuss the implications of some properties of technological life that fundamentally differ from nontechnological life in the context of modern astrobiology: It can spread among the stars to many sites, it can be more easily detected at large distances, and it can produce signs that are unambiguously technological. As an illustration in terms of the Drake equation, we consider two Drake-like equations, for technosignatures (calculating N(tech)) and biosignatures (calculating N(bio)). We argue that Earth and humanity may be poor guides to the longevity term L and that its maximum value could be very large, in that technology can outlive its creators and even its host star. We conclude that while the Drake equation implies that N(bio)>>N(tech), it is also plausible that N(tech)>>N(bio). As a consequence, as we seek possible indicators of extraterrestrial life, for instance, via characterization of the atmospheres of habitable exoplanets, we should search for both biosignatures and technosignatures. This exercise also illustrates ways in which biosignature and technosignature searches can complement and supplement each other and how methods of technosignature search, including old ideas from SETI, can inform the search for biosignatures and life generally.
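For reference, the standard Drake equation is shown below. The abstract discusses its longevity term L. The paper's own Drake-like equations for N(tech) and N(bio) are not spelled out in the abstract and are not reproduced here.

```latex
N = R_{*} \, f_{p} \, n_{e} \, f_{l} \, f_{i} \, f_{c} \, L
```

In this equation:
- R_* is the rate of star formation in the Galaxy.
- f_p is the fraction of stars with planets.
- n_e is the number of potentially habitable planets per such system.
- f_l, f_i, and f_c are the fractions of those planets that develop life, intelligence, and detectable technology.
- L is the length of time over which such technology remains detectable.

Because N is linear in L, a technological lifetime many orders of magnitude longer than a biological one raises N by the same factor.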
Submitted 21 March, 2022;
originally announced March 2022.
-
Detecting and Diagnosing Terrestrial Gravitational-Wave Mimics Through Feature Learning
Authors:
Robert E. Colgan,
Zsuzsa Márka,
Jingkai Yan,
Imre Bartos,
John N. Wright,
Szabolcs Márka
Abstract:
As engineered systems grow in complexity, there is an increasing need for automatic methods that can detect, diagnose, and even correct transient anomalies that inevitably arise and can be difficult or impossible to diagnose and fix manually. Among the most sensitive and complex systems of our civilization are the detectors that search for incredibly small variations in distance caused by gravitational waves -- phenomena originally predicted by Albert Einstein to emerge and propagate through the universe as the result of collisions between black holes and other massive objects in deep space. The extreme complexity and precision of such detectors cause them to be subject to transient noise issues that can significantly limit their sensitivity and effectiveness. In this work, we present a demonstration of a method that can detect and characterize emergent transient anomalies of such massively complex systems. We illustrate the performance, precision, and adaptability of the automated solution via one of the prevalent issues limiting gravitational-wave discoveries: noise artifacts of terrestrial origin that contaminate gravitational wave observatories' highly sensitive measurements and can obscure or even mimic the faint astrophysical signals for which they are listening. Specifically, we demonstrate how a highly interpretable convolutional classifier can automatically learn to detect transient anomalies from auxiliary detector data without needing to observe the anomalies themselves. We also illustrate several other useful features of the model, including how it performs automatic variable selection to reduce tens of thousands of auxiliary data channels to only a few relevant ones; how it identifies behavioral signatures predictive of anomalies in those channels; and how it can be used to investigate individual anomalies and the channels associated with them.
Submitted 5 July, 2022; v1 submitted 9 March, 2022;
originally announced March 2022.
-
Resource-Efficient Invariant Networks: Exponential Gains by Unrolled Optimization
Authors:
Sam Buchanan,
Jingkai Yan,
Ellie Haber,
John Wright
Abstract:
Achieving invariance to nuisance transformations is a fundamental challenge in the construction of robust and reliable vision systems. Existing approaches to invariance scale exponentially with the dimension of the family of transformations, making them unable to cope with natural variabilities in visual data such as changes in pose and perspective. We identify a common limitation of these approaches--they rely on sampling to traverse the high-dimensional space of transformations--and propose a new computational primitive for building invariant networks based instead on optimization, which in many scenarios provides a provably more efficient method for high-dimensional exploration than sampling. We provide empirical and theoretical corroboration of the efficiency gains and soundness of our proposed method, and demonstrate its utility in constructing an efficient invariant network for a simple hierarchical object detection task when combined with unrolled optimization. Code for our networks and experiments is available at https://github.com/sdbuch/refine.
Submitted 9 March, 2022;
originally announced March 2022.
-
Architectural Optimization and Feature Learning for High-Dimensional Time Series Datasets
Authors:
Robert E. Colgan,
Jingkai Yan,
Zsuzsa Márka,
Imre Bartos,
Szabolcs Márka,
John N. Wright
Abstract:
As our ability to sense increases, we are experiencing a transition from data-poor problems, in which the central issue is a lack of relevant data, to data-rich problems, in which the central issue is to identify a few relevant features in a sea of observations. Motivated by applications in gravitational-wave astrophysics, we study the problem of predicting the presence of transient noise artifacts in a gravitational wave detector from a rich collection of measurements from the detector and its environment. We argue that feature learning--in which relevant features are optimized from data--is critical to achieving high accuracy. We introduce models that reduce the error rate by over 60% compared to the previous state of the art, which used fixed, hand-crafted features. Feature learning is useful not only because it improves performance on prediction tasks; the results provide valuable information about patterns associated with phenomena of interest that would otherwise be undiscoverable. In our application, features found to be associated with transient noise provide diagnostic information about its origin and suggest mitigation strategies. Learning in high-dimensional settings is challenging. Through experiments with a variety of architectures, we identify two key factors in successful models: sparsity, for selecting relevant variables within the high-dimensional observations; and depth, which confers flexibility for handling complex interactions and robustness with respect to temporal variations. We illustrate their significance through systematic experiments on real detector data. Our results provide experimental corroboration of common assumptions in the machine-learning community and have direct applicability to improving our ability to sense gravitational waves, as well as to many other problem settings with similarly high-dimensional, noisy, or partly irrelevant data.
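Sparsity as a variable-selection mechanism can be illustrated with an ℓ1-penalized linear model fitted by iterative soft-thresholding (ISTA). The sketch uses synthetic data in which only two of fifty channels are relevant. It is a linear stand-in for illustration, not one of the deep architectures studied in the paper.

```python
import numpy as np

def ista_lasso(X, y, lam, iters=2000):
    """Minimize 0.5/n * ||Xw - y||^2 + lam * ||w||_1 by iterative soft-thresholding."""
    n, d = X.shape
    step = n / np.linalg.norm(X, 2) ** 2          # 1 / Lipschitz constant of the smooth part
    w = np.zeros(d)
    for _ in range(iters):
        grad = X.T @ (X @ w - y) / n
        v = w - step * grad
        w = np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)   # proximal step for the l1 term
    return w
```

The ℓ1 penalty sets the weights of irrelevant channels exactly to zero, so the fitted model directly names the channels it relies on.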
Submitted 5 July, 2022; v1 submitted 27 February, 2022;
originally announced February 2022.
-
Tactile Materials in Practice: Understanding the Experiences of Teachers of the Visually Impaired
Authors:
Mahika Phutane,
Julie Wright,
Brenda Veronica Castro,
Lei Shi,
Simone R. Stern,
Holly M. Lawson,
Shiri Azenkot
Abstract:
Teachers of the visually impaired (TVIs) regularly present tactile materials (tactile graphics, 3D models, and real objects) to students with vision impairments. Researchers have been increasingly interested in designing tools to support the use of tactile materials, but we still lack an in-depth understanding of how tactile materials are created and used in practice today. To address this gap, we conducted interviews with 21 TVIs and a 3-week diary study with eight of them. We found that tactile materials were regularly used for academic as well as non-academic concepts like tactile literacy, motor ability, and spatial awareness. Real objects and 3D models served as "stepping stones" to tactile graphics, and our participants preferred to teach with 3D models, despite finding them difficult to create, obtain, and modify. Use of certain materials also carried social implications; participants selected materials that fostered student independence and allowed classroom inclusion. We contribute design considerations, encouraging future work on tactile materials to enable student and TVI co-creation, facilitate rapid prototyping, and promote movement and spatial awareness. To support future research in this area, our paper provides a fundamental understanding of current practices. We bridge these practices to established pedagogical approaches and highlight opportunities for growth regarding this important genre of educational materials.
Submitted 24 February, 2022;
originally announced February 2022.
-
On the Complexity of a Practical Primal-Dual Coordinate Method
Authors:
Ahmet Alacaoglu,
Volkan Cevher,
Stephen J. Wright
Abstract:
We prove complexity bounds for the primal-dual algorithm with random extrapolation and coordinate descent (PURE-CD), which has been shown to obtain good practical performance for solving convex-concave min-max problems with bilinear coupling. Our complexity bounds either match or improve the best-known results in the literature for both dense and sparse (strongly)-convex-(strongly)-concave problems.
Submitted 19 January, 2022;
originally announced January 2022.
-
Testing matrix product states
Authors:
Mehdi Soleimanifar,
John Wright
Abstract:
Devising schemes for testing the amount of entanglement in quantum systems has played a crucial role in quantum computing and information theory. Here, we study the problem of testing whether an unknown state $|ψ\rangle$ is a matrix product state (MPS) in the property testing model. MPS are a class of physically-relevant quantum states which arise in the study of quantum many-body systems. A quantum state $|ψ_{1,...,n}\rangle$ consisting of $n$ qudits is said to be an MPS of bond dimension $r$ if the reduced density matrix $ψ_{1,...,k}$ has rank $r$ for each $k \in \{1,...,n\}$. When $r=1$, this corresponds to the set of product states. For larger values of $r$, this yields a more expressive class of quantum states, which are allowed to possess limited amounts of entanglement. In the property testing model, one is given $m$ identical copies of $|ψ\rangle$, and the goal is to determine whether $|ψ\rangle$ is an MPS of bond dimension $r$ or whether $|ψ\rangle$ is far from all such states. For the case of product states, we study the product test, a simple two-copy test previously analyzed by Harrow and Montanaro (FOCS 2010), and a key ingredient in their proof that $\mathsf{QMA(2)}=\mathsf{QMA}(k)$ for $k \geq 2$. We give a new and simpler analysis of the product test which achieves an optimal bound for a wide range of parameters, answering open problems of Harrow and Montanaro (FOCS 2010) and Montanaro and de Wolf (2016). For the case of $r\geq 2$, we give an efficient algorithm for testing whether $|ψ\rangle$ is an MPS of bond dimension $r$ using $m = O(n r^2)$ copies, independent of the dimensions of the qudits, and we show that $Ω(n^{1/2})$ copies are necessary for this task. This lower bound shows that a dependence on the number of qudits $n$ is necessary, in sharp contrast to the case of product states where a constant number of copies suffices.
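For small states, the bond-dimension condition can be checked directly. For a pure state, the rank of $ψ_{1,...,k}$ equals the Schmidt rank across the cut between qudits k and k+1. That rank is the number of nonzero singular values of the amplitude vector reshaped into a d^k × d^(n-k) matrix.

The sketch assumes the usual convention that an MPS of bond dimension r has rank at most r at every cut. It skips k = n, which always has rank 1 for a pure state. This is a brute-force classical computation on the full amplitude vector, not the copy-based property test studied in the paper.

```python
import numpy as np

def schmidt_ranks(state, n, d, tol=1e-10):
    """Rank of the reduced density matrix psi_{1..k} for each cut k = 1..n-1.

    For a pure state, this is the number of singular values of the amplitude
    vector reshaped into a (d**k) x (d**(n-k)) matrix that exceed tol times
    the largest singular value. Qudit 1 is the most significant index,
    matching np.kron ordering.
    """
    psi = np.asarray(state, dtype=complex).reshape(-1)
    ranks = []
    for k in range(1, n):
        mat = psi.reshape(d**k, d**(n - k))
        s = np.linalg.svd(mat, compute_uv=False)
        ranks.append(int(np.sum(s > tol * s[0])))
    return ranks

def max_bond_dimension(state, n, d):
    """Smallest r such that the state is an MPS of bond dimension at most r."""
    return max(schmidt_ranks(state, n, d))
```

A product state gives rank 1 at every cut. A three-qubit GHZ state gives rank 2 at both cuts.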
Submitted 5 January, 2022;
originally announced January 2022.
-
A novel data-driven algorithm to predict anomalous prescription based on patient's feature set
Authors:
Qiongge Li,
Jean Wright,
Russell Hales,
Ranh Voong,
Todd McNutt
Abstract:
Appropriate dosing of radiation is crucial to patient safety in radiotherapy. Current quality assurance depends heavily on a peer-review process, in which physicians review each patient's treatment plan, including dose and fractionation. However, such a process is manual and laborious, and physicians may miss errors due to time constraints and caseload. We designed a novel prescription anomaly detection algorithm that utilizes historical data to predict anomalous cases. Such a tool can serve as an electronic peer that assists the peer-review process, providing extra safety to the patients. In our primary model, we created two dissimilarity metrics, R and F. R defines how far a new patient's prescription is from historical prescriptions. F represents how far away a patient's feature set is from the group with an identical or similar prescription. We flag a prescription if either metric is greater than its optimized cut-off value. We used thoracic cancer patients (n=2356) as an example and extracted seven features. Our testing F1 scores range from 75% to 94% across different treatment technique groups. We also independently validate our results by conducting a mock peer review with three thoracic specialists. Our model has a lower type 2 error rate than the physicians performing manual peer review. Our model has many advantages over traditional machine learning algorithms, particularly in that it does not suffer from class imbalance. It can also explain why it flags each case, and it can separate prescription-related and non-prescription-related features without learning from the data.
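The decision rule (flag the case if either R or F exceeds its cut-off) can be sketched as follows. The abstract does not define R and F precisely, so the following are all hypothetical:
- R is taken as the distance to the nearest historical prescription.
- F is taken as the standardized distance to the mean features of the matching prescription group.
- The cut-offs and the toy (dose, fractions) data in the test are invented for illustration.

```python
import numpy as np

def prescription_flags(new_rx, new_features, hist_rx, hist_features, r_cut, f_cut):
    """Hypothetical R/F anomaly check (the metric definitions are illustrative only).

    R: distance from the new prescription to the nearest historical prescription.
    F: RMS standardized distance from the patient's features to the mean features
       of historical patients who share that nearest prescription.
    The case is flagged if either metric exceeds its cut-off.
    """
    hist_rx = np.asarray(hist_rx, dtype=float)
    hist_features = np.asarray(hist_features, dtype=float)
    d_rx = np.linalg.norm(hist_rx - np.asarray(new_rx, dtype=float), axis=1)
    R = d_rx.min()
    group = np.all(np.isclose(hist_rx, hist_rx[d_rx.argmin()]), axis=1)   # same prescription
    feats = hist_features[group]
    mu = feats.mean(axis=0)
    sd = feats.std(axis=0) + 1e-9                   # avoid division by zero
    z = (np.asarray(new_features, dtype=float) - mu) / sd
    F = np.sqrt(np.mean(z ** 2))
    flagged = bool(R > r_cut or F > f_cut)
    return flagged, R, F
```

Because each flag is traced to either R or F, the rule indicates whether a case looks unusual because of the prescription itself or because of the patient's features.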
Submitted 29 November, 2021;
originally announced November 2021.
-
Quantum soundness of testing tensor codes
Authors:
Zhengfeng Ji,
Anand Natarajan,
Thomas Vidick,
John Wright,
Henry Yuen
Abstract:
A locally testable code is an error-correcting code that admits very efficient probabilistic tests of membership. Tensor codes provide a simple family of combinatorial constructions of locally testable codes that generalize the family of Reed-Muller codes. The natural test for tensor codes, the axis-parallel line vs. point test, plays an essential role in constructions of probabilistically checkable proofs.
We analyze the axis-parallel line vs. point test as a two-prover game and show that the test is sound against quantum provers sharing entanglement. Our result implies the quantum-soundness of the low individual degree test, which is an essential component of the MIP* = RE theorem. Our proof also generalizes to the infinite-dimensional commuting-operator model of quantum provers.
Submitted 6 December, 2022; v1 submitted 15 November, 2021;
originally announced November 2021.
-
Exploiting Action Impact Regularity and Exogenous State Variables for Offline Reinforcement Learning
Authors:
Vincent Liu,
James R. Wright,
Martha White
Abstract:
Offline reinforcement learning -- learning a policy from a batch of data -- is known to be hard for general MDPs. These results motivate the need to look at specific classes of MDPs where offline reinforcement learning might be feasible. In this work, we explore a restricted class of MDPs to obtain guarantees for offline reinforcement learning. The key property, which we call Action Impact Regularity (AIR), is that actions primarily impact a part of the state (an endogenous component) and have limited impact on the remaining part of the state (an exogenous component). AIR is a strong assumption, but it nonetheless holds in a number of real-world domains including financial markets. We discuss algorithms that exploit the AIR property, and provide a theoretical analysis for an algorithm based on Fitted-Q Iteration. Finally, we demonstrate that the algorithm outperforms existing offline reinforcement learning algorithms across different data collection policies in simulated and real-world environments where the regularity holds.
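As a rough illustration of the Fitted-Q Iteration baseline mentioned above, the sketch below runs FQI on a fixed batch of transitions with a lookup table as the regressor, so each regression step reduces to averaging the targets for each state-action pair. It is plain tabular FQI under that assumption. It does not implement the paper's algorithm or exploit the endogenous/exogenous split that defines AIR, and all names are hypothetical.

```python
from collections import defaultdict


def fitted_q_iteration(dataset, actions, gamma=0.9, iterations=200):
    """Tabular Fitted-Q Iteration on a fixed batch of (s, a, r, s_next) tuples.

    With a lookup table as the regressor, the least-squares fit to each
    round of Bellman targets is the per-(s, a) average of those targets.
    """
    q = defaultdict(float)  # unseen (s, a) pairs default to 0
    for _ in range(iterations):
        sums, counts = defaultdict(float), defaultdict(int)
        for s, a, r, s_next in dataset:
            target = r + gamma * max(q[(s_next, b)] for b in actions)
            sums[(s, a)] += target
            counts[(s, a)] += 1
        q = defaultdict(float, {k: sums[k] / counts[k] for k in sums})
    return q
```

In a toy batch where action `a` moves to state `a` and pays reward `a`, the fitted values approach $Q(s,1) = 1/(1-\gamma)$ and $Q(s,0) = \gamma/(1-\gamma)$.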
Submitted 3 May, 2023; v1 submitted 15 November, 2021;
originally announced November 2021.
-
Coordinate Linear Variance Reduction for Generalized Linear Programming
Authors:
Chaobing Song,
Cheuk Yin Lin,
Stephen J. Wright,
Jelena Diakonikolas
Abstract:
We study a class of generalized linear programs (GLP) in a large-scale setting, which include simple, possibly nonsmooth convex regularizers and simple convex set constraints. By reformulating (GLP) as an equivalent convex-concave min-max problem, we show that the linear structure in the problem can be used to design an efficient, scalable first-order algorithm, to which we give the name Coordinate Linear Variance Reduction (CLVR; pronounced "clever"). CLVR yields improved complexity results for (GLP) that depend on the max row norm of the linear constraint matrix in (GLP) rather than the spectral norm. When the regularization terms and constraints are separable, CLVR admits an efficient lazy update strategy that makes its complexity bounds scale with the number of nonzero elements of the linear constraint matrix in (GLP) rather than the matrix dimensions. For the special case of linear programs, by exploiting sharpness, we propose a restart scheme for CLVR to obtain empirical linear convergence. We then show that Distributionally Robust Optimization (DRO) problems with ambiguity sets based on both $f$-divergence and Wasserstein metrics can be reformulated as GLPs by introducing sparsely connected auxiliary variables. We complement our theoretical guarantees with numerical experiments that verify our algorithm's practical effectiveness, in terms of wall-clock time and number of data passes.
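The point about the max row norm versus the spectral norm can be checked numerically. For any matrix $A$, each row satisfies $\lVert a_i \rVert_2 = \lVert A^\top e_i \rVert_2 \le \lVert A \rVert_2$, and the gap can be large: for the $m \times m$ all-ones matrix the max row norm is $\sqrt{m}$ while the spectral norm is $m$. The sketch below is not CLVR itself; it only computes both quantities, estimating the spectral norm by power iteration on $A^\top A$. Helper names are hypothetical.

```python
import math
import random


def max_row_norm(A):
    """Largest Euclidean norm among the rows of A (a list of lists)."""
    return max(math.sqrt(sum(v * v for v in row)) for row in A)


def spectral_norm(A, iters=500, seed=0):
    """Estimate ||A||_2 (largest singular value) by power iteration on A^T A."""
    rng = random.Random(seed)
    n = len(A[0])
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
        AtAx = [sum(A[i][j] * Ax[i] for i in range(len(A))) for j in range(n)]
        norm = math.sqrt(sum(v * v for v in AtAx))
        if norm == 0.0:
            return 0.0  # A annihilates x; happens e.g. for the zero matrix
        x = [v / norm for v in AtAx]
    # x is now (approximately) a unit top right-singular vector, so ||Ax|| ~ ||A||_2.
    Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    return math.sqrt(sum(v * v for v in Ax))
```

For the 4 x 4 all-ones matrix this gives a max row norm of 2 against a spectral norm of 4, so a bound in terms of the former is tighter by a factor of 2 there, and by $\sqrt{m}$ in the $m \times m$ case.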
Submitted 6 April, 2023; v1 submitted 2 November, 2021;
originally announced November 2021.
-
Unique Games hardness of Quantum Max-Cut, and a conjectured vector-valued Borell's inequality
Authors:
Yeongwoo Hwang,
Joe Neeman,
Ojas Parekh,
Kevin Thompson,
John Wright
Abstract:
The Gaussian noise stability of a function $f:\mathbb{R}^n \to \{-1, 1\}$ is the expected value of $f(\boldsymbol{x}) \cdot f(\boldsymbol{y})$ over $\rho$-correlated Gaussian random variables $\boldsymbol{x}$ and $\boldsymbol{y}$. Borell's inequality states that for $-1 \leq \rho \leq 0$, this is minimized by the halfspace $f(x) = \mathrm{sign}(x_1)$. In this work, we generalize this result to hold for functions $f:\mathbb{R}^n \to S^{k-1}$ which output $k$-dimensional unit vectors. Our main conjecture, which we call the $\textit{vector-valued Borell's inequality}$, asserts that the expected value of $\langle f(\boldsymbol{x}), f(\boldsymbol{y})\rangle$ is minimized by the function $f(x) = x_{\leq k} / \Vert x_{\leq k} \Vert$, where $x_{\leq k} = (x_1, \ldots, x_k)$. We give several pieces of evidence in favor of this conjecture, including a proof that it does indeed hold in the special case of $n = k$.
As an application of this conjecture, we show that it implies several hardness of approximation results for a special case of the local Hamiltonian problem related to the anti-ferromagnetic Heisenberg model known as Quantum Max-Cut. This can be viewed as a natural quantum analogue of the classical Max-Cut problem and has been proposed as a useful testbed for developing algorithms. We show the following, assuming our conjecture:
(1) The integrality gap of the basic SDP is $0.498$, matching an existing rounding algorithm. Combined with existing results, this shows that the basic SDP does not achieve the optimal approximation ratio.
(2) It is Unique Games-hard (UG-hard) to compute a $(0.956+\varepsilon)$-approximation to the value of the best product state, matching an existing approximation algorithm.
(3) It is UG-hard to compute a $(0.956+\varepsilon)$-approximation to the value of the best (possibly entangled) state.
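The noise stability defined in the first paragraph can be estimated by Monte Carlo. The sketch below draws $\rho$-correlated pairs via $y = \rho x + \sqrt{1-\rho^2}\, z$ and averages $\langle f(x), f(y)\rangle$. For the halfspace $\mathrm{sign}(x_1)$ the estimate can be compared with the classical closed form $\frac{2}{\pi}\arcsin\rho$ (Sheppard's formula). Sample counts, seeds, and function names are illustrative assumptions, and a simulation of this kind is at best evidence, not a proof of the conjecture.

```python
import math
import random


def noise_stability(f, n, rho, samples=50_000, seed=0):
    """Monte Carlo estimate of E[<f(x), f(y)>] for rho-correlated standard Gaussians in R^n.

    f returns a list of floats (length 1 for a {-1, 1}-valued function).
    """
    rng = random.Random(seed)
    c = math.sqrt(1.0 - rho * rho)
    total = 0.0
    for _ in range(samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # y_i = rho * x_i + sqrt(1 - rho^2) * z_i gives corr(x_i, y_i) = rho.
        y = [rho * xi + c * rng.gauss(0.0, 1.0) for xi in x]
        total += sum(a * b for a, b in zip(f(x), f(y)))
    return total / samples


def halfspace(x):
    """f(x) = sign(x_1), wrapped as a 1-dimensional vector."""
    return [1.0 if x[0] >= 0 else -1.0]


def make_projection(k):
    """f(x) = x_{<=k} / ||x_{<=k}||, the conjectured minimizer."""
    def f(x):
        head = x[:k]
        norm = math.sqrt(sum(v * v for v in head))
        return [v / norm for v in head]
    return f
```

With `k = 1`, `make_projection` reduces to $\mathrm{sign}(x_1)$, so the vector-valued candidate minimizer generalizes the scalar halfspace.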
Submitted 28 September, 2022; v1 submitted 1 November, 2021;
originally announced November 2021.
-
Measuring the Effectiveness of Digital Hygiene using Historical DNS Data
Authors:
Oliver Farnan,
Gregory Walton,
Joss Wright
Abstract:
This paper describes an ongoing experiment evaluating the efficacy of a digital safety intervention in six high-risk, low-capacity Civil Society Organisations (CSOs) in Central Asia. The evaluation takes the form of statistical analysis of DNS traffic in each organisation, obtained via security tools installed by researchers.
The hypothesis is that the digital safety intervention strengthens the overall digital security posture of the CSOs, as measured by the number of malware attacks intercepted by a cloud-based DNS firewall installed on the CSOs' networks.
The research collects DNS traffic from CSOs that are participating in the digital safety intervention, and compares DNS traffic from a treatment group of four CSOs against that from a second group of two CSOs in which the intervention has not yet taken place.
This project is ongoing, with data collection underway at a number of Central Asian CSOs. In this paper we outline the experimental design of the project, and look at the early data coming out of the DNS firewall. This is done to support the ultimate question of whether DNS data such as this can be used to accurately assess the efficacy of digital hygiene efforts.
Submitted 26 October, 2021;
originally announced October 2021.
-
An Integrated System for Mobile Image-Based Dietary Assessment
Authors:
Zeman Shao,
Yue Han,
Jiangpeng He,
Runyu Mao,
Janine Wright,
Deborah Kerr,
Carol Boushey,
Fengqing Zhu
Abstract:
Accurate assessment of dietary intake requires improved tools to overcome limitations of current methods including user burden and measurement error. Emerging technologies such as image-based approaches using advanced machine learning techniques coupled with widely available mobile devices present new opportunities to improve the accuracy of dietary assessment in a way that is cost-effective, convenient and timely. However, the quality and quantity of datasets are essential for achieving good performance in automated image analysis. Building a large image dataset with high-quality groundtruth annotation is a challenging problem, especially for food images, as the associated nutrition information needs to be provided or verified by trained dietitians with domain knowledge. In this paper, we present the design and development of a mobile, image-based dietary assessment system to capture and analyze dietary intake, which has been deployed in both controlled-feeding and community-dwelling dietary studies. Our system is capable of collecting high-quality food images in naturalistic settings and provides groundtruth annotations for developing new computational approaches.
Submitted 4 October, 2021;
originally announced October 2021.
-
Deep Networks Provably Classify Data on Curves
Authors:
Tingran Wang,
Sam Buchanan,
Dar Gilboa,
John Wright
Abstract:
Data with low-dimensional nonlinear structure are ubiquitous in engineering and scientific problems. We study a model problem with such structure -- a binary classification task that uses a deep fully-connected neural network to classify data drawn from two disjoint smooth curves on the unit sphere. Aside from mild regularity conditions, we place no restrictions on the configuration of the curves. We prove that when (i) the network depth is large relative to certain geometric properties that set the difficulty of the problem and (ii) the network width and number of samples are polynomial in the depth, randomly-initialized gradient descent quickly learns to correctly classify all points on the two curves with high probability. To our knowledge, this is the first generalization guarantee for deep networks with nonlinear data that depends only on intrinsic data properties. Our analysis proceeds by a reduction to dynamics in the neural tangent kernel (NTK) regime, where the network depth plays the role of a fitting resource in solving the classification problem. In particular, via fine-grained control of the decay properties of the NTK, we demonstrate that when the network is sufficiently deep, the NTK can be locally approximated by a translationally invariant operator on the manifolds and stably inverted over smooth functions, which guarantees convergence and generalization.
Submitted 28 October, 2021; v1 submitted 29 July, 2021;
originally announced July 2021.
-
Furthering a Comprehensive SETI Bibliography
Authors:
Julia LaFond,
Jason T. Wright,
Macy J. Huston
Abstract:
In 2019, Reyes & Wright used the NASA Astrophysics Data System (ADS) to initiate a comprehensive bibliography for SETI accessible to the public. Since then, updates to the library have been incomplete, partly due to the difficulty in managing the large number of false positive publications generated by searching ADS using simple search terms. In preparation for a recent update, the scope of the library was revised and reexamined. The scope now includes social sciences and commensal SETI. Results were curated based on five SETI keyword searches: "SETI", "technosignature", "Fermi Paradox", "Drake Equation", and "extraterrestrial intelligence". These keywords returned 553 publications that merited inclusion in the bibliography but were not previously present. A curated library of false positive results is now maintained concurrently to facilitate their exclusion from future searches. A search query and workflow were developed to capture nearly all SETI-related papers indexed by ADS while minimizing false positives. These tools will enable efficient, consistent updates of the SETI library by future curators, and could be adopted for other bibliography projects as well.
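The curation workflow described above (keyword search, then exclusion of items already in the library or in the curated false-positive library) can be sketched as two small helpers. The OR-joined, quoted-phrase query is a generic boolean form, not a verified ADS query string, and the identifiers in any usage are placeholders rather than real bibcodes.

```python
KEYWORDS = [
    "SETI",
    "technosignature",
    "Fermi Paradox",
    "Drake Equation",
    "extraterrestrial intelligence",
]


def build_query(keywords):
    """Join quoted keyword phrases with OR (illustrative syntax, not verified against ADS)."""
    return " OR ".join(f'"{kw}"' for kw in keywords)


def new_candidates(hits, library, false_positives):
    """Hits that are neither already in the library nor in the curated false-positive list.

    Order of first appearance is preserved and duplicate hits are dropped.
    """
    excluded = set(library) | set(false_positives)
    seen = set()
    out = []
    for h in hits:
        if h not in excluded and h not in seen:
            seen.add(h)
            out.append(h)
    return out
```

Keeping the false-positive list as its own maintained collection means each future search only surfaces genuinely new items for a curator to review.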
Submitted 6 July, 2021;
originally announced July 2021.
-
The Spotlight: A General Method for Discovering Systematic Errors in Deep Learning Models
Authors:
Greg d'Eon,
Jason d'Eon,
James R. Wright,
Kevin Leyton-Brown
Abstract:
Supervised learning models often make systematic errors on rare subsets of the data. When these subsets correspond to explicit labels in the data (e.g., gender, race), such poor performance can be identified straightforwardly. This paper introduces a method for discovering systematic errors that do not correspond to such explicitly labelled subgroups. The key idea is that similar inputs tend to have similar representations in the final hidden layer of a neural network. We leverage this structure by "shining a spotlight" on this representation space to find contiguous regions where the model performs poorly. We show that the spotlight surfaces semantically meaningful areas of weakness in a wide variety of existing models spanning computer vision, NLP, and recommender systems.
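A simplified, brute-force stand-in for the spotlight idea is to scan candidate centers in the representation space, take the `k` nearest neighbours of each as a contiguous region, and report the region with the highest mean loss. This is a sketch for intuition only, under stated assumptions: the paper's spotlight and its optimization procedure are not reproduced, and `embeddings` stands for final-hidden-layer representations computed elsewhere.

```python
import math


def spotlight(embeddings, losses, k):
    """Return (center_index, mean_loss) of the k-nearest-neighbour region with highest mean loss.

    embeddings: list of equal-length coordinate tuples (e.g. final hidden-layer outputs).
    losses: per-example loss of the model, aligned with embeddings.
    """
    best = None
    for c, center in enumerate(embeddings):
        # Region = the k points closest to this candidate center (including itself).
        dists = sorted((math.dist(center, e), i) for i, e in enumerate(embeddings))
        members = [i for _, i in dists[:k]]
        mean_loss = sum(losses[i] for i in members) / k
        if best is None or mean_loss > best[1]:
            best = (c, mean_loss)
    return best
```

The scan is quadratic in the number of examples, which is fine for illustration but is one reason a practical method would optimize a region directly rather than enumerate centers.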
Submitted 15 October, 2021; v1 submitted 1 July, 2021;
originally announced July 2021.
-
Disinformation, Stochastic Harm, and Costly Effort: A Principal-Agent Analysis of Regulating Social Media Platforms
Authors:
Shehroze Khan,
James R. Wright
Abstract:
The spread of disinformation on social platforms is harmful to society. This harm may manifest as a gradual degradation of public discourse; but it can also take the form of sudden dramatic events such as the 2021 insurrection on Capitol Hill. The platforms themselves are in the best position to prevent the spread of disinformation, as they have the best access to relevant data and the expertise to use it. However, mitigating disinformation is costly, not only for implementing detection algorithms or employing manual effort, but also because limiting such highly viral content impacts user engagement and potential advertising revenue. Since the costs of harmful content are borne by other entities, the platform will therefore have no incentive to exercise the socially-optimal level of effort. This problem is similar to that of environmental regulation, in which the costs of adverse events are not directly borne by a firm, the mitigation effort of a firm is not observable, and the causal link between a harmful consequence and a specific failure is difficult to prove. For environmental regulation, one solution is to perform costly monitoring to ensure that the firm takes adequate precautions according to a specified rule. However, a fixed rule for classifying disinformation becomes less effective over time, as bad actors can learn to sequentially and strategically bypass it. Encoding our domain as a Markov decision process, we demonstrate that no penalty based on a static rule, no matter how large, can incentivize optimal effort. Penalties based on an adaptive rule can incentivize optimal effort, but counter-intuitively, only if the regulator sufficiently overreacts to harmful events by requiring a greater-than-optimal level of effort. We offer novel insights for the effective regulation of social platforms, highlight inherent challenges, and discuss promising avenues for future work.
Submitted 27 June, 2022; v1 submitted 17 June, 2021;
originally announced June 2021.
-
Square Root Principal Component Pursuit: Tuning-Free Noisy Robust Matrix Recovery
Authors:
Junhui Zhang,
Jingkai Yan,
John Wright
Abstract:
We propose a new framework -- Square Root Principal Component Pursuit -- for low-rank matrix recovery from observations corrupted with noise and outliers. Inspired by the square root Lasso, this new formulation does not require prior knowledge of the noise level. We show that a single, universal choice of the regularization parameter suffices to achieve reconstruction error proportional to the (a priori unknown) noise level. In comparison, previous formulations such as stable PCP rely on noise-dependent parameters to achieve similar performance, and are therefore challenging to deploy in applications where the noise level is unknown. We validate the effectiveness of our new method through experiments on simulated and real datasets. Our simulations corroborate the claim that a universal choice of the regularization parameter yields near optimal performance across a range of noise levels, indicating that the proposed method outperforms the (somewhat loose) bound proved here.
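To see why the square root formulation can be tuning-free, it helps to set it beside the square root Lasso that inspired it. The display below is a hedged sketch of the objectives in penalized form; the exact form and weights used for Square Root PCP are assumptions here. Squaring the data-fit term makes its gradient scale with the residual, and hence with the noise level, so the best $\mu$ depends on that level. The unsquared norm has a gradient $R / \lVert R \rVert_F$ of unit norm at any nonzero residual $R$, which is what allows a single universal parameter choice across noise levels.

```latex
\begin{aligned}
\text{Lasso:} &\quad \min_{\beta}\ \tfrac{1}{2}\lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1, \\
\text{square root Lasso:} &\quad \min_{\beta}\ \lVert y - X\beta \rVert_2 + \lambda \lVert \beta \rVert_1, \\
\text{stable PCP (penalized):} &\quad \min_{L,S}\ \lVert L \rVert_* + \lambda \lVert S \rVert_1 + \tfrac{\mu}{2}\lVert M - L - S \rVert_F^2, \\
\text{square root PCP:} &\quad \min_{L,S}\ \lVert L \rVert_* + \lambda \lVert S \rVert_1 + \mu \lVert M - L - S \rVert_F .
\end{aligned}
```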
Submitted 28 October, 2021; v1 submitted 16 June, 2021;
originally announced June 2021.
-
ReduNet: A White-box Deep Network from the Principle of Maximizing Rate Reduction
Authors:
Kwan Ho Ryan Chan,
Yaodong Yu,
Chong You,
Haozhi Qi,
John Wright,
Yi Ma
Abstract:
This work attempts to provide a plausible theoretical framework that aims to interpret modern deep (convolutional) networks from the principles of data compression and discriminative representation. We argue that for high-dimensional multi-class data, the optimal linear discriminative representation maximizes the coding rate difference between the whole dataset and the average of all the subsets. We show that the basic iterative gradient ascent scheme for optimizing the rate reduction objective naturally leads to a multi-layer deep network, named ReduNet, which shares common characteristics with modern deep networks. The deep layered architectures, linear and nonlinear operators, and even parameters of the network are all explicitly constructed layer-by-layer via forward propagation, although they are amenable to fine-tuning via back propagation. All components of the so-obtained "white-box" network have precise optimization, statistical, and geometric interpretations. Moreover, all linear operators of the so-derived network naturally become multi-channel convolutions when we enforce classification to be rigorously shift-invariant. The derivation in the invariant setting suggests a trade-off between sparsity and invariance, and also indicates that such a deep convolution network is significantly more efficient to construct and learn in the spectral domain. Our preliminary simulations and experiments clearly verify the effectiveness of both the rate reduction objective and the associated ReduNet. All code and data are available at https://github.com/Ma-Lab-Berkeley.
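The rate reduction objective can be computed directly for small examples. The sketch below assumes the coding rate $R(Z) = \frac12 \log\det\!\left(I + \frac{d}{m\varepsilon^2} Z Z^\top\right)$ for $m$ samples in $\mathbb{R}^d$ stored as the columns of $Z$, and takes $\Delta R = R(Z) - \sum_j \frac{m_j}{m} R(Z_j)$, matching the "whole dataset minus average of subsets" description. The distortion $\varepsilon$ and the helper names are illustrative, and the ReduNet layer construction itself is not shown.

```python
import math


def logdet_spd(A):
    """log-determinant of a symmetric positive-definite matrix via Cholesky (A = L L^T)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return 2.0 * sum(math.log(L[i][i]) for i in range(n))


def coding_rate(Z, eps):
    """R(Z) = 1/2 logdet(I + d / (m eps^2) Z Z^T); Z is a list of m samples of dimension d."""
    m, d = len(Z), len(Z[0])
    scale = d / (m * eps * eps)
    A = [[(1.0 if i == j else 0.0) + scale * sum(z[i] * z[j] for z in Z)
          for j in range(d)] for i in range(d)]
    return 0.5 * logdet_spd(A)


def rate_reduction(Z, labels, eps=0.5):
    """Delta R = R(Z) - sum_j (m_j / m) R(Z_j), grouping samples by label."""
    m = len(Z)
    groups = {}
    for z, y in zip(Z, labels):
        groups.setdefault(y, []).append(z)
    return coding_rate(Z, eps) - sum(len(g) / m * coding_rate(g, eps) for g in groups.values())
```

By concavity of $\log\det$, $\Delta R \ge 0$, and it is zero when every class has the same second-moment matrix as the whole dataset; classes spanning orthogonal directions give a strictly positive value.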
Submitted 28 November, 2021; v1 submitted 21 May, 2021;
originally announced May 2021.
-
Randomized Algorithms for Scientific Computing (RASC)
Authors:
Aydin Buluc,
Tamara G. Kolda,
Stefan M. Wild,
Mihai Anitescu,
Anthony DeGennaro,
John Jakeman,
Chandrika Kamath,
Ramakrishnan Kannan,
Miles E. Lopes,
Per-Gunnar Martinsson,
Kary Myers,
Jelani Nelson,
Juan M. Restrepo,
C. Seshadhri,
Draguna Vrabie,
Brendt Wohlberg,
Stephen J. Wright,
Chao Yang,
Peter Zwart
Abstract:
Randomized algorithms have propelled advances in artificial intelligence and represent a foundational research area in advancing AI for Science. Future advancements in DOE Office of Science priority areas such as climate science, astrophysics, fusion, advanced materials, combustion, and quantum computing all require randomized algorithms for surmounting challenges of complexity, robustness, and scalability. This report summarizes the outcomes of the workshop "Randomized Algorithms for Scientific Computing (RASC)," held virtually across four days in December 2020 and January 2021.
Submitted 21 March, 2022; v1 submitted 19 April, 2021;
originally announced April 2021.
-
Generalized Approach to Matched Filtering using Neural Networks
Authors:
Jingkai Yan,
Mariam Avagyan,
Robert E. Colgan,
Doğa Veske,
Imre Bartos,
John Wright,
Zsuzsa Márka,
Szabolcs Márka
Abstract:
Gravitational wave science is a pioneering field with rapidly evolving data analysis methodology currently assimilating and inventing deep learning techniques. The bulk of the sophisticated flagship searches of the field rely on the time-tested matched filtering principle within their core. In this paper, we make a key observation on the relationship between the emerging deep learning and the traditional techniques: matched filtering is formally equivalent to a particular neural network. This means that a neural network can be constructed analytically to exactly implement matched filtering, and can be further trained on data or boosted with additional complexity for improved performance. Moreover, we show that the proposed neural network architecture can outperform matched filtering, both with and without knowledge of a prior on the parameter distribution. When a prior is given, the proposed neural network can approach the statistically optimal performance. We also propose and investigate two different neural network architectures, MNet-Shallow and MNet-Deep, both of which implement matched filtering at initialization and can be trained on data. MNet-Shallow has a simpler structure, while MNet-Deep is more flexible and can deal with a wider range of distributions. Our theoretical findings are corroborated by experiments using real LIGO data and synthetic injections, where our proposed methods significantly outperform matched filtering at false positive rates above $5\times 10^{-3}\%$. The fundamental equivalence between matched filtering and neural networks allows us to define a "complexity standard candle" to characterize the relative complexity of the different approaches to gravitational wave signal searches in a common framework. Finally, our results suggest new perspectives on the role of deep learning in gravitational wave detection.
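One way to see the equivalence claimed above: for whitened data, matched filtering against a template bank is a correlation (a 1-D convolution layer whose kernels are the templates) followed by a maximum over time offsets and templates (a max-pooling step). The sketch below is a minimal time-domain version under those assumptions. Real gravitational-wave searches work with complex templates in the frequency domain, and neither MNet architecture is reproduced here; names are hypothetical.

```python
import math


def matched_filter(data, templates):
    """Return (best_template_index, best_offset, best_score) for whitened real-valued data.

    For each template, slide it along the data and compute the correlation
    normalised by the template norm (a 1-D convolution), then take the max
    over time offsets and over the template bank (max pooling).
    """
    best = (None, None, -math.inf)
    for k, h in enumerate(templates):
        norm = math.sqrt(sum(v * v for v in h))
        for t in range(len(data) - len(h) + 1):
            score = sum(h[i] * data[t + i] for i in range(len(h))) / norm
            if score > best[2]:
                best = (k, t, score)
    return best
```

Normalising by the template norm puts templates of different amplitudes on a common scale, so the max over the bank compares like with like.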
Submitted 2 February, 2022; v1 submitted 8 April, 2021;
originally announced April 2021.
-
Towards Learning Food Portion From Monocular Images With Cross-Domain Feature Adaptation
Authors:
Zeman Shao,
Shaobo Fang,
Runyu Mao,
Jiangpeng He,
Janine Wright,
Deborah Kerr,
Carol Jo Boushey,
Fengqing Zhu
Abstract:
We aim to estimate food portion size, a property that is strongly related to the presence of food objects in 3D space, from single monocular images in real-life settings. Specifically, we are interested in end-to-end estimation of food portion size, which has great potential in the field of personal health management. Unlike image segmentation or object recognition, where annotations can be obtained through large-scale crowdsourcing, it is much more challenging to collect datasets for portion size estimation, since humans cannot accurately estimate the size of an object in an arbitrary 2D image without expert knowledge. To address this challenge, we introduce a real-life food image dataset, to be made available to the research community, collected from a nutrition study in which registered dietitians provided the groundtruth food energy (calories). We propose a deep regression process for portion size estimation by combining features estimated from both RGB and learned energy distribution domains. Our estimates of food energy achieved state-of-the-art performance with a MAPE of 11.47%, significantly outperforming non-expert human estimates by 27.56%.
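For reference, the mean absolute percentage error (MAPE) quoted above is the average of $|\hat{y}_i - y_i| / |y_i|$ over all images, expressed in percent. A minimal helper is sketched below; the function name and any energy values used with it are hypothetical.

```python
def mape(estimates, truths):
    """Mean absolute percentage error, in percent; every truth value must be non-zero."""
    if len(estimates) != len(truths) or not truths:
        raise ValueError("need equal-length, non-empty sequences")
    return 100.0 * sum(abs(e - t) / abs(t) for e, t in zip(estimates, truths)) / len(truths)
```

For example, estimates of 110 and 90 kcal against a ground truth of 100 kcal each are both off by 10%, so the MAPE is 10%.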
Submitted 12 March, 2021;
originally announced March 2021.
-
Variance Reduction via Primal-Dual Accelerated Dual Averaging for Nonsmooth Convex Finite-Sums
Authors:
Chaobing Song,
Stephen J. Wright,
Jelena Diakonikolas
Abstract:
We study structured nonsmooth convex finite-sum optimization, which appears widely in machine learning applications, including support vector machines and least absolute deviation. For the primal-dual formulation of this problem, we propose a novel algorithm called Variance Reduction via Primal-Dual Accelerated Dual Averaging (VRPDA$^2$). In the nonsmooth and general convex setting, VRPDA$^2$ has the overall complexity $O(nd\log\min\{1/\epsilon, n\} + d/\epsilon)$ in terms of the primal-dual gap, where $n$ denotes the number of samples, $d$ the dimension of the primal variables, and $\epsilon$ the desired accuracy. In the nonsmooth and strongly convex setting, the overall complexity of VRPDA$^2$ becomes $O(nd\log\min\{1/\epsilon, n\} + d/\sqrt{\epsilon})$ in terms of both the primal-dual gap and the distance between iterate and optimal solution. Both of these results for VRPDA$^2$ improve significantly on state-of-the-art complexity estimates, which are $O(nd\log\min\{1/\epsilon, n\} + \sqrt{n}d/\epsilon)$ for the nonsmooth and general convex setting and $O(nd\log\min\{1/\epsilon, n\} + \sqrt{n}d/\sqrt{\epsilon})$ for the nonsmooth and strongly convex setting, and they do so in a much simpler and more straightforward way. Moreover, both complexities are better than lower bounds for general convex finite sums that lack the particular (common) structure that we consider. Our theoretical results are supported by numerical experiments, which confirm the competitive performance of VRPDA$^2$ compared to state-of-the-art methods.
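To illustrate the kind of structure meant by "structured nonsmooth convex finite-sum", consider an $\ell_2$-regularized support vector machine with hinge loss. Because $\max(0, 1 - bz) = \max_{t \in [0,1]} t(1 - bz)$, the primal problem becomes a min-max problem that is linear in the dual variables. The sketch below checks this numerically. The choice of SVM, the regularizer $\frac{\lambda}{2}\lVert x \rVert^2$, and all names are illustrative assumptions; the proposed algorithm itself is not implemented.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def hinge(z, b):
    """Hinge loss max(0, 1 - b z) for margin z = <a, x> and label b in {-1, +1}."""
    return max(0.0, 1.0 - b * z)


def primal(x, A, b, lam):
    """(1/n) sum_i hinge(<a_i, x>, b_i) + (lam / 2) ||x||^2."""
    return sum(hinge(dot(a, x), bi) for a, bi in zip(A, b)) / len(A) + 0.5 * lam * dot(x, x)


def lagrangian(x, y, A, b, lam):
    """Primal-dual objective (1/n) sum_i y_i (1 - b_i <a_i, x>) + (lam / 2) ||x||^2, y in [0, 1]^n."""
    return sum(yi * (1.0 - bi * dot(a, x)) for yi, a, bi in zip(y, A, b)) / len(A) + 0.5 * lam * dot(x, x)


def best_dual(x, A, b):
    """Maximizing y for fixed x: each y_i enters linearly, so it sits at an endpoint of [0, 1]."""
    return [1.0 if 1.0 - bi * dot(a, x) > 0 else 0.0 for a, bi in zip(A, b)]
```

Maximizing the Lagrangian over $y$ recovers the primal objective exactly, and any other feasible $y$ gives a value no larger, which is the saddle-point structure that primal-dual methods exploit.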
Submitted 7 April, 2021; v1 submitted 26 February, 2021;
originally announced February 2021.