
Frontier Language Models are not Robust to Adversarial Arithmetic, or "What do I need to say so you agree 2+2=5?"

Authors: C. Daniel Freeman, Laura Culp, Aaron Parisi, Maxwell L Bileschi, Gamaleldin F Elsayed, Alex Rizkowsky, Isabelle Simpson, Alex Alemi, Azade Nova, Ben Adlam, Bernd Bohnet, Gaurav Mishra, Hanie Sedghi, Igor Mordatch, Izzeddin Gur, Jaehoon Lee, JD Co-Reyes, Jeffrey Pennington, Kelvin Xu, Kevin Swersky, Kshiteej Mahajan, Lechao Xiao, Rosanne Liu, Simon Kornblith, Noah Constant, et al. (5 additional authors not shown)

Abstract: We introduce and study the problem of adversarial arithmetic, which provides a simple yet challenging testbed for language model alignment. This problem is comprised of arithmetic questions posed in natural language, with an arbitrary adversarial string inserted before the question is complete. Even in the simple setting of 1-digit addition problems, it is easy to find adversarial prompts that make all tested models (including PaLM2, GPT4, Claude2) misbehave, and even to steer models to a particular wrong answer. We additionally provide a simple algorithm for finding successful attacks by querying those same models, which we name "prompt inversion rejection sampling" (PIRS). We finally show that models can be partially hardened against these attacks via reinforcement learning and via agentic constitutional loops. However, we were not able to make a language model fully robust against adversarial arithmetic attacks.

Submitted 15 November, 2023; v1 submitted 8 November, 2023; originally announced November 2023.

arXiv:2310.10047 [pdf, other]

Improving Large Language Model Fine-tuning for Solving Math Problems

Authors: Yixin Liu, Avi Singh, C. Daniel Freeman, John D. Co-Reyes, Peter J. Liu

Abstract: Despite their success in many natural language tasks, solving math problems remains a significant challenge for large language models (LLMs). A large gap exists between LLMs' pass-at-one and pass-at-N performance in solving math problems, suggesting LLMs might be close to finding correct solutions, motivating our exploration of fine-tuning methods to unlock LLMs' performance. Using the challenging MATH dataset, we investigate three fine-tuning strategies: (1) solution fine-tuning, where we fine-tune to generate a detailed solution for a given math problem; (2) solution-cluster re-ranking, where the LLM is fine-tuned as a solution verifier/evaluator to choose among generated candidate solution clusters; (3) multi-task sequential fine-tuning, which integrates both solution generation and evaluation tasks together efficiently to enhance the LLM performance. With these methods, we present a thorough empirical study on a series of PaLM 2 models and find: (1) The quality and style of the step-by-step solutions used for fine-tuning can make a significant impact on the model performance; (2) While solution re-ranking and majority voting are both effective for improving the model performance when used separately, they can also be used together for an even greater performance boost; (3) Multi-task fine-tuning that sequentially separates the solution generation and evaluation tasks can offer improved performance compared with the solution fine-tuning baseline.
Guided by these insights, we design a fine-tuning recipe that yields approximately 58.8% accuracy on the MATH dataset with fine-tuned PaLM 2-L models, an 11.2% accuracy improvement over the few-shot performance of pre-trained PaLM 2-L model with majority voting.

Submitted 16 October, 2023; originally announced October 2023.

arXiv:2212.01055 [pdf, other]

Transformer-Based Learned Optimization

Authors: Erik Gärtner, Luke Metz, Mykhaylo Andriluka, C. Daniel Freeman, Cristian Sminchisescu

Abstract: We propose a new approach to learned optimization where we represent the computation of an optimizer's update step using a neural network. The parameters of the optimizer are then learned by training on a set of optimization tasks with the objective to perform minimization efficiently. Our innovation is a new neural network architecture, Optimus, for the learned optimizer inspired by the classic BFGS algorithm. As in BFGS, we estimate a preconditioning matrix as a sum of rank-one updates but use a Transformer-based neural network to predict these updates jointly with the step length and direction. In contrast to several recent learned optimization-based approaches, our formulation allows for conditioning across the dimensions of the parameter space of the target problem while remaining applicable to optimization tasks of variable dimensionality without retraining. We demonstrate the advantages of our approach on a benchmark composed of objective functions traditionally used for the evaluation of optimization algorithms, as well as on the real-world task of physics-based visual reconstruction of articulated 3d human motion.

Submitted 28 June, 2023; v1 submitted 2 December, 2022; originally announced December 2022.

Comments: Accepted to the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2023 (CVPR) in Vancouver, Canada

arXiv:2211.09760 [pdf, other]

VeLO: Training Versatile Learned Optimizers by Scaling Up

Authors: Luke Metz, James Harrison, C. Daniel Freeman, Amil Merchant, Lucas Beyer, James Bradbury, Naman Agrawal, Ben Poole, Igor Mordatch, Adam Roberts, Jascha Sohl-Dickstein

Abstract: While deep learning models have replaced hand-designed features across many domains, these models are still trained with hand-designed optimizers. In this work, we leverage the same scaling approach behind the success of deep learning to learn versatile optimizers. We train an optimizer for deep learning which is itself a small neural network that ingests gradients and outputs parameter updates. Meta-trained with approximately four thousand TPU-months of compute on a wide variety of optimization tasks, our optimizer not only exhibits compelling performance, but optimizes in interesting and unexpected ways. It requires no hyperparameter tuning, instead automatically adapting to the specifics of the problem being optimized. We open source our learned optimizer, meta-training code, the associated train and test data, and an extensive optimizer benchmark suite with baselines at velo-code.github.io.

Submitted 17 November, 2022; originally announced November 2022.

arXiv:2211.08199 [pdf, other]

Allowing Safe Contact in Robotic Goal-Reaching: Planning and Tracking in Operational and Null Spaces

Authors: Xinghao Zhu, Wenzhao Lian, Bodi Yuan, C. Daniel Freeman, Masayoshi Tomizuka

Abstract: In recent years, impressive results have been achieved in robotic manipulation. While many efforts focus on generating collision-free reference signals, few allow safe contact between the robot bodies and the environment. However, in humans' daily manipulation, contact between arms and obstacles is prevalent and even necessary. This paper investigates the benefit of allowing safe contact during robotic manipulation and advocates generating and tracking compliance reference signals in both operational and null spaces. In addition, to optimize the collision-allowed trajectories, we present a hybrid solver that integrates sampling- and gradient-based approaches. We evaluate the proposed method on a goal-reaching task in five simulated and real-world environments with different collisional conditions. We show that allowing safe contact improves goal-reaching efficiency and provides feasible solutions in highly collisional scenarios where collision-free constraints cannot be enforced. Moreover, we demonstrate that planning in null space, in addition to operational space, improves trajectory safety.

Submitted 31 October, 2022; originally announced November 2022.

Comments: 7 pages, 5 figures, submitted to ICRA 2023

arXiv:2203.11860 [pdf, other]

Practical tradeoffs between memory, compute, and performance in learned optimizers

Authors: Luke Metz, C. Daniel Freeman, James Harrison, Niru Maheswaranathan, Jascha Sohl-Dickstein

Abstract: Optimization plays a costly and crucial role in developing machine learning systems. In learned optimizers, the few hyperparameters of commonly used hand-designed optimizers, e.g. Adam or SGD, are replaced with flexible parametric functions. The parameters of these functions are then optimized so that the resulting learned optimizer minimizes a target loss on a chosen class of models. Learned optimizers can both reduce the number of required training steps and improve the final test loss. However, they can be expensive to train, and once trained can be expensive to use due to computational and memory overhead for the optimizer itself. In this work, we identify and quantify the design features governing the memory, compute, and performance trade-offs for many learned and hand-designed optimizers. We further leverage our analysis to construct a learned optimizer that is both faster and more memory efficient than previous work. Our model and training code are open source.

Submitted 16 July, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

arXiv:2111.05803 [pdf, other]

Gradients are Not All You Need

Authors: Luke Metz, C. Daniel Freeman, Samuel S. Schoenholz, Tal Kachman

Abstract: Differentiable programming techniques are widely used in the community and are responsible for the machine learning renaissance of the past several decades. While these methods are powerful, they have limits. In this short report, we discuss a common chaos based failure mode which appears in a variety of differentiable circumstances, ranging from recurrent neural networks and numerical physics simulation to training learned optimizers. We trace this failure to the spectrum of the Jacobian of the system under study, and provide criteria for when a practitioner might expect this failure to spoil their differentiation based optimization algorithms.

Submitted 20 January, 2022; v1 submitted 10 November, 2021; originally announced November 2021.

arXiv:2106.13281 [pdf, other]

Brax -- A Differentiable Physics Engine for Large Scale Rigid Body Simulation

Authors: C. Daniel Freeman, Erik Frey, Anton Raichuk, Sertan Girgin, Igor Mordatch, Olivier Bachem

Abstract: We present Brax, an open source library for rigid body simulation with a focus on performance and parallelism on accelerators, written in JAX. We present results on a suite of tasks inspired by the existing reinforcement learning literature, but remade in our engine. Additionally, we provide reimplementations of PPO, SAC, ES, and direct policy optimization in JAX that compile alongside our environments, allowing the learning algorithm and the environment processing to occur on the same device, and to scale seamlessly on accelerators. Finally, we include notebooks that facilitate training of performant policies on common OpenAI Gym MuJoCo-like tasks in minutes.

Submitted 24 June, 2021; originally announced June 2021.

Comments: 9 pages + 12 pages of appendices and references. In submission at NeurIPS 2021 Datasets and Benchmarks Track

arXiv:2101.07367 [pdf, other]

Training Learned Optimizers with Randomly Initialized Learned Optimizers

Authors: Luke Metz, C. Daniel Freeman, Niru Maheswaranathan, Jascha Sohl-Dickstein

Abstract: Learned optimizers are increasingly effective, with performance exceeding that of hand designed optimizers such as Adam~\citep{kingma2014adam} on specific tasks \citep{metz2019understanding}. Despite the potential gains available, in current work the meta-training (or `outer-training') of the learned optimizer is performed by a hand-designed optimizer, or by an optimizer trained by a hand-designed optimizer \citep{metz2020tasks}. We show that a population of randomly initialized learned optimizers can be used to train themselves from scratch in an online fashion, without resorting to a hand designed optimizer in any part of the process. A form of population based training is used to orchestrate this self-training. Although the randomly initialized optimizers initially make slow progress, as they improve they experience a positive feedback loop, and become rapidly more effective at training themselves. We believe feedback loops of this type, where an optimizer improves itself, will be important and powerful in the future of machine learning. These methods not only provide a path towards increased performance, but more importantly relieve research and engineering effort.

Submitted 14 January, 2021; originally announced January 2021.

arXiv:2009.11243 [pdf, other]

Tasks, stability, architecture, and compute: Training more effective learned optimizers, and using them to train themselves

Authors: Luke Metz, Niru Maheswaranathan, C. Daniel Freeman, Ben Poole, Jascha Sohl-Dickstein

Abstract: Much as replacing hand-designed features with learned functions has revolutionized how we solve perceptual tasks, we believe learned algorithms will transform how we train models. In this work we focus on general-purpose learned optimizers capable of training a wide variety of problems with no user-specified hyperparameters. We introduce a new, neural network parameterized, hierarchical optimizer with access to additional features such as validation loss to enable automatic regularization. Most learned optimizers have been trained on only a single task, or a small number of tasks. We train our optimizers on thousands of tasks, making use of orders of magnitude more compute, resulting in optimizers that generalize better to unseen tasks. The learned optimizers not only perform well, but learn behaviors that are distinct from existing first order optimizers. For instance, they generate update steps that have implicit regularization and adapt as the problem hyperparameters (e.g. batch size) or architecture (e.g. neural network width) change. Finally, these learned optimizers show evidence of being useful for out of distribution tasks such as training themselves from scratch.

Submitted 23 September, 2020; originally announced September 2020.

arXiv:2002.11887 [pdf, other]

Using a thousand optimization tasks to learn hyperparameter search strategies

Authors: Luke Metz, Niru Maheswaranathan, Ruoxi Sun, C. Daniel Freeman, Ben Poole, Jascha Sohl-Dickstein

Abstract: We present TaskSet, a dataset of tasks for use in training and evaluating optimizers. TaskSet is unique in its size and diversity, containing over a thousand tasks ranging from image classification with fully connected or convolutional neural networks, to variational autoencoders, to non-volume preserving flows on a variety of datasets. As an example application of such a dataset we explore meta-learning an ordered list of hyperparameters to try sequentially. By learning this hyperparameter list from data generated using TaskSet we achieve large speedups in sample efficiency over random search. Next we use the diversity of the TaskSet and our method for learning hyperparameter lists to empirically explore the generalization of these lists to new optimization tasks in a variety of settings including ImageNet classification with Resnet50 and LM1B language modeling with transformers. As part of this work we have opensourced code for all tasks, as well as ~29 million training curves for these problems and the corresponding hyperparameters.

Submitted 31 March, 2020; v1 submitted 26 February, 2020; originally announced February 2020.

arXiv:1910.13038 [pdf, other]

Learning to Predict Without Looking Ahead: World Models Without Forward Prediction

Authors: C. Daniel Freeman, Luke Metz, David Ha

Abstract: Much of model-based reinforcement learning involves learning a model of an agent's world, and training an agent to leverage this model to perform a task more efficiently. While these models are demonstrably useful for agents, every naturally occurring model of the world of which we are aware---e.g., a brain---arose as the byproduct of competing evolutionary pressures for survival, not minimization of a supervised forward-predictive loss via gradient descent. That useful models can arise out of the messy and slow optimization process of evolution suggests that forward-predictive modeling can arise as a side-effect of optimization under the right circumstances. Crucially, this optimization process need not explicitly be a forward-predictive loss. In this work, we introduce a modification to traditional reinforcement learning which we call observational dropout, whereby we limit the agent's ability to observe the real environment at each timestep. In doing so, we can coerce an agent into learning a world model to fill in the observation gaps during reinforcement learning. We show that the emerged world model, while not explicitly trained to predict the future, can help the agent learn key skills required to perform well in its environment. Videos of our results available at https://learningtopredict.github.io/

Submitted 30 October, 2019; v1 submitted 28 October, 2019; originally announced October 2019.

Comments: To appear at the Thirty-third Conference on Neural Information Processing Systems (NeurIPS 2019)

arXiv:1810.10180 [pdf, other]

Understanding and correcting pathologies in the training of learned optimizers

Authors: Luke Metz, Niru Maheswaranathan, Jeremy Nixon, C. Daniel Freeman, Jascha Sohl-Dickstein

Abstract: Deep learning has shown that learned functions can dramatically outperform hand-designed functions on perceptual tasks. Analogously, this suggests that learned optimizers may similarly outperform current hand-designed optimizers, especially for specific problems. However, learned optimizers are notoriously difficult to train and have yet to demonstrate wall-clock speedups over hand-designed optimizers, and thus are rarely used in practice. Typically, learned optimizers are trained by truncated backpropagation through an unrolled optimization process resulting in gradients that are either strongly biased (for short truncations) or have exploding norm (for long truncations). In this work we propose a training scheme which overcomes both of these difficulties, by dynamically weighting two unbiased gradient estimators for a variational loss on optimizer performance, allowing us to train neural networks to perform optimization of a specific task faster than tuned first-order methods. We demonstrate these results on problems where our learned optimizer trains convolutional networks faster in wall-clock time compared to tuned first-order methods and with an improvement in test loss.

Submitted 7 June, 2019; v1 submitted 24 October, 2018; originally announced October 2018.

arXiv:1807.00821 [pdf, other]

doi 10.1021/acs.jctc.8b00536

Modern Approaches to Exact Diagonalization and Selected Configuration Interaction with the Adaptive Sampling CI Method

Authors: Norm M. Tubman, C. Daniel Freeman, Daniel S. Levine, Diptarka Hait, Martin Head-Gordon, K. Birgitta Whaley

Abstract: Recent advances in selected CI, including the adaptive sampling configuration interaction (ASCI) algorithm and its heat bath extension, have made the ASCI approach competitive with the most accurate techniques available, and hence an increasingly powerful tool in solving quantum Hamiltonians. In this work, we show that a useful paradigm for generating efficient selected CI/exact diagonalization algorithms is driven by fast sorting algorithms, much in the same way iterative diagonalization is based on the paradigm of matrix vector multiplication. We present several new algorithms for all parts of performing a selected CI, which includes new ASCI search, dynamic bit masking, fast orbital rotations, fast diagonal matrix elements, and residue arrays. The algorithms presented here are fast and scalable, and we find that because they are built on fast sorting algorithms they are more efficient than all other approaches we considered. After introducing these techniques we present ASCI results applied to a large range of systems and basis sets in order to demonstrate the types of simulations that can be practically treated at the full-CI level with modern methods and hardware, presenting double- and triple-zeta benchmark data for the G1 dataset. The largest of these calculations is Si$_{2}$H$_{6}$ which is a simulation of 34 electrons in 152 orbitals. We also present some preliminary results for fast deterministic perturbation theory simulations that use hash functions to maintain high efficiency for treating large basis sets.

Submitted 28 December, 2019; v1 submitted 2 July, 2018; originally announced July 2018.

Comments: 22 pages, 8 figures, 15 tables (added supplemental information on Cr2 in the svp basis)

Journal ref: J. Chem. Theory Comput. 2020, 16, 4, 2139-2159

arXiv:1710.03757 [pdf, other]

Monte Carlo Tensor Network Renormalization

Authors: William Huggins, C. Daniel Freeman, Miles Stoudenmire, Norm M. Tubman, K. Birgitta Whaley

Abstract: Techniques for approximately contracting tensor networks are limited in how efficiently they can make use of parallel computing resources. In this work we demonstrate and characterize a Monte Carlo approach to the tensor network renormalization group method which can be used straightforwardly on modern computing architectures. We demonstrate the efficiency of the technique and show that Monte Carlo tensor network renormalization provides an attractive path to improving the accuracy of a wide class of challenging computations while also providing useful estimates of uncertainty and a statistical guarantee of unbiased results.

Submitted 10 October, 2017; originally announced October 2017.

Comments: 8 pages, 3 figures

arXiv:1708.02260 [pdf, other]

doi 10.1103/PhysRevA.98.032322

Stable quantum memories with limited measurement

Authors: C. Daniel Freeman, Mohan Sarovar, C. M. Herdman, K. B. Whaley

Abstract: We demonstrate the existence of a finite temperature threshold for a 1D stabilizer code under an error correcting protocol that requires only a fraction of the syndrome measurements. Below the threshold temperature, encoded states have exponentially long lifetimes, as demonstrated by numerical and analytical arguments. We sketch how this algorithm generalizes to higher dimensional stabilizer codes with string-like excitations, like the toric code.

Submitted 7 August, 2017; originally announced August 2017.

Comments: 11 Pages, 7 Figures

Journal ref: Phys. Rev. A 98, 032322 (2018)

arXiv:1611.01540 [pdf, other]

Topology and Geometry of Half-Rectified Network Optimization

Authors: C. Daniel Freeman, Joan Bruna

Abstract: The loss surface of deep neural networks has recently attracted interest in the optimization and machine learning communities as a prime example of high-dimensional non-convex problem. Some insights were recently gained using spin glass models and mean-field approximations, but at the expense of strongly simplifying the nonlinear nature of the model. In this work, we do not make any such assumption and study conditions on the data distribution and model architecture that prevent the existence of bad local minima. Our theoretical work quantifies and formalizes two important \emph{folklore} facts: (i) the landscape of deep linear networks has a radically different topology from that of deep half-rectified ones, and (ii) that the energy landscape in the non-linear case is fundamentally controlled by the interplay between the smoothness of the data distribution and model over-parametrization. Our main theoretical contribution is to prove that half-rectified single layer networks are asymptotically connected, and we provide explicit bounds that reveal the aforementioned interplay. The conditioning of gradient descent is the next challenge we address. We study this question through the geometry of the level sets, and we introduce an algorithm to efficiently estimate the regularity of such sets on large-scale networks.
Our empirical results show that these level sets remain connected throughout all the learning phase, suggesting a near convex behavior, but they become exponentially more curvy as the energy level decays, in accordance to what is observed in practice with very low curvature attractors.

Submitted 1 June, 2017; v1 submitted 4 November, 2016; originally announced November 2016.

Comments: 22 Pages (10 main + Appendices), 4 Figures, 1 Table, Published as a conference paper at ICLR 2017

arXiv:1608.05074 [pdf, other]

Entanglement structure of non-equilibrium steady states

Authors: Raghu Mahajan, C. Daniel Freeman, Sam Mumford, Norm Tubman, Brian Swingle

Abstract: We study the problem of calculating transport properties of interacting quantum systems, specifically electrical and thermal conductivities, by computing the non-equilibrium steady state (NESS) of the system biased by contacts. Our approach is based on the structure of entanglement in the NESS. With reasonable physical assumptions, we show that a NESS close to local equilibrium is lightly entangled and can be represented via a computationally efficient tensor network. We further argue that the NESS may be found by dynamically evolving the system within a manifold of appropriate low entanglement states. A physically realistic law of dynamical evolution is Markovian open system dynamics, or the Lindblad equation. We explore this approach in a well-studied free fermion model where comparisons with the literature are possible. We study both electrical and thermal currents with and without disorder, and compute entropic quantities such as mutual information and conditional mutual information. We conclude with a discussion of the prospects of this approach for the challenging problem of transport in strongly interacting systems, especially those with disorder.

Submitted 17 August, 2016; originally announced August 2016.

Comments: 10 pages + appendices, 10 figures

arXiv:1603.05005

doi 10.1103/PhysRevA.96.012311

Engineering autonomous error correction in stabilizer codes at finite temperature

Authors: C. Daniel Freeman, C. M. Herdman, K. B. Whaley

Abstract: We present an error correcting protocol that enhances the lifetime of stabilizer code based qubits which are susceptible to the creation of pairs of localized defects (due to string-like error operators) at finite temperature, such as the toric code. The primary tool employed is dynamic application of a local, unitary operator which exchanges defects and thereby translates localized excitations. Crucially, the protocol does not require any measurements of stabilizer operators, and therefore can be used to enhance the lifetime of a qubit in the absence of such experimental resources.

Submitted 5 May, 2016; v1 submitted 16 March, 2016; originally announced March 2016.

Comments: 14 pages, 13 figures. Comments welcome, APS March Meeting session K44.00007

Journal ref: Phys. Rev. A 96, 012311 (2017)

arXiv:1405.2315

doi 10.1103/PhysRevB.90.134302

Relaxation dynamics of the toric code in contact with a thermal reservoir: Finite-size scaling in a low temperature regime

Authors: C. Daniel Freeman, C. M. Herdman, Dylan J Gorman, K. B. Whaley

Abstract: We present an analysis of the relaxation dynamics of finite-size topological qubits in contact with a thermal bath. Using a continuous-time Monte Carlo method, we explicitly compute the low-temperature nonequilibrium dynamics of the toric code on finite lattices. In contrast to the size-independent bound predicted for the toric code in the thermodynamic limit, we identify a low-temperature regime on finite lattices below a size-dependent crossover temperature with nontrivial finite-size and temperature scaling of the relaxation time. We demonstrate how this nontrivial finite-size scaling is governed by the scaling of topologically nontrivial two-dimensional classical random walks. The transition out of this low-temperature regime defines a dynamical finite-size crossover temperature that scales inversely with the log of the system size, in agreement with a crossover temperature defined from equilibrium properties. We find that both the finite-size and finite-temperature scaling are stronger in the low-temperature regime than above the crossover temperature. Since this finite-temperature scaling competes with the scaling of the robustness to unitary perturbations, this analysis may elucidate the scaling of memory lifetimes of possible physical realizations of topological qubits.

Submitted 4 December, 2014; v1 submitted 9 May, 2014; originally announced May 2014.

Comments: 14 Pages, 13 figures

Journal ref: Phys. Rev. B 90, 134302 (2014)

Showing 1–20 of 20 results for author: Freeman, C D