Search | arXiv e-print repository

Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement

Authors: Yunzhen Feng, Elvis Dohmatob, Pu Yang, Francois Charton, Julia Kempe

Abstract: Synthesized data from generative models is increasingly considered as an alternative to human-annotated data for fine-tuning Large Language Models. This raises concerns about model collapse: a drop in performance of models fine-tuned on generated data. Considering that it is easier for both humans and machines to tell between good and bad examples than to generate high-quality samples, we investigate the use of feedback on synthesized data to prevent model collapse. We derive theoretical conditions under which a Gaussian mixture classification model can achieve asymptotically optimal performance when trained on feedback-augmented synthesized data, and provide supporting simulations for finite regimes. We illustrate our theoretical predictions on two practical problems: computing matrix eigenvalues with transformers and news summarization with large language models, which both undergo model collapse when trained on model-generated data. We show that training from feedback-augmented synthesized data, either by pruning incorrect predictions or by selecting the best of several guesses, can prevent model collapse, validating popular approaches like RLHF.

Submitted 11 June, 2024; originally announced June 2024.
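
The two feedback strategies named in the abstract above (pruning incorrect predictions and selecting the best of several guesses) can be illustrated with a toy sketch. Everything here is a hypothetical stand-in rather than the authors' setup: `noisy_generator` plays the role of a generative model, `verifier_score` is an oracle feedback signal that assumes access to the true answer, and the task (squaring an integer) is chosen only for brevity.

```python
import random


def noisy_generator(x, rng, error_rate=0.4):
    """Hypothetical generative model: returns x*x, but is off by one with probability error_rate."""
    y = x * x
    return y if rng.random() > error_rate else y + rng.choice([-1, 1])


def verifier_score(x, y):
    """Hypothetical feedback signal (an oracle): 0 for a correct answer, negative otherwise."""
    return -abs(y - x * x)


def prune(samples):
    """Strategy 1: keep only the synthesized pairs the verifier accepts."""
    return [(x, y) for x, y in samples if verifier_score(x, y) == 0]


def select_best(x, guesses):
    """Strategy 2: among several guesses for the same input, keep the best-scoring one."""
    return max(guesses, key=lambda y: verifier_score(x, y))


rng = random.Random(0)
raw = [(x, noisy_generator(x, rng)) for x in range(20)]   # unfiltered synthetic data
pruned = prune(raw)                                       # feedback-filtered subset
best = [(x, select_best(x, [noisy_generator(x, rng) for _ in range(4)]))
        for x in range(20)]                               # best-of-4 per input
```

In this sketch `pruned` contains only correct pairs but may be smaller than `raw`, while `best` keeps one pair per input and is correct whenever at least one of the four guesses is.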

arXiv:2406.04981

The Price of Implicit Bias in Adversarially Robust Generalization

Authors: Nikolaos Tsilivis, Natalie Frank, Nathan Srebro, Julia Kempe

Abstract: We study the implicit bias of optimization in robust empirical risk minimization (robust ERM) and its connection with robust generalization. In classification settings under adversarial perturbations with linear models, we study what type of regularization should ideally be applied for a given perturbation set to improve (robust) generalization. We then show that the implicit bias of optimization in robust ERM can significantly affect the robustness of the model and identify two ways this can happen; either through the optimization algorithm or the architecture. We verify our predictions in simulations with synthetic data and experimentally study the importance of implicit bias in robust ERM with deep neural networks.

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2406.02128

Iteration Head: A Mechanistic Study of Chain-of-Thought

Authors: Vivien Cabannes, Charles Arnal, Wassim Bouaziz, Alice Yang, Francois Charton, Julia Kempe

Abstract: Chain-of-Thought (CoT) reasoning is known to improve Large Language Models both empirically and in terms of theoretical approximation power. However, our understanding of the inner workings of CoT capabilities, and of the conditions under which they emerge, remains limited. This paper helps fill this gap by demonstrating how CoT reasoning emerges in transformers in a controlled and interpretable setting. In particular, we observe the appearance of a specialized attention mechanism dedicated to iterative reasoning, which we coined "iteration heads". We track both the emergence and the precise working of these iteration heads down to the attention level, and measure the transferability of the CoT skills to which they give rise between tasks.

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2404.19640

Attacking Bayes: On the Adversarial Robustness of Bayesian Neural Networks

Authors: Yunzhen Feng, Tim G. J. Rudner, Nikolaos Tsilivis, Julia Kempe

Abstract: Adversarial examples have been shown to cause neural networks to fail on a wide range of vision and language tasks, but recent work has claimed that Bayesian neural networks (BNNs) are inherently robust to adversarial perturbations. In this work, we examine this claim. To study the adversarial robustness of BNNs, we investigate whether it is possible to successfully break state-of-the-art BNN inference methods and prediction pipelines using even relatively unsophisticated attacks for three tasks: (1) label prediction under the posterior predictive mean, (2) adversarial example detection with Bayesian predictive uncertainty, and (3) semantic shift detection. We find that BNNs trained with state-of-the-art approximate inference methods, and even BNNs trained with Hamiltonian Monte Carlo, are highly susceptible to adversarial attacks. We also identify various conceptual and experimental errors in previous works that claimed inherent adversarial robustness of BNNs and conclusively demonstrate that BNNs and uncertainty-aware Bayesian prediction pipelines are not inherently robust against adversarial attacks.

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.05579

Robust Data Pruning: Uncovering and Overcoming Implicit Bias

Authors: Artem Vysogorets, Kartik Ahuja, Julia Kempe

Abstract: In the era of exceptionally data-hungry models, careful selection of the training data is essential to mitigate the extensive costs of deep learning. Data pruning offers a solution by removing redundant or uninformative samples from the dataset, which yields faster convergence and improved neural scaling laws. However, little is known about its impact on classification bias of the trained models. We conduct the first systematic study of this effect and reveal that existing data pruning algorithms can produce highly biased classifiers. At the same time, we argue that random data pruning with appropriate class ratios has potential to improve the worst-class performance. We propose a "fairness-aware" approach to pruning and empirically demonstrate its performance on standard computer vision benchmarks. In sharp contrast to existing algorithms, our proposed method continues improving robustness at a tolerable drop of average performance as we prune more from the datasets. We present theoretical analysis of the classification risk in a mixture of Gaussians to further motivate our algorithm and support our findings.

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2403.09869

Mind the GAP: Improving Robustness to Subpopulation Shifts with Group-Aware Priors

Authors: Tim G. J. Rudner, Ya Shi Zhang, Andrew Gordon Wilson, Julia Kempe

Abstract: Machine learning models often perform poorly under subpopulation shifts in the data distribution. Developing methods that allow machine learning models to better generalize to such shifts is crucial for safe deployment in real-world settings. In this paper, we develop a family of group-aware prior (GAP) distributions over neural network parameters that explicitly favor models that generalize well under subpopulation shifts. We design a simple group-aware prior that only requires access to a small set of data with group information and demonstrate that training with this prior yields state-of-the-art performance -- even when only retraining the final layer of a previously trained non-robust model. Group-aware priors are conceptually simple, complementary to existing approaches, such as attribute pseudo labeling and data reweighting, and open up promising new avenues for harnessing Bayesian inference to enable robustness to subpopulation shifts.

Submitted 14 March, 2024; originally announced March 2024.

Comments: Published in Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS 2024)

arXiv:2402.07712

Model Collapse Demystified: The Case of Regression

Authors: Elvis Dohmatob, Yunzhen Feng, Julia Kempe

Abstract: In the era of proliferation of large language and image generation models, the phenomenon of "model collapse" refers to the situation whereby as a model is trained recursively on data generated from previous generations of itself over time, its performance degrades until the model eventually becomes completely useless, i.e., the model collapses. In this work, we study this phenomenon in the setting of high-dimensional regression and obtain analytic formulae which quantitatively outline this phenomenon in a broad range of regimes. In the special case of polynomial decaying spectral and source conditions, we obtain modified scaling laws which exhibit new crossover phenomena from fast to slow rates. We also propose a simple strategy based on adaptive regularization to mitigate model collapse. Our theoretical results are validated with experiments.

Submitted 30 April, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

arXiv:2402.07043

A Tale of Tails: Model Collapse as a Change of Scaling Laws

Authors: Elvis Dohmatob, Yunzhen Feng, Pu Yang, Francois Charton, Julia Kempe

Abstract: As AI model size grows, neural scaling laws have become a crucial tool to predict the improvements of large models when increasing capacity and the size of original (human or natural) training data. Yet, the widespread use of popular models means that the ecosystem of online data and text will co-evolve to progressively contain increased amounts of synthesized data. In this paper we ask: How will the scaling laws change in the inevitable regime where synthetic data makes its way into the training corpus? Will future models still improve, or be doomed to degenerate up to total (model) collapse? We develop a theoretical framework of model collapse through the lens of scaling laws. We discover a wide range of decay phenomena, analyzing loss of scaling, shifted scaling with number of generations, the "un-learning" of skills, and grokking when mixing human and synthesized data. Our theory is validated by large-scale experiments with a transformer on an arithmetic task and text generation using the large language model Llama2.

Submitted 31 May, 2024; v1 submitted 10 February, 2024; originally announced February 2024.

Journal ref: ICML 2024

arXiv:2402.03579

Deconstructing the Goldilocks Zone of Neural Network Initialization

Authors: Artem Vysogorets, Anna Dawid, Julia Kempe

Abstract: The second-order properties of the training loss have a massive impact on the optimization dynamics of deep learning models. Fort & Scherlis (2019) discovered that a large excess of positive curvature and local convexity of the loss Hessian is associated with highly trainable initial points located in a region coined the "Goldilocks zone". Only a handful of subsequent studies touched upon this relationship, so it remains largely unexplained. In this paper, we present a rigorous and comprehensive analysis of the Goldilocks zone for homogeneous neural networks. In particular, we derive the fundamental condition resulting in excess of positive curvature of the loss, explaining and refining its conventionally accepted connection to the initialization norm. Further, we relate the excess of positive curvature to model confidence, low initial loss, and a previously unknown type of vanishing cross-entropy loss gradient. To understand the importance of excessive positive curvature for trainability of deep networks, we optimize fully-connected and convolutional architectures outside the Goldilocks zone and analyze the emergent behaviors. We find that strong model performance is not perfectly aligned with the Goldilocks zone, calling for further research into this relationship.

Submitted 4 June, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

arXiv:2311.17967

Discovering Galaxy Features via Dataset Distillation

Authors: Haowen Guan, Xuan Zhao, Zishi Wang, Zhiyang Li, Julia Kempe

Abstract: In many applications, Neural Nets (NNs) have classification performance on par with or even exceeding human capacity. Moreover, it is likely that NNs leverage underlying features that might differ from those humans perceive to classify. Can we "reverse-engineer" pertinent features to enhance our scientific understanding? Here, we apply this idea to the notoriously difficult task of galaxy classification: NNs have reached high performance for this task, but what does a neural net (NN) "see" when it classifies galaxies? Are there morphological features that the human eye might overlook that could help with the task and provide new insights? Can we visualize tracers of early evolution, or additionally incorporated spectral data? We present a novel way to summarize and visualize galaxy morphology through the lens of neural networks, leveraging Dataset Distillation, a recent deep-learning methodology with the primary objective to distill knowledge from a large dataset and condense it into a compact synthetic dataset, such that a model trained on this synthetic dataset achieves performance comparable to a model trained on the full dataset. We curate a class-balanced, medium-size high-confidence version of the Galaxy Zoo 2 dataset, and proceed with dataset distillation from our accurate NN-classifier to create synthesized prototypical images of galaxy morphological features, demonstrating its effectiveness. Of independent interest, we introduce a self-adaptive version of the state-of-the-art Matching Trajectory algorithm to automate the distillation process, and show enhanced performance on computer vision benchmarks.

Submitted 29 November, 2023; originally announced November 2023.

Comments: Accepted to NeurIPS Workshop on Machine Learning and the Physical Sciences, 2023

arXiv:2311.07444

On the Robustness of Neural Collapse and the Neural Collapse of Robustness

Authors: Jingtong Su, Ya Shi Zhang, Nikolaos Tsilivis, Julia Kempe

Abstract: Neural Collapse refers to the curious phenomenon at the end of training of a neural network, where feature vectors and classification weights converge to a very simple geometrical arrangement (a simplex). While it has been observed empirically in various cases and has been theoretically motivated, its connection with crucial properties of neural networks, like their generalization and robustness, remains unclear. In this work, we study the stability properties of these simplices. We find that the simplex structure disappears under small adversarial attacks, and that perturbed examples "leap" between simplex vertices. We further analyze the geometry of networks that are optimized to be robust against adversarial perturbations of the input, and find that Neural Collapse is a pervasive phenomenon in these cases as well, with clean and perturbed representations forming aligned simplices, and giving rise to a robust simple nearest-neighbor classifier. By studying the propagation of the amount of collapse inside the network, we identify novel properties of both robust and non-robust machine learning models, and show that earlier layers, unlike later ones, maintain reliable simplices on perturbed data.

Submitted 13 November, 2023; originally announced November 2023.

arXiv:2311.07025

Embarassingly Simple Dataset Distillation

Authors: Yunzhen Feng, Ramakrishna Vedantam, Julia Kempe

Abstract: Dataset distillation extracts a small set of synthetic training samples from a large dataset with the goal of achieving competitive performance on test data when trained on this sample. In this work, we tackle dataset distillation at its core by treating it directly as a bilevel optimization problem. Re-examining the foundational back-propagation through time method, we study the pronounced variance in the gradients, computational burden, and long-term dependencies. We introduce an improved method: Random Truncated Backpropagation Through Time (RaT-BPTT) to address them. RaT-BPTT incorporates a truncation coupled with a random window, effectively stabilizing the gradients and speeding up the optimization while covering long dependencies. This allows us to establish new state-of-the-art for a variety of standard dataset benchmarks. A deeper dive into the nature of distilled data unveils pronounced intercorrelation. In particular, subsets of distilled datasets tend to exhibit much worse performance than directly distilled smaller datasets of the same size. Leveraging RaT-BPTT, we devise a boosting mechanism that generates distilled datasets that contain subsets with near optimal performance across different data budgets.

Submitted 12 November, 2023; originally announced November 2023.

Comments: Short version appears at NeurIPS 2023 WANT workshop

arXiv:2307.02693

Kernels, Data & Physics

Authors: Francesco Cagnetta, Deborah Oliveira, Mahalakshmi Sabanayagam, Nikolaos Tsilivis, Julia Kempe

Abstract: Lecture notes from the course given by Professor Julia Kempe at the summer school "Statistical physics of Machine Learning" in Les Houches. The notes discuss the so-called NTK approach to problems in machine learning, which consists of gaining an understanding of generally unsolvable problems by finding a tractable kernel formulation. The notes are mainly focused on practical applications such as data distillation and adversarial robustness; examples of inductive bias are also discussed.

Submitted 5 July, 2023; originally announced July 2023.

Comments: These are notes from the lecture of Julia Kempe given at the summer school "Statistical Physics & Machine Learning", that took place in Les Houches School of Physics in France from 4th to 29th July 2022

arXiv:2304.09403

Wavelets Beat Monkeys at Adversarial Robustness

Authors: Jingtong Su, Julia Kempe

Abstract: Research on improving the robustness of neural networks to adversarial noise - imperceptible malicious perturbations of the data - has received significant attention. The currently uncontested state-of-the-art defense to obtain robust deep neural networks is Adversarial Training (AT), but it consumes significantly more resources compared to standard training and trades off accuracy for robustness. An inspiring recent work [Dapello et al.] aims to bring neurobiological tools to the question: How can we develop Neural Nets that robustly generalize like human vision? [Dapello et al.] design a network structure with a neural hidden first layer that mimics the primate primary visual cortex (V1), followed by a back-end structure adapted from current CNN vision models. It seems to achieve non-trivial adversarial robustness on standard vision benchmarks when tested on small perturbations. Here we revisit this biologically inspired work, and ask whether a principled parameter-free representation with inspiration from physics is able to achieve the same goal. We discover that the wavelet scattering transform can replace the complex V1-cortex and simple uniform Gaussian noise can take the role of neural stochasticity, to achieve adversarial robustness. In extensive experiments on the CIFAR-10 benchmark with adaptive adversarial attacks we show that: 1) Robustness of VOneBlock architectures is relatively weak (though non-zero) when the strength of the adversarial attack radius is set to commonly used benchmarks. 2) Replacing the front-end VOneBlock by an off-the-shelf parameter-free Scatternet followed by simple uniform Gaussian noise can achieve much more substantial adversarial robustness without adversarial training. Our work shows how physically inspired structures yield new insights into robustness that were previously only thought possible by meticulously mimicking the human cortex.

Submitted 18 April, 2023; originally announced April 2023.

Comments: Machine Learning and the Physical Sciences Workshop, NeurIPS 2022

arXiv:2210.05577

What Can the Neural Tangent Kernel Tell Us About Adversarial Robustness?

Authors: Nikolaos Tsilivis, Julia Kempe

Abstract: The adversarial vulnerability of neural nets, and subsequent techniques to create robust models have attracted significant attention; yet we still lack a full understanding of this phenomenon. Here, we study adversarial examples of trained neural networks through analytical tools afforded by recent theory advances connecting neural networks and kernel methods, namely the Neural Tangent Kernel (NTK), following a growing body of work that leverages the NTK approximation to successfully analyze important deep learning phenomena and design algorithms for new applications. We show how NTKs allow us to generate adversarial examples in a "training-free" fashion, and demonstrate that they transfer to fool their finite-width neural net counterparts in the "lazy" regime. We leverage this connection to provide an alternative view on robust and non-robust features, which have been suggested to underlie the adversarial brittleness of neural nets. Specifically, we define and study features induced by the eigendecomposition of the kernel to better understand the role of robust and non-robust features, the reliance on both for standard classification and the robustness-accuracy trade-off. We find that such features are surprisingly consistent across architectures, and that robust features tend to correspond to the largest eigenvalues of the model, and thus are learned early during training. Our framework allows us to identify and visualize non-robust yet useful features. Finally, we shed light on the robustness mechanism underlying adversarial training of neural nets used in practice: quantifying the evolution of the associated empirical NTK, we demonstrate that its dynamics falls much earlier into the "lazy" regime and manifests a much stronger form of the well known bias to prioritize learning features within the top eigenspaces of the kernel, compared to standard training.

Submitted 30 January, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

Comments: NeurIPS 2022; added link to GitHub repository

arXiv:2210.01987

ImpressLearn: Continual Learning via Combined Task Impressions

Authors: Dhrupad Bhardwaj, Julia Kempe, Artem Vysogorets, Angela M. Teng, Evaristus C. Ezekwem

Abstract: This work proposes a new method to sequentially train deep neural networks on multiple tasks without suffering catastrophic forgetting, while endowing them with the capability to quickly adapt to unseen tasks. Starting from existing work on network masking (Wortsman et al., 2020), we show that simply learning a linear combination of a small number of task-specific supermasks (impressions) on a randomly initialized backbone network is sufficient to both retain accuracy on previously learned tasks, as well as achieve high accuracy on unseen tasks. In contrast to previous methods, we do not need to generate dedicated masks or contexts for each new task, instead leveraging transfer learning to keep per-task parameter overhead small. Our work illustrates the power of linearly combining individual impressions, each of which fares poorly in isolation, to achieve performance comparable to a dedicated mask. Moreover, even repeated impressions from the same task (homogeneous masks), when combined, can approach the performance of heterogeneous combinations if sufficiently many impressions are used. Our approach scales more efficiently than existing methods, often requiring orders of magnitude fewer parameters and can function without modification even when task identity is missing. In addition, in the setting where task labels are not given at inference, our algorithm gives an often favorable alternative to the one-shot procedure used by Wortsman et al., 2020. We evaluate our method on a number of well-known image classification datasets and network architectures.

Submitted 31 January, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

arXiv:2207.11727

Can we achieve robustness from data alone?

Authors: Nikolaos Tsilivis, Jingtong Su, Julia Kempe

Abstract: We introduce a meta-learning algorithm for adversarially robust classification. The proposed method tries to be as model agnostic as possible and optimizes a dataset prior to its deployment in a machine learning system, aiming to effectively erase its non-robust features. Once the dataset has been created, in principle no specialized algorithm (besides standard gradient descent) is needed to train a robust model. We formulate the data optimization procedure as a bi-level optimization problem on kernel regression, with a class of kernels that describe infinitely wide neural nets (Neural Tangent Kernels). We present extensive experiments on standard computer vision benchmarks using a variety of different models, demonstrating the effectiveness of our method, while also pointing out its current shortcomings. In parallel, we revisit prior work that also focused on the problem of data optimization for robust classification (Ilyas et al., 2019), and show that being robust to adversarial attacks after standard (gradient descent) training on a suitable dataset is more challenging than previously thought.

Submitted 30 January, 2023; v1 submitted 24 July, 2022; originally announced July 2022.

arXiv:2107.02306

Connectivity Matters: Neural Network Pruning Through the Lens of Effective Sparsity

Authors: Artem Vysogorets, Julia Kempe

Abstract: Neural network pruning is a fruitful area of research with surging interest in high sparsity regimes. Benchmarking in this domain heavily relies on faithful representation of the sparsity of subnetworks, which has been traditionally computed as the fraction of removed connections (direct sparsity). This definition, however, fails to recognize unpruned parameters that are detached from input or output layers of underlying subnetworks, potentially underestimating actual effective sparsity: the fraction of inactivated connections. While this effect might be negligible for moderately pruned networks (up to 10-100 compression rates), we find that it plays an increasing role for thinner subnetworks, greatly distorting comparison between different pruning algorithms. For example, we show that effective compression of a randomly pruned LeNet-300-100 can be orders of magnitude larger than its direct counterpart, while no discrepancy is ever observed when using SynFlow for pruning [Tanaka et al., 2020]. In this work, we adopt the lens of effective sparsity to reevaluate several recent pruning algorithms on common benchmark architectures (e.g., LeNet-300-100, VGG-19, ResNet-18) and discover that their absolute and relative performance changes dramatically in this new and more appropriate framework. To aim for effective, rather than direct, sparsity, we develop a low-cost extension to most pruning algorithms.
Further, equipped with effective sparsity as a reference frame, we partially reconfirm that random pruning with appropriate sparsity allocation across layers performs as well or better than more sophisticated algorithms for pruning at initialization [Su et al., 2020]. In response to this observation, using a simple analogy of pressure distribution in coupled cylinders from physics, we design novel layerwise sparsity quotas that outperform all existing baselines in the context of random pruning.

Submitted 7 April, 2023; v1 submitted 5 July, 2021; originally announced July 2021.
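
The distinction between direct and effective sparsity drawn in the abstract above can be illustrated with a minimal sketch. It assumes a bias-free multilayer perceptron described only by binary masks, and it treats a kept connection as "inactivated" when it does not lie on any input-to-output path. The function name `sparsities` and the mask layout are hypothetical choices for illustration, not the authors' implementation.

```python
from typing import List, Tuple

# mask[j][i] == 1 keeps the connection from unit i (previous layer) to unit j (next layer).
Mask = List[List[int]]


def sparsities(masks: List[Mask]) -> Tuple[float, float]:
    """Return (direct_sparsity, effective_sparsity) for a masked, bias-free MLP."""
    total = sum(len(m) * len(m[0]) for m in masks)
    kept = sum(sum(row) for m in masks for row in m)

    # Forward sweep: a unit is reachable if a kept connection links it to a reachable unit.
    fwd = [[True] * len(masks[0][0])]
    for m in masks:
        prev = fwd[-1]
        fwd.append([any(row[i] and prev[i] for i in range(len(row))) for row in m])

    # Backward sweep: a unit is useful if a kept connection links it to a useful unit.
    bwd = [[True] * len(masks[-1])]
    for m in reversed(masks):
        nxt = bwd[0]
        bwd.insert(0, [any(m[j][i] and nxt[j] for j in range(len(m)))
                       for i in range(len(m[0]))])

    # A kept connection counts as effective only if its source is reachable
    # from the input and its target still leads to the output.
    effective = 0
    for layer, m in enumerate(masks):
        for j, row in enumerate(m):
            for i, bit in enumerate(row):
                if bit and fwd[layer][i] and bwd[layer + 1][j]:
                    effective += 1

    return 1 - kept / total, 1 - effective / total


if __name__ == "__main__":
    # Two kept connections, but neither lies on a complete input-to-output path.
    toy = [[[1, 0], [0, 0]], [[0, 1]]]
    print(sparsities(toy))
```

In the toy network, two of the six connections survive pruning, yet both are disconnected from either the input or the output, so the effective sparsity is higher than the direct one.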

arXiv:1209.1055 [pdf, other]

doi 10.1007/978-3-642-31594-7_33

Hardness of approximation for quantum problems

Authors: Sevag Gharibian, Julia Kempe

Abstract: The polynomial hierarchy plays a central role in classical complexity theory. Here, we define a quantum generalization of the polynomial hierarchy, and initiate its study. We show that not only are there natural complete problems for the second level of this quantum hierarchy, but that these problems are in fact hard to approximate. Using these techniques, we also obtain hardness of approximation for the class QCMA. Our approach is based on the use of dispersers, and is inspired by the classical results of Umans regarding hardness of approximation for the second level of the classical polynomial hierarchy [Umans, FOCS 1999]. The problems for which we prove hardness of approximation include, among others, a quantum version of the Succinct Set Cover problem, and a variant of the local Hamiltonian problem with hybrid classical-quantum ground states.

Submitted 5 September, 2012; originally announced September 2012.

Comments: 21 pages, 1 figure, extended abstract appeared in Proceedings of the 39th International Colloquium on Automata, Languages and Programming (ICALP), pages 387-398, Springer, 2012

Journal ref: Quantum Information & Computation 14 (5 & 6): 517-540, 2014. Also in Proceedings of ICALP 2012

arXiv:1101.3884 [pdf, ps, other]

doi 10.1137/110842272

Approximation algorithms for QMA-complete problems

Authors: Sevag Gharibian, Julia Kempe

Abstract: Approximation algorithms for classical constraint satisfaction problems are one of the main research areas in theoretical computer science. Here we define a natural approximation version of the QMA-complete local Hamiltonian problem and initiate its study. We present two main results. The first shows that a non-trivial approximation ratio can be obtained in the class NP using product states. The second result (which builds on the first one) gives a polynomial time (classical) algorithm providing a similar approximation ratio for dense instances of the problem. The latter result is based on an adaptation of the "exhaustive sampling method" by Arora et al. [J. Comp. Sys. Sci. 58, p.193 (1999)] to the quantum setting, and might be of independent interest.

Submitted 20 January, 2011; originally announced January 2011.

Comments: 22 pages, comments welcome

Journal ref: SIAM Journal on Computing 41(4): 1028-1050, 2012. Also in Proceedings of 26th IEEE Conference on Computational Complexity (CCC), 178-188, 2011

arXiv:1012.4728 [pdf, ps, other]

Parallel Repetition of Entangled Games

Authors: Julia Kempe, Thomas Vidick

Abstract: We consider one-round games between a classical referee and two players. One of the main questions in this area is the parallel repetition question: Is there a way to decrease the maximum winning probability of a game without increasing the number of rounds or the number of players? Classically, efforts to resolve this question, open for many years, have culminated in Raz's celebrated parallel repetition theorem on one hand, and in efficient product testers for PCPs on the other. In the case where players share entanglement, the only previously known results are for special cases of games, and are based on techniques that seem inherently limited. Here we show for the first time that the maximum success probability of entangled games can be reduced through parallel repetition, provided it was not initially 1. Our proof is inspired by a seminal result of Feige and Kilian in the context of classical two-prover one-round interactive proofs. One of the main components in our proof is an orthogonalization lemma for operators, which might be of independent interest.

Submitted 11 May, 2011; v1 submitted 21 December, 2010; originally announced December 2010.

Comments: v2: minor fixes and explanations added

arXiv:1005.0512 [pdf, ps, other]

Two-Source Extractors Secure Against Quantum Adversaries

Authors: Roy Kasher, Julia Kempe

Abstract: We initiate the study of multi-source extractors in the quantum world. In this setting, our goal is to extract random bits from two independent weak random sources, on which two quantum adversaries store a bounded amount of information. Our main result is a two-source extractor secure against quantum adversaries, with parameters closely matching the classical case and tight in several instances. Moreover, the extractor is secure even if the adversaries share entanglement. The construction is the Chor-Goldreich [CG88] two-source inner product extractor and its multi-bit variant by Dodis et al. [DEOR04]. Previously, research in this area focused on the construction of seeded extractors secure against quantum adversaries; the multi-source setting poses new challenges, among which is the presence of entanglement that could potentially break the independence of the sources.

Submitted 4 May, 2010; originally announced May 2010.

Comments: 20 pages, no figures

arXiv:0911.1696 [pdf, ps, other]

doi 10.1145/2371656.2371659

A Quantum Lovasz Local Lemma

Authors: Andris Ambainis, Julia Kempe, Or Sattath

Abstract: The Lovasz Local Lemma (LLL) is a powerful tool in probability theory to show the existence of combinatorial objects meeting a prescribed collection of "weakly dependent" criteria. We show that the LLL extends to a much more general geometric setting, where events are replaced with subspaces and probability is replaced with relative dimension, which allows us to lower bound the dimension of the intersection of vector spaces under certain independence conditions. Our result immediately applies to the k-QSAT problem: for instance, we show that any collection of rank 1 projectors with the property that each qubit appears in at most $2^k/(e \cdot k)$ of them has a joint satisfiable state. We then apply our results to the recently studied model of random k-QSAT. Recent works have shown that the satisfiable region extends up to a density of 1 in the large k limit, where the density is the ratio of projectors to qubits. Using a hybrid approach building on work by Laumann et al., we greatly extend the known satisfiable region for random k-QSAT to a density of $\Omega(2^k/k^2)$. Since our tool allows us to show the existence of joint satisfying states without the need to construct them, we are able to penetrate into regions where the satisfying states are conjectured to be entangled; the need to construct such states has limited previous approaches to product states.

Submitted 9 November, 2009; originally announced November 2009.

Comments: 19 pages

Journal ref: Journal of the ACM, Volume 59 Issue 5, October 2012, Article No. 24
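
The k-QSAT condition quoted in the abstract above (each qubit appears in at most $2^k/(e \cdot k)$ projectors) is easy to check mechanically. The following sketch assumes that every projector is given as the tuple of the k qubit indices it acts on; the function names are hypothetical, and the check only tests the stated sufficient condition, it does not construct a satisfying state.

```python
import math
from collections import Counter
from typing import Iterable, Sequence


def lll_occurrence_bound(k: int) -> float:
    """Occurrence bound 2^k / (e * k) quoted in the abstract."""
    return 2 ** k / (math.e * k)


def meets_occurrence_condition(projectors: Iterable[Sequence[int]], k: int) -> bool:
    """True if no qubit appears in more projectors than the bound allows.

    Each projector is represented (by assumption) as the qubit indices it touches.
    """
    counts = Counter(q for qubits in projectors for q in set(qubits))
    bound = lll_occurrence_bound(k)
    return all(c <= bound for c in counts.values())


if __name__ == "__main__":
    k = 4  # bound is 16 / (4e), roughly 1.47, so each qubit may appear at most once
    print(meets_occurrence_condition([(0, 1, 2, 3), (4, 5, 6, 7)], k))
    print(meets_occurrence_condition([(0, 1, 2, 3), (0, 4, 5, 6)], k))
```

For small k the bound is restrictive (below 1 for k = 3), so the condition becomes meaningful mainly as k grows.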

arXiv:0911.0201 [pdf, ps, other]

No Strong Parallel Repetition with Entangled and Non-signaling Provers

Authors: Julia Kempe, Oded Regev

Abstract: We consider one-round games between a classical verifier and two provers. One of the main questions in this area is the parallel repetition question: If the game is played $\ell$ times in parallel, does the maximum winning probability decay exponentially in $\ell$? In the classical setting, this question was answered in the affirmative by Raz. More recently the question arose whether the decay is of the form $(1-\Theta(\epsilon))^\ell$ where $1-\epsilon$ is the value of the game and $\ell$ is the number of repetitions. This question is known as the strong parallel repetition question and was motivated by its connections to the unique games conjecture. It was resolved by Raz who showed that strong parallel repetition does not hold, even in the very special case of games known as XOR games. This opens the question whether strong parallel repetition holds in the case when the provers share entanglement. Evidence for this is provided by the behavior of XOR games, which have strong (in fact perfect) parallel repetition, and by the recently proved strong parallel repetition of linear unique games. A similar question was open for games with so-called non-signaling provers. Here the best known parallel repetition theorem is due to Holenstein, and is of the form $(1-\Theta(\epsilon^2))^\ell$. We show that strong parallel repetition holds neither with entangled provers nor with non-signaling provers. In particular we obtain that Holenstein's bound is tight.
Along the way we also provide a tight characterization of the asymptotic behavior of the entangled value under parallel repetition of unique games in terms of a semidefinite program.

Submitted 1 November, 2009; originally announced November 2009.

Comments: 15 pages, 2 figures

arXiv:0802.1464 [pdf, ps, other]

Upper Bounds on the Noise Threshold for Fault-tolerant Quantum Computing

Authors: Julia Kempe, Oded Regev, Falk Unger, Ronald de Wolf

Abstract: We prove new upper bounds on the tolerable level of noise in a quantum circuit. We consider circuits consisting of unitary k-qubit gates each of whose input wires is subject to depolarizing noise of strength p, as well as arbitrary one-qubit gates that are essentially noise-free. We assume that the output of the circuit is the result of measuring some designated qubit in the final state. Our main result is that for $p > 1-\Theta(1/\sqrt{k})$, the output of any such circuit of large enough depth is essentially independent of its input, thereby making the circuit useless. For the important special case of k=2, our bound is p>35.7%. Moreover, if the only allowed gate on more than one qubit is the two-qubit CNOT gate, then our bound becomes 29.3%. These bounds on p are notably better than previous bounds, yet are incomparable because of the somewhat different circuit model that we are using. Our main technique is the use of a Pauli basis decomposition, which we believe should lead to further progress in deriving such bounds.

Submitted 11 February, 2008; originally announced February 2008.

Comments: 14 pages, 3 figures

arXiv:0711.3715 [pdf, ps, other]

Using Entanglement in Quantum Multi-Prover Interactive Proofs

Authors: Julia Kempe, Hirotada Kobayashi, Keiji Matsumoto, Thomas Vidick

Abstract: The central question in quantum multi-prover interactive proof systems is whether or not entanglement shared between provers affects the verification power of the proof system. We study for the first time positive aspects of prior entanglement and show that entanglement is useful even for honest provers. We show how to use shared entanglement to parallelize any multi-prover quantum interactive proof system to a one-round system with perfect completeness, with one extra prover. Alternatively, we can also parallelize to a three-turn system with the same number of provers, where the verifier only broadcasts the outcome of a coin flip. This "public-coin" property is somewhat surprising, since in the classical case public-coin multi-prover interactive proofs are equivalent to single prover ones.

Submitted 23 November, 2007; originally announced November 2007.

Comments: 19 pages

arXiv:0710.0655 [pdf, ps, other]

Unique Games with Entangled Provers are Easy

Authors: Julia Kempe, Oded Regev, Ben Toner

Abstract: We consider one-round games between a classical verifier and two provers who share entanglement. We show that when the constraints enforced by the verifier are 'unique' constraints (i.e., permutations), the value of the game can be well approximated by a semidefinite program. Essentially the only algorithm known previously was for the special case of binary answers, as follows from the work of Tsirelson in 1980. Among other things, our result implies that the variant of the unique games conjecture where we allow the provers to share entanglement is false. Our proof is based on a novel 'quantum rounding technique', showing how to take a solution to an SDP and transform it to a strategy for entangled provers. Using our approximation by a semidefinite program we also show a parallel repetition theorem for unique entangled games.

Submitted 3 October, 2009; v1 submitted 2 October, 2007; originally announced October 2007.

Comments: 25 pages, revised version, contains parallel repetition result

arXiv:0705.4077 [pdf, ps, other]

doi 10.1007/s00220-008-0710-3

The power of quantum systems on a line

Authors: Dorit Aharonov, Daniel Gottesman, Sandy Irani, Julia Kempe

Abstract: We study the computational strength of quantum particles (each of finite dimensionality) arranged on a line. First, we prove that it is possible to perform universal adiabatic quantum computation using a one-dimensional quantum system (with 9 states per particle). This might have practical implications for experimentalists interested in constructing an adiabatic quantum computer. Building on the same construction, but with some additional technical effort and 12 states per particle, we show that the problem of approximating the ground state energy of a system composed of a line of quantum particles is QMA-complete; QMA is a quantum analogue of NP. This is in striking contrast to the fact that the analogous classical problem, namely, one-dimensional MAX-2-SAT with nearest neighbor constraints, is in P. The proof of the QMA-completeness result requires an additional idea beyond the usual techniques in the area: Not all illegal configurations can be ruled out by local checks, so instead we rule out such illegal configurations because they would, in the future, evolve into a state which can be seen locally to be illegal. Our construction implies (assuming the quantum Church-Turing thesis and that quantum computers cannot efficiently solve QMA-complete problems) that there are one-dimensional systems which take an exponential time to relax to their ground states at any temperature, making them candidates for being one-dimensional spin glasses.

Submitted 19 February, 2009; v1 submitted 28 May, 2007; originally announced May 2007.

Comments: 21 pages. v2 has numerous corrections and clarifications, and most importantly a new author, merged from arXiv:0705.4067. v3 is the published version, with additional clarifications, publisher's version available at http://www.springerlink.com

Journal ref: Comm. Math. Physics, vol. 287, no. 1, pp. 41-65 (2009)

arXiv:0704.2903 [pdf, ps, other]

Entangled games are hard to approximate

Authors: Julia Kempe, Hirotada Kobayashi, Keiji Matsumoto, Ben Toner, Thomas Vidick

Abstract: We establish the first hardness results for the problem of computing the value of one-round games played by a verifier and a team of provers who can share quantum entanglement. In particular, we show that it is NP-hard to approximate within an inverse polynomial the value of a one-round game with (i) quantum verifier and two entangled provers or (ii) classical verifier and three entangled provers. Previously it was not even known if computing the value exactly is NP-hard. We also describe a mathematical conjecture, which, if true, would imply hardness of approximation to within a constant. We start our proof by describing two ways to modify classical multi-prover games to make them resistant to entangled provers. We then show that a strategy for the modified game that uses entanglement can be "rounded" to one that does not. The results then follow from classical inapproximability bounds. Our work implies that, unless P=NP, the values of entangled-prover games cannot be computed by semidefinite programs that are polynomial in the size of the verifier's system, a method that has been successful for more restricted quantum games.

Submitted 21 November, 2007; v1 submitted 23 April, 2007; originally announced April 2007.

Comments: 26 pages, complete and much improved version with stronger results, supersedes the results in arXiv:quant-ph/0612063 proved with different techniques

arXiv:quant-ph/0612185 [pdf, ps, other]

Approaches to Quantum Error Correction

Authors: Julia Kempe

Abstract: The purpose of this little survey is to give a simple description of the main approaches to quantum error correction and quantum fault-tolerance. Our goal is to convey the necessary intuitions both for the problems and their solutions in this area. After characterising quantum errors we present several error-correction schemes and outline the elements of a full-fledged fault-tolerant computation, which works error-free even though all of its components can be faulty. We also mention alternative approaches to error-correction, so-called error-avoiding or decoherence-free schemes. Technical details and generalisations are kept to a minimum.

Submitted 21 December, 2006; originally announced December 2006.

Comments: 29 pages, 5 figures, survey written for Poincare seminar lecture 19 Nov. 2005. Book chapter in "Quantum Decoherence", Poincare seminar 2005, Progress in Mathematical Physics series, Birkhäuser, pp. 85-123, 2006

arXiv:quant-ph/0612063 [pdf, ps, other]

On the Power of Entangled Quantum Provers

Authors: Julia Kempe, Thomas Vidick

Abstract: We show that the value of a general two-prover quantum game cannot be computed by a semi-definite program of polynomial size (unless P=NP), a method that has been successful in more restricted quantum games. More precisely, we show that proof of membership in the NP-complete problem GAP-3D-Matching can be obtained by a 2-prover, 1-round quantum interactive proof system where the provers share entanglement, with perfect completeness and soundness $s=1-2^{-O(n)}$, and such that the space of the verifier and the size of the messages are $O(\log n)$. This implies that $\mathrm{QMIP}^*_{\log n,1,1-2^{-O(n)}} \nsubseteq \mathrm{P}$ unless P = NP and provides the first non-trivial lower bound on the power of entangled quantum provers, albeit with an exponentially small gap. The gap achievable by our proof system might in fact be larger, provided a certain conjecture on almost commuting versus nearly commuting projector matrices is true.

Submitted 8 December, 2006; originally announced December 2006.

Comments: 17 pages

arXiv:quant-ph/0611209 [pdf, ps, other]

Exponential separations for one-way quantum communication complexity, with applications to cryptography

Authors: Dmytro Gavinsky, Julia Kempe, Iordanis Kerenidis, Ran Raz, Ronald de Wolf

Abstract: We give an exponential separation between one-way quantum and classical communication protocols for a partial Boolean function (a variant of the Boolean Hidden Matching Problem of Bar-Yossef et al.). Earlier such an exponential separation was known only for a relational problem. The communication problem corresponds to a strong extractor that fails against a small amount of quantum information about its random source. Our proof uses the Fourier coefficients inequality of Kahn, Kalai, and Linial. We also give a number of applications of this separation. In particular, we show that there are privacy amplification schemes that are secure against classical adversaries but not against quantum adversaries; and we give the first example of a key-expansion scheme in the model of bounded-storage cryptography that is secure against classical memory-bounded adversaries but not against quantum ones.

Submitted 13 February, 2008; v1 submitted 20 November, 2006; originally announced November 2006.

Comments: 16 pages, improved version, supersedes quant-ph/0607173 and quant-ph/0607174 although some proofs are different

Journal ref: Proc. 39th STOC, p. 516-525 (2007)

arXiv:quant-ph/0607204 [pdf, ps, other]

Permutation groups, minimal degrees and quantum computing

Authors: Julia Kempe, Laszlo Pyber, Aner Shalev

Abstract: We study permutation groups of given minimal degree without the classical primitivity assumption. We provide sharp upper bounds on the order of a permutation group of minimal degree m and on the number of its elements of any given support. These results contribute to the foundations of a non-commutative coding theory. A main application of our results concerns the Hidden Subgroup Problem for the symmetric group in Quantum Computing. We completely characterize the hidden subgroups of the symmetric group that can be distinguished from identity with weak Quantum Fourier Sampling, showing these are exactly the subgroups with bounded minimal degree. This implies that the weak standard method for the symmetric group has no advantage whatsoever over classical exhaustive search.

Submitted 28 July, 2006; originally announced July 2006.

Comments: 28 pages, no figures

arXiv:quant-ph/0607174 [pdf, ps, other]

Exponential Separation of Quantum and Classical One-Way Communication Complexity for a Boolean Function

Authors: Dmytro Gavinsky, Julia Kempe, Ronald de Wolf

Abstract: We give an exponential separation between one-way quantum and classical communication complexity for a Boolean function. Earlier such a separation was known only for a relation. A very similar result was obtained earlier but independently by Kerenidis and Raz [KR06]. Our version of the result gives an example in the bounded storage model of cryptography, where the key is secure if the adversary has a certain amount of classical storage, but is completely insecure if he has a similar amount of quantum storage.

Submitted 25 July, 2006; originally announced July 2006.

Comments: 8 pages, no figures

arXiv:quant-ph/0603173 [pdf, ps, other]

Strengths and Weaknesses of Quantum Fingerprinting

Authors: Dmytro Gavinsky, Julia Kempe, Ronald de Wolf

Abstract: We study the power of quantum fingerprints in the simultaneous message passing (SMP) setting of communication complexity. Yao recently showed how to simulate, with exponential overhead, classical shared-randomness SMP protocols by means of quantum SMP protocols without shared randomness ($Q^\parallel$-protocols). Our first result is to extend Yao's simulation to the strongest possible model: every many-round quantum protocol with unlimited shared entanglement can be simulated, with exponential overhead, by $Q^\parallel$-protocols. We apply our technique to obtain an efficient $Q^\parallel$-protocol for a function which cannot be efficiently solved through more restricted simulations. Second, we tightly characterize the power of the quantum fingerprinting technique by making a connection to arrangements of homogeneous halfspaces with maximal margin. These arrangements have been well studied in computational learning theory, and we use some strong results obtained in this area to exhibit weaknesses of quantum fingerprinting. In particular, this implies that for almost all functions, quantum fingerprinting protocols are exponentially worse than classical deterministic SMP protocols.

Submitted 20 March, 2006; originally announced March 2006.

Comments: 13 pages, no figures, to appear in CCC'06

Journal ref: Proc. 21st CCC (Complexity), p. 288-295 (2006)

arXiv:quant-ph/0511013 [pdf, ps, other]

Bounded-Error Quantum State Identification and Exponential Separations in Communication Complexity

Authors: Dmytro Gavinsky, Julia Kempe, Oded Regev, Ronald de Wolf

Abstract: We consider the problem of bounded-error quantum state identification: given either state α_0 or state α_1, we are required to output `0', `1' or `?' ("don't know"), such that conditioned on outputting `0' or `1', our guess is correct with high probability. The goal is to maximize the probability of not outputting `?'. We prove a direct product theorem: if we're given two such problems, with optimal probabilities a and b, respectively, and the states in the first problem are pure, then the optimal probability for the joint bounded-error state identification problem is O(ab). Our proof is based on semidefinite programming duality and may be of wider interest. Using this result, we present two exponential separations in the simultaneous message passing model of communication complexity. Both are shown in the strongest possible sense. First, we describe a relation that can be computed with O(log n) classical bits of communication in the presence of shared randomness, but needs Omega(n^{1/3}) communication if the parties don't share randomness, even if communication is quantum. This shows the optimality of Yao's recent exponential simulation of shared-randomness protocols by quantum protocols without shared randomness. Second, we describe a relation that can be computed with O(log n) classical bits of communication in the presence of shared entanglement, but needs Omega((n/log n)^{1/3}) communication if the parties share randomness but no entanglement, even if communication is quantum. This is the first example in communication complexity of a situation where entanglement buys you much more than quantum communication does.

Submitted 2 November, 2005; originally announced November 2005.

Comments: 20 pages, no figures

arXiv:quant-ph/0411051 [pdf, ps, other]

Quantum Communication Cannot Simulate a Public Coin

Authors: Dmytro Gavinsky, Julia Kempe, Ronald de Wolf

Abstract: We study the simultaneous message passing model of communication complexity. Building on the quantum fingerprinting protocol of Buhrman et al., Yao recently showed that a large class of efficient classical public-coin protocols can be turned into efficient quantum protocols without public coin. This raises the question whether this can be done always, i.e. whether quantum communication can always replace a public coin in the SMP model. We answer this question in the negative, exhibiting a communication problem where classical communication with public coin is exponentially more efficient than quantum communication. Together with a separation in the other direction due to Bar-Yossef et al., this shows that the quantum SMP model is incomparable with the classical public-coin SMP model. In addition we give a characterization of the power of quantum fingerprinting by means of a connection to geometrical tools from machine learning, a quadratic improvement of Yao's simulation, and a nearly tight analysis of the Hamming distance problem from Yao's paper.

Submitted 8 November, 2004; originally announced November 2004.

Comments: 12 pages LaTeX

arXiv:quant-ph/0409084 [pdf, ps, other]

doi 10.1109/TAC.2006.871942

Generalized Performance of Concatenated Quantum Codes -- A Dynamical Systems Approach

Authors: Jesse Fern, Julia Kempe, Slobodan Simic, Shankar Sastry

Abstract: We apply a dynamical systems approach to concatenation of quantum error correcting codes, extending and generalizing the results of Rahn et al. [1] to both diagonal and nondiagonal channels. Our point of view is global: instead of focusing on particular types of noise channels, we study the geometry of the coding map as a discrete-time dynamical system on the entire space of noise channels. In the case of diagonal channels, we show that any code with distance at least three corrects (in the infinite concatenation limit) an open set of errors. For Calderbank-Shor-Steane (CSS) codes, we give a more precise characterization of that set. We show how to incorporate noise in the gates, thus completing the framework. We derive some general bounds for noise channels, which allows us to analyze several codes in detail.

Submitted 15 March, 2006; v1 submitted 15 September, 2004; originally announced September 2004.

Comments: 12 pages two-column format, no figures, slightly revised version

Journal ref: IEEE Trans. on Automatic Control 51:448-459 (March 2006)

arXiv:cond-mat/0407780 [pdf, ps, other]

doi 10.1103/PhysRevB.72.064511

Full protection of superconducting qubit systems from coupling errors

Authors: M. J. Storcz, J. Vala, K. R. Brown, J. Kempe, F. K. Wilhelm, K. B. Whaley

Abstract: Solid state qubits realized in superconducting circuits are potentially extremely scalable. However, strong decoherence may be transferred to the qubits by various elements of the circuits that couple individual qubits, particularly when coupling is implemented over long distances. We propose here an encoding that provides full protection against errors originating from these coupling elements, for a chain of superconducting qubits with a nearest neighbor anisotropic XY-interaction. The encoding is also seen to provide partial protection against errors deriving from general electronic noise.

Submitted 9 August, 2005; v1 submitted 29 July, 2004; originally announced July 2004.

Comments: 4 pages, 1 figure

Report number: LMU-ASC 34/05

Journal ref: Phys. Rev. B 72, 064511 (2005)

arXiv:quant-ph/0406180 [pdf, ps, other]

The Complexity of the Local Hamiltonian Problem

Authors: Julia Kempe, Alexei Kitaev, Oded Regev

Abstract: The k-local Hamiltonian problem is a natural complete problem for the complexity class QMA, the quantum analog of NP. It is similar in spirit to MAX-k-SAT, which is NP-complete for k >= 2. It was known that the problem is QMA-complete for any k >= 3. On the other hand 1-local Hamiltonian is in P, and hence not believed to be QMA-complete. The complexity of the 2-local Hamiltonian problem has long been outstanding. Here we settle the question and show that it is QMA-complete. We provide two independent proofs; our first proof uses only elementary linear algebra. Our second proof uses a powerful technique for analyzing the sum of two Hamiltonians; this technique is based on perturbation theory and we believe that it might prove useful elsewhere. Using our techniques we also show that adiabatic computation with two-local interactions on qubits is equivalent to standard quantum computation.

Submitted 2 October, 2005; v1 submitted 24 June, 2004; originally announced June 2004.

Comments: 30 pages, 3 figures, replaced with revised version, numerous improvements to readability and expanded adiabatic section

Journal ref: SIAM Journal of Computing, Vol. 35(5), p. 1070-1097 (2006), conference version in Proc. 24th FSTTCS, p. 372-383 (2004)

arXiv:quant-ph/0406046 [pdf, ps, other]

The hidden subgroup problem and permutation group theory

Authors: Julia Kempe, Aner Shalev

Abstract: We employ concepts and tools from the theory of finite permutation groups in order to analyse the Hidden Subgroup Problem via Quantum Fourier Sampling (QFS) for the symmetric group. We show that under very general conditions both the weak and the random-strong form (strong form with random choices of basis) of QFS fail to provide any advantage over classical exhaustive search. In particular we give a complete characterisation of polynomial size subgroups, and of primitive subgroups, that can be distinguished from the identity subgroup with the above methods. Furthermore, assuming a plausible group theoretic conjecture for which we give supporting evidence, we show that weak and random-strong QFS for the symmetric group have no advantage whatsoever over classical search.

Submitted 8 June, 2004; originally announced June 2004.

Comments: 12 pages

Journal ref: Proc. 16th ACM-SIAM SODA, p. 1118-1125 (2005)

arXiv:quant-ph/0405098 [pdf, ps, other]

Adiabatic Quantum Computation is Equivalent to Standard Quantum Computation

Authors: Dorit Aharonov, Wim van Dam, Julia Kempe, Zeph Landau, Seth Lloyd, Oded Regev

Abstract: Adiabatic quantum computation has recently attracted attention in the physics and computer science communities, but its computational power was unknown. We describe an efficient adiabatic simulation of any given quantum algorithm, which implies that the adiabatic computation model and the conventional quantum computation model are polynomially equivalent. Our result can be extended to the physically realistic setting of particles arranged on a two-dimensional grid with nearest neighbor interactions. The equivalence between the models provides a new vantage point from which to tackle the central issues in quantum computation, namely designing new quantum algorithms and constructing fault tolerant quantum computers. In particular, by translating the main open questions in the area of quantum algorithms to the language of spectral gaps of sparse matrices, the result makes these questions accessible to a wider scientific audience, acquainted with mathematical physics, expander theory and rapidly mixing Markov chains.

Submitted 26 March, 2005; v1 submitted 18 May, 2004; originally announced May 2004.

Comments: 30 pages, updated version

Journal ref: SIAM Journal of Computing, Vol. 37, Issue 1, p. 166-194 (2007), conference version in Proc. 45th FOCS, p. 42-51 (2004)

arXiv:quant-ph/0405086 [pdf, ps, other]

Quantum Color-Coding Is Better

Authors: Joshua Von Korff, Julia Kempe

Abstract: We describe a quantum scheme to ``color-code'' a set of objects in order to record which one is which. In the classical case, N distinct colors are required to color-code N objects. We show that in the quantum case, only N/e distinct ``colors'' are required, where e = 2.71828 . . . If the number of colors is less than optimal, the objects may still be correctly distinguished with some success probability less than 1. We show that the success probability of the quantum scheme is better than the corresponding classical one and is information-theoretically optimal.

Submitted 16 May, 2004; originally announced May 2004.

Comments: 4 pages

Journal ref: Phys. Rev. Lett., Vol. 93 (26), 260502 (2004); new title "Quantum Advantage in Transmitting a Permutation"

arXiv:quant-ph/0402107 [pdf, ps, other]

Coins Make Quantum Walks Faster

Authors: Andris Ambainis, Julia Kempe, Alexander Rivosh

Abstract: We show how to search N items arranged on a $\sqrt{N}\times\sqrt{N}$ grid in time $O(\sqrt N \log N)$, using a discrete time quantum walk. This result for the first time exhibits a significant difference between discrete time and continuous time walks without coin degrees of freedom, since it has been shown recently that such a continuous time walk needs time $\Omega(N)$ to perform the same task. Our result furthermore improves on a previous bound for quantum local search by Aaronson and Ambainis. We generalize our result to 3 and more dimensions where the walk yields the optimal performance of $O(\sqrt{N})$ and give several extensions of quantum walk search algorithms for general graphs. The coin-flip operation needs to be chosen judiciously: we show that another ``natural'' choice of coin gives a walk that takes $\Omega(N)$ steps. We also show that in 2 dimensions it is sufficient to have a two-dimensional coin-space to achieve the time $O(\sqrt{N} \log N)$.

Submitted 16 February, 2004; originally announced February 2004.

Comments: 25 pages, no figures

Journal ref: Proc. 16th ACM-SIAM SODA, p. 1099-1108 (2005)

arXiv:quant-ph/0309002 [pdf, ps, other]

An Explicit Universal Gate-set for Exchange-Only Quantum Computation

Authors: M. Hsieh, J. Kempe, S. Myrgren, K. B. Whaley

Abstract: A single physical interaction might not be universal for quantum computation in general. It has been shown, however, that in some cases it can generate universal quantum computation over a subspace. For example, by encoding logical qubits into arrays of multiple physical qubits, a single isotropic or anisotropic exchange interaction can generate a universal logical gate-set. Recently, encoded universality for the exchange interaction was explicitly demonstrated on three-qubit arrays, the smallest nontrivial encoding. We now present the exact specification of a discrete universal logical gate-set on four-qubit arrays. We show how to implement the single qubit operations exactly with at most 3 nearest neighbor exchange operations and how to generate the encoded controlled-not with 29 parallel nearest neighbor exchange interactions or 54 serial gates, obtained from extensive numerical optimization using genetic algorithms and Nelder-Mead searches. Our gate-sequences are immediately applicable to implementations of quantum circuits with the exchange interaction.

Submitted 22 December, 2003; v1 submitted 29 August, 2003; originally announced September 2003.

Comments: 16 pages, 6 figures, new appendix and figures, revised version as accepted for publication

Journal ref: Quantum Information Processing, Vol. 2 (4), p. 289-307, 2003

arXiv:quant-ph/0303081 [pdf, ps, other]

doi 10.1080/00107151031000110776

Quantum random walks - an introductory overview

Authors: Julia Kempe

Abstract: This article aims to provide an introductory survey on quantum random walks. Starting from a physical effect to illustrate the main ideas we will introduce quantum random walks, review some of their properties and outline their striking differences to classical walks. We will touch upon both physical effects and computer science applications, introducing some of the main concepts and language of present day quantum information science in this context. We will mention recent developments in this new area and outline some open questions.

Submitted 13 March, 2003; originally announced March 2003.

Comments: 20 pages, 13 figures, to appear in Contemporary Physics

Journal ref: Contemporary Physics, Vol. 44 (4), p.307-327, 2003

arXiv:quant-ph/0302079 [pdf, ps, other]

3-Local Hamiltonian is QMA-complete

Authors: Julia Kempe, Oded Regev

Abstract: It has been shown by Kitaev that the 5-local Hamiltonian problem is QMA-complete. Here we reduce the locality of the problem by showing that 3-local Hamiltonian is already QMA-complete.

Submitted 20 May, 2003; v1 submitted 10 February, 2003; originally announced February 2003.

Comments: 7 pages, minor changes and corrections, published version

Journal ref: Quantum Computation and Information, Vol. 3(3), p. 258-64, 2003

arXiv:quant-ph/0210064 [pdf, ps, other]

doi 10.1103/PhysRevA.67.052307

A Quantum Random Walk Search Algorithm

Authors: Neil Shenvi, Julia Kempe, K. Birgitta Whaley

Abstract: Quantum random walks on graphs have been shown to display many interesting properties, including exponentially fast hitting times when compared with their classical counterparts. However, it is still unclear how to use these novel properties to gain an algorithmic speed-up over classical algorithms. In this paper, we present a quantum search algorithm based on the quantum random walk architecture that provides such a speed-up. It will be shown that this algorithm performs an oracle search on a database of $N$ items with $O(\sqrt{N})$ calls to the oracle, yielding a speed-up similar to other quantum search algorithms. It appears that the quantum random walk formulation has considerable flexibility, presenting interesting opportunities for development of other, possibly novel quantum algorithms.

Submitted 9 October, 2002; originally announced October 2002.

Comments: 13 pages, 3 figures

Journal ref: Phys. Rev. A, Vol. 67 (5), 052307 (2003)

arXiv:quant-ph/0205083 [pdf, ps, other]

Quantum Random Walks Hit Exponentially Faster

Authors: Julia Kempe

Abstract: We show that the hitting time of the discrete time quantum random walk on the n-bit hypercube from one corner to its opposite is polynomial in n. This gives the first exponential quantum-classical gap in the hitting time of discrete quantum random walks. We provide the framework for quantum hitting time and give two alternative definitions to set the ground for its study on general graphs. We then give an application to random routing.

Submitted 14 May, 2002; originally announced May 2002.

Comments: 15 pages, no Figures

Journal ref: Probability Theory and Related Fields, Vol. 133(2), p. 215-235 (2005), conference version in Proc. 7th RANDOM, p. 354-69, 2003

arXiv:quant-ph/0112014 [pdf, ps, other]

doi 10.1103/PhysRevA.65.052330

Exact gate-sequences for universal quantum computation using the XY-interaction alone

Authors: J. Kempe, K. B. Whaley

Abstract: In a previous publication [1] we showed that it is possible to implement universal quantum computation with the anisotropic XY-Heisenberg exchange acting as a single interaction. To achieve this we used encodings of the states of the computation into a larger Hilbert space. This proof is non-constructive, however, and did not explicitly give the trade-offs in time that are required to implement encoded single qubit operations and encoded two-qubit gates. Here we explicitly give the gate-sequences needed to simulate these operations on encoded qubits and qutrits (three-level systems) and analyze the trade-offs involved. We also propose a possible layout for the qubits in a triangular arrangement.

Submitted 22 May, 2002; v1 submitted 3 December, 2001; originally announced December 2001.

Comments: 6 pages, 5 figures

Journal ref: Phys. Rev. A, Vol. 65 (5), 052330 (2002)

Showing 1–50 of 64 results for author: Kempe, J