Showing 1–50 of 83 results for author: Choromanski, K

Searching in archive cs.
  1. arXiv:2406.19800

    cs.LG cs.RO

    Modeling the Real World with High-Density Visual Particle Dynamics

    Authors: William F. Whitney, Jacob Varley, Deepali Jain, Krzysztof Choromanski, Sumeet Singh, Vikas Sindhwani

    Abstract: We present High-Density Visual Particle Dynamics (HD-VPD), a learned world model that can emulate the physical dynamics of real scenes by processing massive latent point clouds containing 100K+ particles. To enable efficiency at this scale, we introduce a novel family of Point Cloud Transformers (PCTs) called Interlacers leveraging intertwined linear-attention Performer layers and graph-based neig…

    Submitted 28 June, 2024; originally announced June 2024.

  2. arXiv:2406.17740

    cs.LG cs.AI cs.CV

    Structured Unrestricted-Rank Matrices for Parameter Efficient Fine-tuning

    Authors: Arijit Sehanobish, Avinava Dubey, Krzysztof Choromanski, Somnath Basu Roy Chowdhury, Deepali Jain, Vikas Sindhwani, Snigdha Chaturvedi

    Abstract: Recent efforts to scale Transformer models have demonstrated rapid progress across a wide range of tasks (Wei et al., 2022). However, fine-tuning these models for downstream tasks is expensive due to their large parameter counts. Parameter-efficient fine-tuning (PEFT) approaches have emerged as a viable alternative by allowing us to fine-tune models by updating only a small number of parameters. I…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Work in progress

  3. arXiv:2406.16257

    cs.LG

    Towards Scalable Exact Machine Unlearning Using Parameter-Efficient Fine-Tuning

    Authors: Somnath Basu Roy Chowdhury, Krzysztof Choromanski, Arijit Sehanobish, Avinava Dubey, Snigdha Chaturvedi

    Abstract: Machine unlearning is the process of efficiently removing the influence of a training data instance from a trained machine learning model without retraining it from scratch. A popular subclass of unlearning approaches is exact machine unlearning, which focuses on techniques that explicitly guarantee the removal of the influence of a data instance from a model. Exact unlearning approaches use a mac…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: Work in Progress

  4. arXiv:2406.15881

    cs.LG cs.AI

    Fast Tree-Field Integrators: From Low Displacement Rank to Topological Transformers

    Authors: Krzysztof Choromanski, Arijit Sehanobish, Somnath Basu Roy Chowdhury, Han Lin, Avinava Dubey, Tamas Sarlos, Snigdha Chaturvedi

    Abstract: We present a new class of fast polylog-linear algorithms based on the theory of structured matrices (in particular low displacement rank) for integrating tensor fields defined on weighted trees. Several applications of the resulting fast tree-field integrators (FTFIs) are presented, including (a) approximation of graph metrics with tree metrics, (b) graph classification, (c) modeling on meshes, an…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Preprint. Comments welcome

  5. arXiv:2405.16541

    stat.ML cs.LG

    Variance-Reducing Couplings for Random Features: Perspectives from Optimal Transport

    Authors: Isaac Reid, Stratis Markou, Krzysztof Choromanski, Richard E. Turner, Adrian Weller

    Abstract: Random features (RFs) are a popular technique to scale up kernel methods in machine learning, replacing exact kernel evaluations with stochastic Monte Carlo estimates. They underpin models as diverse as efficient transformers (by approximating attention) to sparse spectrum Gaussian processes (by approximating the covariance function). Efficiency can be further improved by speeding up the convergen…

    Submitted 26 May, 2024; originally announced May 2024.

  6. arXiv:2404.03570

    cs.RO

    Embodied AI with Two Arms: Zero-shot Learning, Safety and Modularity

    Authors: Jake Varley, Sumeet Singh, Deepali Jain, Krzysztof Choromanski, Andy Zeng, Somnath Basu Roy Chowdhury, Avinava Dubey, Vikas Sindhwani

    Abstract: We present an embodied AI system which receives open-ended natural language instructions from a human, and controls two arms to collaboratively accomplish potentially long-horizon tasks over a large workspace. Our system is modular: it deploys state-of-the-art Large Language Models for task planning, Vision-Language models for semantic perception, and Point Cloud transformers for grasping. With sem…

    Submitted 4 April, 2024; originally announced April 2024.

  7. arXiv:2312.01990

    cs.RO cs.AI

    SARA-RT: Scaling up Robotics Transformers with Self-Adaptive Robust Attention

    Authors: Isabel Leal, Krzysztof Choromanski, Deepali Jain, Avinava Dubey, Jake Varley, Michael Ryoo, Yao Lu, Frederick Liu, Vikas Sindhwani, Quan Vuong, Tamas Sarlos, Ken Oslund, Karol Hausman, Kanishka Rao

    Abstract: We present Self-Adaptive Robust Attention for Robotics Transformers (SARA-RT): a new paradigm for addressing the emerging challenge of scaling up Robotics Transformers (RT) for on-robot deployment. SARA-RT relies on the new method of fine-tuning proposed by us, called up-training. It converts pre-trained or already fine-tuned Transformer-based robotic policies of quadratic time complexity (includi…

    Submitted 4 December, 2023; originally announced December 2023.

  8. arXiv:2310.13225

    cs.LG cs.AI

    Scalable Neural Network Kernels

    Authors: Arijit Sehanobish, Krzysztof Choromanski, Yunfan Zhao, Avinava Dubey, Valerii Likhosherstov

    Abstract: We introduce the concept of scalable neural network kernels (SNNKs), the replacements of regular feedforward layers (FFLs), capable of approximating the latter, but with favorable computational properties. SNNKs effectively disentangle the inputs from the parameters of the neural network in the FFL, only to connect them in the final computation via the dot-product kernel. They are also strictly mo…

    Submitted 5 March, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

  9. arXiv:2310.04859

    stat.ML cs.LG

    General Graph Random Features

    Authors: Isaac Reid, Krzysztof Choromanski, Eli Berger, Adrian Weller

    Abstract: We propose a novel random walk-based algorithm for unbiased estimation of arbitrary functions of a weighted adjacency matrix, coined universal graph random features (u-GRFs). This includes many of the most popular examples of kernels defined on the nodes of a graph. Our algorithm enjoys subquadratic time complexity with respect to the number of nodes, overcoming the notoriously prohibitive cubic s…

    Submitted 24 May, 2024; v1 submitted 7 October, 2023; originally announced October 2023.

  10. arXiv:2310.04854

    stat.ML cs.LG

    Repelling Random Walks

    Authors: Isaac Reid, Eli Berger, Krzysztof Choromanski, Adrian Weller

    Abstract: We present a novel quasi-Monte Carlo mechanism to improve graph-based sampling, coined repelling random walks. By inducing correlations between the trajectories of an interacting ensemble such that their marginal transition probabilities are unmodified, we are able to explore the graph more efficiently, improving the concentration of statistical estimators whilst leaving them unbiased. The mechani…

    Submitted 24 May, 2024; v1 submitted 7 October, 2023; originally announced October 2023.

  11. arXiv:2309.08551

    cs.CL cs.SD eess.AS

    Augmenting conformers with structured state-space sequence models for online speech recognition

    Authors: Haozhe Shan, Albert Gu, Zhong Meng, Weiran Wang, Krzysztof Choromanski, Tara Sainath

    Abstract: Online speech recognition, where the model only accesses context to the left, is an important and challenging use case for ASR systems. In this work, we investigate augmenting neural encoders for online ASR by incorporating structured state-space sequence models (S4), a family of models that provide a parameter-efficient way of accessing arbitrarily long left context. We performed systematic ablat…

    Submitted 27 December, 2023; v1 submitted 15 September, 2023; originally announced September 2023.

    Comments: ICASSP 2024

  12. Robotic Table Tennis: A Case Study into a High Speed Learning System

    Authors: David B. D'Ambrosio, Jonathan Abelian, Saminda Abeyruwan, Michael Ahn, Alex Bewley, Justin Boyd, Krzysztof Choromanski, Omar Cortes, Erwin Coumans, Tianli Ding, Wenbo Gao, Laura Graesser, Atil Iscen, Navdeep Jaitly, Deepali Jain, Juhana Kangaspunta, Satoshi Kataoka, Gus Kouretas, Yuheng Kuang, Nevena Lazic, Corey Lynch, Reza Mahjourian, Sherry Q. Moore, Thinh Nguyen, Ken Oslund, et al. (10 additional authors not shown)

    Abstract: We present a deep-dive into a real-world robotic learning system that, in previous work, was shown to be capable of hundreds of table tennis rallies with a human and has the ability to precisely return the ball to desired targets. This system puts together a highly optimized perception subsystem, a high-speed low-latency robot controller, a simulation paradigm that can prevent damage in the real w…

    Submitted 6 September, 2023; originally announced September 2023.

    Comments: Published and presented at Robotics: Science and Systems (RSS2023)

  13. arXiv:2307.15818

    cs.RO cs.CL cs.CV cs.LG

    RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

    Authors: Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Xi Chen, Krzysztof Choromanski, Tianli Ding, Danny Driess, Avinava Dubey, Chelsea Finn, Pete Florence, Chuyuan Fu, Montse Gonzalez Arenas, Keerthana Gopalakrishnan, Kehang Han, Karol Hausman, Alexander Herzog, Jasmine Hsu, Brian Ichter, Alex Irpan, Nikhil Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Isabel Leal, et al. (29 additional authors not shown)

    Abstract: We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control to boost generalization and enable emergent semantic reasoning. Our goal is to enable a single end-to-end trained model to both learn to map robot observations to actions and enjoy the benefits of large-scale pretraining on language and vision-language data from the web.…

    Submitted 28 July, 2023; originally announced July 2023.

    Comments: Website: https://robotics-transformer.github.io/

  14. arXiv:2306.08205

    cs.RO

    Agile Catching with Whole-Body MPC and Blackbox Policy Learning

    Authors: Saminda Abeyruwan, Alex Bewley, Nicholas M. Boffi, Krzysztof Choromanski, David D'Ambrosio, Deepali Jain, Pannag Sanketi, Anish Shankar, Vikas Sindhwani, Sumeet Singh, Jean-Jacques Slotine, Stephen Tu

    Abstract: We address a benchmark task in agile robotics: catching objects thrown at high-speed. This is a challenging task that involves tracking, intercepting, and cradling a thrown object with access only to visual observations of the object and the proprioceptive state of the robot, all within a fraction of a second. We present the relative merits of two fundamentally different solution strategies: (i) M…

    Submitted 19 October, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: L4DC 2023

  15. arXiv:2305.12470

    stat.ML cs.LG

    Quasi-Monte Carlo Graph Random Features

    Authors: Isaac Reid, Krzysztof Choromanski, Adrian Weller

    Abstract: We present a novel mechanism to improve the accuracy of the recently-introduced class of graph random features (GRFs). Our method induces negative correlations between the lengths of the algorithm's random walks by imposing antithetic termination: a procedure to sample more diverse random walks which may be of independent interest. It has a trivial drop-in implementation. We derive strong theoreti…

    Submitted 21 May, 2023; originally announced May 2023.

  16. arXiv:2305.00156

    cs.LG

    Taming graph kernels with random features

    Authors: Krzysztof Choromanski

    Abstract: We introduce in this paper the mechanism of graph random features (GRFs). GRFs can be used to construct unbiased randomized estimators of several important kernels defined on graphs' nodes, in particular the regularized Laplacian kernel. As regular RFs for non-graph kernels, they provide means to scale up kernel methods defined on graphs to larger networks. Importantly, they give substantial compu…

    Submitted 28 April, 2023; originally announced May 2023.

  17. arXiv:2304.00171

    cs.CL cs.SD eess.AS

    Practical Conformer: Optimizing size, speed and flops of Conformer for on-Device and cloud ASR

    Authors: Rami Botros, Anmol Gulati, Tara N. Sainath, Krzysztof Choromanski, Ruoming Pang, Trevor Strohman, Weiran Wang, Jiahui Yu

    Abstract: Conformer models maintain a large number of internal states, the vast majority of which are associated with self-attention layers. With limited memory bandwidth, reading these from memory at each inference step can slow down inference. In this paper, we design an optimized conformer that is small enough to meet on-device restrictions and has fast inference on TPUs. We explore various ideas to impr…

    Submitted 31 March, 2023; originally announced April 2023.

  18. arXiv:2302.01925

    cs.LG

    Learning a Fourier Transform for Linear Relative Positional Encodings in Transformers

    Authors: Krzysztof Marcin Choromanski, Shanda Li, Valerii Likhosherstov, Kumar Avinava Dubey, Shengjie Luo, Di He, Yiming Yang, Tamas Sarlos, Thomas Weingarten, Adrian Weller

    Abstract: We propose a new class of linear Transformers called FourierLearner-Transformers (FLTs), which incorporate a wide range of relative positional encoding mechanisms (RPEs). These include regular RPE techniques applied for sequential data, as well as novel RPEs operating on geometric data embedded in higher-dimensional Euclidean spaces. FLTs construct the optimal RPE mechanism implicitly by learning…

    Submitted 3 April, 2024; v1 submitted 3 February, 2023; originally announced February 2023.

    Comments: AISTATS 2024

  19. arXiv:2302.01128

    cs.LG cs.AI

    Mnemosyne: Learning to Train Transformers with Transformers

    Authors: Deepali Jain, Krzysztof Marcin Choromanski, Avinava Dubey, Sumeet Singh, Vikas Sindhwani, Tingnan Zhang, Jie Tan

    Abstract: In this work, we propose a new class of learnable optimizers, called Mnemosyne. It is based on the novel spatio-temporal low-rank implicit attention Transformers that can learn to train entire neural network architectures, including other Transformers, without any task-specific optimizer tuning. We show that Mnemosyne: (a) outperforms popular LSTM optimizers (also with new feature enginee…

    Submitted 16 June, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

  20. arXiv:2302.00942

    cs.LG

    Efficient Graph Field Integrators Meet Point Clouds

    Authors: Krzysztof Choromanski, Arijit Sehanobish, Han Lin, Yunfan Zhao, Eli Berger, Tetiana Parshakova, Alvin Pan, David Watkins, Tianyi Zhang, Valerii Likhosherstov, Somnath Basu Roy Chowdhury, Avinava Dubey, Deepali Jain, Tamas Sarlos, Snigdha Chaturvedi, Adrian Weller

    Abstract: We present two new classes of algorithms for efficient field integration on graphs encoding point clouds. The first class, SeparatorFactorization (SF), leverages the bounded genus of point cloud mesh graphs, while the second class, RFDiffusion (RFD), uses popular epsilon-nearest-neighbor graph representations for point clouds. Both can be viewed as providing the functionality of Fast Multipole Metho…

    Submitted 4 October, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

    Journal ref: ICML 2023

  21. arXiv:2302.00787

    cs.LG stat.ML

    FAVOR#: Sharp Attention Kernel Approximations via New Classes of Positive Random Features

    Authors: Valerii Likhosherstov, Krzysztof Choromanski, Avinava Dubey, Frederick Liu, Tamas Sarlos, Adrian Weller

    Abstract: The problem of efficient approximation of a linear operator induced by the Gaussian or softmax kernel is often addressed using random features (RFs) which yield an unbiased approximation of the operator's result. Such operators emerge in important applications ranging from kernel methods to efficient Transformers. We propose parameterized, positive, non-trigonometric RFs which approximate Gaussian…

    Submitted 1 February, 2023; originally announced February 2023.

  22. arXiv:2301.13856

    stat.ML cs.LG

    Simplex Random Features

    Authors: Isaac Reid, Krzysztof Choromanski, Valerii Likhosherstov, Adrian Weller

    Abstract: We present Simplex Random Features (SimRFs), a new random feature (RF) mechanism for unbiased approximation of the softmax and Gaussian kernels by geometrical correlation of random projection vectors. We prove that SimRFs provide the smallest possible mean square error (MSE) on unbiased estimates of these kernels among the class of weight-independent geometrically-coupled positive random feature (…

    Submitted 7 October, 2023; v1 submitted 31 January, 2023; originally announced January 2023.

  23. arXiv:2211.14312

    q-bio.QM cs.CV cs.LG eess.IV

    Karyotype AI for Precision Oncology

    Authors: Zahra Shamsi, Drew Bryant, Jacob Wilson, Xiaoyu Qu, Avinava Dubey, Konik Kothari, Mostafa Dehghani, Mariya Chavarha, Valerii Likhosherstov, Brian Williams, Michael Frumkin, Fred Appelbaum, Krzysztof Choromanski, Ali Bashir, Min Fang

    Abstract: Chromosome analysis is essential for diagnosing genetic disorders. For hematologic malignancies, identification of somatic clonal aberrations by karyotype analysis remains the standard of care. However, karyotyping is costly and time-consuming because of the largely manual process and the expertise required in identifying and annotating aberrations. Efforts to automate karyotype analysis to date f…

    Submitted 19 October, 2023; v1 submitted 19 November, 2022; originally announced November 2022.

  24. arXiv:2209.10780

    cs.RO cs.AI cs.LG

    Learning Model Predictive Controllers with Real-Time Attention for Real-World Navigation

    Authors: Xuesu Xiao, Tingnan Zhang, Krzysztof Choromanski, Edward Lee, Anthony Francis, Jake Varley, Stephen Tu, Sumeet Singh, Peng Xu, Fei Xia, Sven Mikael Persson, Dmitry Kalashnikov, Leila Takayama, Roy Frostig, Jie Tan, Carolina Parada, Vikas Sindhwani

    Abstract: Despite decades of research, existing navigation systems still face real-world challenges when deployed in the wild, e.g., in cluttered home environments or in human-occupied public spaces. To address this, we present a new class of implicit control policies combining the benefits of imitation learning with the robust handling of system constraints from Model Predictive Control (MPC). Our approach…

    Submitted 23 September, 2022; v1 submitted 22 September, 2022; originally announced September 2022.

  25. arXiv:2209.06291

    cs.CV cs.RO

    Multiple View Performers for Shape Completion

    Authors: David Watkins, Peter Allen, Krzysztof Choromanski, Jacob Varley, Nicholas Waytowich

    Abstract: We propose the Multiple View Performer (MVP) - a new architecture for 3D shape completion from a series of temporally sequential views. MVP accomplishes this task by using linear-attention Transformers called Performers. Our model allows the current observation of the scene to attend to the previous ones for more accurate infilling. The history of past observations is compressed via the compact as…

    Submitted 13 September, 2022; originally announced September 2022.

    Comments: 6 pages, 2 pages of references, 6 figures, 3 tables

  26. arXiv:2208.01191

    cs.LG cs.AI cs.NE

    Implicit Two-Tower Policies

    Authors: Yunfan Zhao, Qingkai Pan, Krzysztof Choromanski, Deepali Jain, Vikas Sindhwani

    Abstract: We present a new class of structured reinforcement learning policy-architectures, Implicit Two-Tower (ITT) policies, where the actions are chosen based on the attention scores of their learnable latent representations with those of the input states. By explicitly disentangling action from state processing in the policy stack, we achieve two main goals: substantial computational gains and better pe…

    Submitted 25 October, 2023; v1 submitted 1 August, 2022; originally announced August 2022.

  27. arXiv:2207.06572

    cs.RO

    i-Sim2Real: Reinforcement Learning of Robotic Policies in Tight Human-Robot Interaction Loops

    Authors: Saminda Abeyruwan, Laura Graesser, David B. D'Ambrosio, Avi Singh, Anish Shankar, Alex Bewley, Deepali Jain, Krzysztof Choromanski, Pannag R. Sanketi

    Abstract: Sim-to-real transfer is a powerful paradigm for robotic reinforcement learning. The ability to train policies in simulation enables safe exploration and large-scale data collection quickly at low cost. However, prior works in sim-to-real transfer of robotic policies typically do not involve any human-robot interaction because accurately simulating human behavior is an open problem. In this work, o…

    Submitted 21 November, 2022; v1 submitted 13 July, 2022; originally announced July 2022.

    Comments: 8+24 pages

  28. arXiv:2205.15317

    cs.LG cs.AI

    Chefs' Random Tables: Non-Trigonometric Random Features

    Authors: Valerii Likhosherstov, Krzysztof Choromanski, Avinava Dubey, Frederick Liu, Tamas Sarlos, Adrian Weller

    Abstract: We introduce chefs' random tables (CRTs), a new class of non-trigonometric random features (RFs) to approximate Gaussian and softmax kernels. CRTs are an alternative to standard random kitchen sink (RKS) methods, which inherently rely on the trigonometric maps. We present variants of CRTs where RFs are positive, a key requirement for applications in recent low-rank Transformers. Further variance r…

    Submitted 30 May, 2022; originally announced May 2022.

  29. arXiv:2204.00598

    cs.CV cs.AI cs.CL cs.LG

    Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language

    Authors: Andy Zeng, Maria Attarian, Brian Ichter, Krzysztof Choromanski, Adrian Wong, Stefan Welker, Federico Tombari, Aveek Purohit, Michael Ryoo, Vikas Sindhwani, Johnny Lee, Vincent Vanhoucke, Pete Florence

    Abstract: Large pretrained (e.g., "foundation") models exhibit distinct capabilities depending on the domain of data they are trained on. While these domains are generic, they may only barely overlap. For example, visual-language models (VLMs) are trained on Internet-scale image captions, but large language models (LMs) are further trained on Internet-scale text with no images (e.g., spreadsheets, SAT quest…

    Submitted 27 May, 2022; v1 submitted 1 April, 2022; originally announced April 2022.

    Comments: https://socraticmodels.github.io/

  30. arXiv:2111.12993

    cs.CV cs.LG

    PolyViT: Co-training Vision Transformers on Images, Videos and Audio

    Authors: Valerii Likhosherstov, Anurag Arnab, Krzysztof Choromanski, Mario Lucic, Yi Tay, Adrian Weller, Mostafa Dehghani

    Abstract: Can we train a single transformer model capable of processing multiple modalities and datasets, whilst sharing almost all of its learnable parameters? We present PolyViT, a model trained on image, audio and video which answers this question. By co-training different tasks on a single modality, we are able to improve the accuracy of each individual task and achieve state-of-the-art results on 5 sta…

    Submitted 25 November, 2021; originally announced November 2021.

  31. arXiv:2110.04367

    cs.LG stat.ML

    Hybrid Random Features

    Authors: Krzysztof Choromanski, Haoxian Chen, Han Lin, Yuanzhe Ma, Arijit Sehanobish, Deepali Jain, Michael S Ryoo, Jake Varley, Andy Zeng, Valerii Likhosherstov, Dmitry Kalashnikov, Vikas Sindhwani, Adrian Weller

    Abstract: We propose a new class of random feature methods for linearizing softmax and Gaussian kernels called hybrid random features (HRFs) that automatically adapt the quality of kernel estimation to provide the most accurate approximation in the defined regions of interest. Special instantiations of HRFs lead to well-known methods such as trigonometric (Rahimi and Recht, 2007) or (recently introduced in the…

    Submitted 30 January, 2022; v1 submitted 8 October, 2021; originally announced October 2021.

    Comments: Published as a conference paper at ICLR 2022

  32. arXiv:2107.07999

    cs.LG cs.AI

    From block-Toeplitz matrices to differential equations on graphs: towards a general theory for scalable masked Transformers

    Authors: Krzysztof Choromanski, Han Lin, Haoxian Chen, Tianyi Zhang, Arijit Sehanobish, Valerii Likhosherstov, Jack Parker-Holder, Tamas Sarlos, Adrian Weller, Thomas Weingarten

    Abstract: In this paper we provide, to the best of our knowledge, the first comprehensive approach for incorporating various masking mechanisms into Transformer architectures in a scalable way. We show that recent results on linear causal attention (Choromanski et al., 2021) and log-linear RPE-attention (Luo et al., 2021) are special cases of this general mechanism. However, by casting the problem as a topo…

    Submitted 27 March, 2023; v1 submitted 16 July, 2021; originally announced July 2021.

    Comments: 20 pages, 12 figures

  33. arXiv:2106.03764

    cs.LG

    On the Expressive Power of Self-Attention Matrices

    Authors: Valerii Likhosherstov, Krzysztof Choromanski, Adrian Weller

    Abstract: Transformer networks are able to capture patterns in data coming from many domains (text, images, videos, proteins, etc.) with little or no change to architecture components. We perform a theoretical analysis of the core component responsible for signal propagation between elements, i.e. the self-attention matrix. In practice, this matrix typically exhibits two properties: (1) it is sparse, meanin…

    Submitted 8 June, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

  34. arXiv:2106.02487

    cs.LG

    Debiasing a First-order Heuristic for Approximate Bi-level Optimization

    Authors: Valerii Likhosherstov, Xingyou Song, Krzysztof Choromanski, Jared Davis, Adrian Weller

    Abstract: Approximate bi-level optimization (ABLO) consists of (outer-level) optimization problems, involving numerical (inner-level) optimization loops. While ABLO has many applications across deep learning, it suffers from time and memory complexity proportional to the length $r$ of its inner optimization loop. To address this complexity, an earlier first-order method (FOM) was proposed as a heuristic tha…

    Submitted 8 June, 2021; v1 submitted 4 June, 2021; originally announced June 2021.

    Comments: Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021. arXiv admin note: text overlap with arXiv:2006.03631

  35. arXiv:2102.04353

    cs.LG cs.AI cs.CV cs.RO

    Unlocking Pixels for Reinforcement Learning via Implicit Attention

    Authors: Krzysztof Marcin Choromanski, Deepali Jain, Wenhao Yu, Xingyou Song, Jack Parker-Holder, Tingnan Zhang, Valerii Likhosherstov, Aldo Pacchiano, Anirban Santara, Yunhao Tang, Jie Tan, Adrian Weller

    Abstract: There has recently been significant interest in training reinforcement learning (RL) agents in vision-based environments. This poses many challenges, such as high dimensionality and the potential for observational overfitting through spurious correlations. A promising approach to solve both of these problems is an attention bottleneck, which provides a simple and effective framework for learning h…

    Submitted 1 October, 2021; v1 submitted 8 February, 2021; originally announced February 2021.

  36. arXiv:2101.07415

    cs.LG cs.NE cs.RO

    ES-ENAS: Efficient Evolutionary Optimization for Large Hybrid Search Spaces

    Authors: Xingyou Song, Krzysztof Choromanski, Jack Parker-Holder, Yunhao Tang, Qiuyi Zhang, Daiyi Peng, Deepali Jain, Wenbo Gao, Aldo Pacchiano, Tamas Sarlos, Yuxiang Yang

    Abstract: In this paper, we approach the problem of optimizing blackbox functions over large hybrid search spaces consisting of both combinatorial and continuous parameters. We demonstrate that previous evolutionary algorithms which rely on mutation-based approaches, while flexible over combinatorial spaces, suffer from a curse of dimensionality in high dimensional continuous spaces both theoretically and e…

    Submitted 15 March, 2023; v1 submitted 18 January, 2021; originally announced January 2021.

    Comments: Previously published at ICLR 2020 NAS Workshop. See https://github.com/google-research/google-research/tree/master/es_enas for associated code

  37. arXiv:2101.04808

    cs.PL cs.LG

    MLGO: a Machine Learning Guided Compiler Optimizations Framework

    Authors: Mircea Trofin, Yundi Qian, Eugene Brevdo, Zinan Lin, Krzysztof Choromanski, David Li

    Abstract: Leveraging machine-learning (ML) techniques for compiler optimizations has been widely studied and explored in academia. However, the adoption of ML in general-purpose, industry-strength compilers has yet to happen. We propose MLGO, a framework for integrating ML techniques systematically in an industrial compiler -- LLVM. As a case study, we present the details and results of replacing the heuris…

    Submitted 12 January, 2021; originally announced January 2021.

    Comments: First two authors are equal contributors

  38. arXiv:2012.11346

    cs.LG

    Sub-Linear Memory: How to Make Performers SLiM

    Authors: Valerii Likhosherstov, Krzysztof Choromanski, Jared Davis, Xingyou Song, Adrian Weller

    Abstract: The Transformer architecture has revolutionized deep learning on sequential data, becoming ubiquitous in state-of-the-art solutions for a wide variety of applications. Yet vanilla Transformers are notoriously resource-expensive, requiring $O(L^2)$ in serial time and memory as functions of input length $L$. Recent works proposed various linear self-attention mechanisms, scaling only as $O(L)$ for s…

    Submitted 21 December, 2020; originally announced December 2020.

  39. arXiv:2009.14794

    cs.LG cs.CL stat.ML

    Rethinking Attention with Performers

    Authors: Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy Colwell, Adrian Weller

    Abstract: We introduce Performers, Transformer architectures which can estimate regular (softmax) full-rank-attention Transformers with provable accuracy, but using only linear (as opposed to quadratic) space and time complexity, without relying on any priors such as sparsity or low-rankness. To approximate softmax attention-kernels, Performers use a novel Fast Attention Via positive Orthogonal Random featu…

    Submitted 19 November, 2022; v1 submitted 30 September, 2020; originally announced September 2020.

    Comments: Published as a conference paper + oral presentation at ICLR 2021. 38 pages. See https://github.com/google-research/google-research/tree/master/protein_lm for protein language model code, and https://github.com/google-research/google-research/tree/master/performer for Performer code. See https://ai.googleblog.com/2020/10/rethinking-attention-with-performers.html for Google AI Blog

  40. arXiv:2006.11911

    cs.LG stat.ML

    Towards Tractable Optimism in Model-Based Reinforcement Learning

    Authors: Aldo Pacchiano, Philip J. Ball, Jack Parker-Holder, Krzysztof Choromanski, Stephen Roberts

    Abstract: The principle of optimism in the face of uncertainty is prevalent throughout sequential decision making problems such as multi-armed bandits and reinforcement learning (RL). To be successful, an optimistic RL algorithm must over-estimate the true value function (optimism) but not by so much that it is inaccurate (estimation error). In the tabular setting, many state-of-the-art methods produce the…

    Submitted 3 December, 2021; v1 submitted 21 June, 2020; originally announced June 2020.

    Comments: Presented as a conference paper at UAI 2021

  41. arXiv:2006.11421

    cs.LG math.CA math.DS math.OC stat.ML

    An Ode to an ODE

    Authors: Krzysztof Choromanski, Jared Quincy Davis, Valerii Likhosherstov, Xingyou Song, Jean-Jacques Slotine, Jacob Varley, Honglak Lee, Adrian Weller, Vikas Sindhwani

    Abstract: We present a new paradigm for Neural ODE algorithms, called ODEtoODE, where time-dependent parameters of the main flow evolve according to a matrix flow on the orthogonal group O(d). This nested system of two flows, where the parameter-flow is constrained to lie on the compact manifold, provides stability and effectiveness of training and provably solves the gradient vanishing-explosion problem wh…

    Submitted 22 June, 2020; v1 submitted 19 June, 2020; originally announced June 2020.

    Comments: 20 pages, 9 figures

  42. arXiv:2006.07554

    cs.LG cs.NE stat.ML

    Online Hyper-parameter Tuning in Off-policy Learning via Evolutionary Strategies

    Authors: Yunhao Tang, Krzysztof Choromanski

    Abstract: Off-policy learning algorithms have been known to be sensitive to the choice of hyper-parameters. However, unlike near on-policy algorithms for which hyper-parameters could be optimized via e.g. meta-gradients, similar techniques could not be straightforwardly applied to off-policy learning. In this work, we propose a framework which entails the application of Evolutionary Strategies to online hyp…

    Submitted 12 June, 2020; originally announced June 2020.

  43. arXiv:2006.03631

    cs.LG math.OC stat.ML

    UFO-BLO: Unbiased First-Order Bilevel Optimization

    Authors: Valerii Likhosherstov, Xingyou Song, Krzysztof Choromanski, Jared Davis, Adrian Weller

    Abstract: Bilevel optimization (BLO) is a popular approach with many applications including hyperparameter optimization, neural architecture search, adversarial robustness and model-agnostic meta-learning. However, the approach suffers from time and memory complexity proportional to the length $r$ of its inner optimization loop, which has led to several modifications being proposed. One such modification is…

    Submitted 7 June, 2021; v1 submitted 5 June, 2020; originally announced June 2020.

  44. arXiv:2006.03555

    cs.LG cs.CL stat.ML

    Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers

    Authors: Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, David Belanger, Lucy Colwell, Adrian Weller

    Abstract: Transformer models have achieved state-of-the-art results across a diverse range of domains. However, concern over the cost of training the attention mechanism to learn complex dependencies between distant inputs continues to grow. In response, solutions that exploit the structure and sparsity of the learned attention matrix have blossomed. However, real-world applications that involve long sequen…

    Submitted 30 September, 2020; v1 submitted 5 June, 2020; originally announced June 2020.

    Comments: This arXiv submission has been deprecated. Please see "Rethinking Attention with Performers" at arXiv:2009.14794 for the most updated version of the paper

  45. arXiv:2005.13590

    cs.LG stat.ML

    Demystifying Orthogonal Monte Carlo and Beyond

    Authors: Han Lin, Haoxian Chen, Tianyi Zhang, Clement Laroche, Krzysztof Choromanski

    Abstract: Orthogonal Monte Carlo (OMC) is a very effective sampling algorithm imposing structural geometric conditions (orthogonality) on samples for variance reduction. Due to its simplicity and superior performance as compared to its Quasi Monte Carlo counterparts, OMC is used in a wide spectrum of challenging machine learning applications ranging from scalable kernel methods to predictive recurrent neura…

    Submitted 27 May, 2020; originally announced May 2020.

    Comments: 22 pages, 4 figures

  46. arXiv:2005.01906

    cs.LG stat.ML

    Time Dependence in Non-Autonomous Neural ODEs

    Authors: Jared Quincy Davis, Krzysztof Choromanski, Jake Varley, Honglak Lee, Jean-Jacques Slotine, Valerii Likhosherstov, Adrian Weller, Ameesh Makadia, Vikas Sindhwani

    Abstract: Neural Ordinary Differential Equations (ODEs) are elegant reinterpretations of deep networks where continuous time can replace the discrete notion of depth, ODE solvers perform forward propagation, and the adjoint method enables efficient, constant memory backpropagation. Neural ODEs are universal approximators only when they are non-autonomous, that is, the dynamics depends explicitly on time. We…

    Submitted 6 May, 2020; v1 submitted 4 May, 2020; originally announced May 2020.

  47. arXiv:2004.08675

    cs.LG stat.ML

    CWY Parametrization: a Solution for Parallelized Optimization of Orthogonal and Stiefel Matrices

    Authors: Valerii Likhosherstov, Jared Davis, Krzysztof Choromanski, Adrian Weller

    Abstract: We introduce an efficient approach for optimization over orthogonal groups on highly parallel computation units such as GPUs or TPUs. As in earlier work, we parametrize an orthogonal matrix as a product of Householder reflections. However, to overcome low parallelization capabilities of computing Householder reflections sequentially, we propose employing an accumulation scheme called the compact W…

    Submitted 16 February, 2021; v1 submitted 18 April, 2020; originally announced April 2020.

    Comments: Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021, San Diego, California, USA. PMLR: Volume 130. Copyright 2021 by the author(s)
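The compact WY accumulation mentioned above replaces $k$ sequential Householder products with one $k \times k$ triangular system. For unit-norm columns $u_1, \dots, u_k$ of $U$, it uses the identity $H_1 \cdots H_k = I - U S^{-1} U^\top$ with $S = \tfrac{1}{2} I + \mathrm{striu}(U^\top U)$, where $\mathrm{striu}$ is the strictly upper-triangular part. The NumPy sketch below checks this identity against the sequential product. It is our illustration of the square orthogonal case only, not the authors' implementation. The function names are hypothetical, and `np.linalg.solve` stands in for a dedicated triangular solver.

```python
import numpy as np

def householder_product(vs):
    # Sequential reference: H_1 H_2 ... H_k with H_i = I - 2 u_i u_i^T, u_i = v_i / |v_i|.
    d = vs.shape[0]
    q = np.eye(d)
    for i in range(vs.shape[1]):
        u = vs[:, i] / np.linalg.norm(vs[:, i])
        q = q @ (np.eye(d) - 2.0 * np.outer(u, u))
    return q

def cwy_product(vs):
    # Compact WY form: I - U S^{-1} U^T with S = I/2 + strictly-upper-triangular(U^T U).
    u = vs / np.linalg.norm(vs, axis=0, keepdims=True)
    k = u.shape[1]
    s = 0.5 * np.eye(k) + np.triu(u.T @ u, 1)   # upper triangular, diagonal 1/2: invertible
    return np.eye(u.shape[0]) - u @ np.linalg.solve(s, u.T)
```

The CWY form involves only matrix multiplications and one small triangular solve, which map well onto parallel hardware. The loop in `householder_product` is inherently sequential.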

  48. arXiv:2003.14398

    cs.LG cs.RO stat.ML

    Robotic Table Tennis with Model-Free Reinforcement Learning

    Authors: Wenbo Gao, Laura Graesser, Krzysztof Choromanski, Xingyou Song, Nevena Lazic, Pannag Sanketi, Vikas Sindhwani, Navdeep Jaitly

    Abstract: We propose a model-free algorithm for learning efficient policies capable of returning table tennis balls by controlling robot joints at a rate of 100Hz. We demonstrate that evolutionary search (ES) methods acting on CNN-based policy architectures for non-visual inputs and convolving across time learn compact controllers leading to smooth motions. Furthermore, we show that with appropriately tuned…

    Submitted 27 May, 2020; v1 submitted 31 March, 2020; originally announced March 2020.

    Comments: V2: new URL of supplementary video. 8 pages, 4 figures

    ACM Class: I.2.6; I.2.9
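For readers unfamiliar with evolutionary search (ES) as used above, here is a minimal sketch of the standard antithetic ES gradient estimator: $\hat{\nabla} F(\theta) = \frac{1}{2\sigma n}\sum_{i=1}^{n}\big[F(\theta + \sigma\epsilon_i) - F(\theta - \sigma\epsilon_i)\big]\epsilon_i$ with $\epsilon_i \sim \mathcal{N}(0, I)$. It is a generic illustration and not the paper's training setup. The hyperparameters (`sigma`, `num_pairs`, `lr`) and the toy quadratic reward are assumptions made for the example.

```python
import numpy as np

def es_gradient(f, theta, sigma=0.1, num_pairs=50, rng=None):
    # Antithetic ES estimate of grad f(theta), using only black-box evaluations of f.
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal((num_pairs, theta.size))
    diffs = np.array([f(theta + sigma * e) - f(theta - sigma * e) for e in eps])
    return (diffs[:, None] * eps).sum(axis=0) / (2.0 * sigma * num_pairs)

def es_maximize(f, theta0, steps=200, lr=0.1, seed=0):
    # Plain gradient ascent on the reward using the ES estimate (hypothetical helper).
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    for _ in range(steps):
        theta += lr * es_gradient(f, theta, rng=rng)
    return theta

# Usage: maximize a toy reward -|theta - c|^2 without ever computing its gradient.
c = np.array([1.0, -2.0, 3.0])
theta = es_maximize(lambda t: -np.sum((t - c) ** 2), np.zeros(3))
```

Antithetic pairs cancel the even-order terms of the reward's Taylor expansion. For a quadratic reward like this one, the estimator is therefore unbiased for every choice of `sigma`.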

  49. arXiv:2003.13563

    cs.LG stat.ML

    Stochastic Flows and Geometric Optimization on the Orthogonal Group

    Authors: Krzysztof Choromanski, David Cheikhi, Jared Davis, Valerii Likhosherstov, Achille Nazaret, Achraf Bahamou, Xingyou Song, Mrugank Akarte, Jack Parker-Holder, Jacob Bergquist, Yuan Gao, Aldo Pacchiano, Tamas Sarlos, Adrian Weller, Vikas Sindhwani

    Abstract: We present a new class of stochastic, geometrically-driven optimization algorithms on the orthogonal group $O(d)$ and naturally reductive homogeneous manifolds obtained from the action of the rotation group $SO(d)$. We theoretically and experimentally demonstrate that our methods can be applied in various fields of machine learning including deep, convolutional and recurrent neural networks, reinf…

    Submitted 30 March, 2020; originally announced March 2020.
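As a point of reference for optimization on $O(d)$, the sketch below shows deterministic Riemannian gradient descent with a Cayley-transform retraction. This is a simpler, swapped-in technique, not the stochastic flows proposed in the paper. The Euclidean gradient $G$ is mapped to the skew-symmetric matrix $\Omega = W^\top G - G^\top W$, and the update $W \leftarrow W\,\mathrm{cayley}(-\eta\Omega)$ keeps $W$ orthogonal up to rounding. The step size, the iteration count, and the toy objective $f(W) = -\mathrm{tr}(A^\top W)$ are assumptions made for illustration.

```python
import numpy as np

def cayley(b):
    # Cayley transform (I - b/2)^{-1} (I + b/2): orthogonal whenever b is skew-symmetric.
    eye = np.eye(b.shape[0])
    return np.linalg.solve(eye - 0.5 * b, eye + 0.5 * b)

def riemannian_descent_on_o_d(grad_fn, w0, lr=0.01, steps=500):
    # Hypothetical helper: descend f over orthogonal matrices starting from w0.
    w = w0.copy()
    for _ in range(steps):
        g = grad_fn(w)                # Euclidean gradient of f at w
        omega = w.T @ g - g.T @ w     # skew-symmetric: omega.T == -omega
        w = w @ cayley(-lr * omega)   # product of orthogonal matrices stays orthogonal
    return w

# Usage: minimize f(W) = -trace(A^T W), whose Euclidean gradient is -A.
rng = np.random.default_rng(0)
a = rng.standard_normal((5, 5))
f = lambda w: -np.trace(a.T @ w)
w_final = riemannian_descent_on_o_d(lambda w: -a, np.eye(5))
```

To first order, the update moves $W$ by $-\eta(G - W G^\top W)$, which is a descent direction for $f$. Starting from the identity, every iterate has determinant $+1$, so the search stays in $SO(d)$.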

  50. arXiv:2003.01239

    cs.RO cs.LG cs.NE

    Rapidly Adaptable Legged Robots via Evolutionary Meta-Learning

    Authors: Xingyou Song, Yuxiang Yang, Krzysztof Choromanski, Ken Caluwaerts, Wenbo Gao, Chelsea Finn, Jie Tan

    Abstract: Learning adaptable policies is crucial for robots to operate autonomously in our complex and quickly changing world. In this work, we present a new meta-learning method that allows robots to quickly adapt to changes in dynamics. In contrast to gradient-based meta-learning algorithms that rely on second-order gradient estimation, we introduce a more noise-tolerant Batch Hill-Climbing adaptation ope…

    Submitted 29 July, 2020; v1 submitted 2 March, 2020; originally announced March 2020.

    Comments: Published as a conference paper at International Conference on Intelligent Robots and Systems (IROS) 2020. See http://youtu.be/_QPMCDdFC3E for associated video file, http://github.com/google-research/google-research/tree/master/es_maml for associated code, and https://ai.googleblog.com/2020/04/exploring-evolutionary-meta-learning-in.html for the corresponding Google AI Blog post