Reducing Transformer Key-Value Cache Size with Cross-Layer Attention

Authors: William Brandon, Mayank Mishra, Aniruddha Nrusimha, Rameswar Panda, Jonathan Ragan Kelly

Abstract: Key-value (KV) caching plays an essential role in accelerating decoding for transformer-based autoregressive large language models (LLMs). However, the amount of memory required to store the KV cache can become prohibitive at long sequence lengths and large batch sizes. Since the invention of the transformer, two of the most effective interventions discovered for reducing the size of the KV cache have been Multi-Query Attention (MQA) and its generalization, Grouped-Query Attention (GQA). MQA and GQA both modify the design of the attention block so that multiple query heads can share a single key/value head, reducing the number of distinct key/value heads by a large factor while only minimally degrading accuracy. In this paper, we show that it is possible to take Multi-Query Attention a step further by also sharing key and value heads between adjacent layers, yielding a new attention design we call Cross-Layer Attention (CLA). With CLA, we find that it is possible to reduce the size of the KV cache by another 2x while maintaining nearly the same accuracy as unmodified MQA. In experiments training 1B- and 3B-parameter models from scratch, we demonstrate that CLA provides a Pareto improvement over the memory/accuracy tradeoffs which are possible with traditional MQA, enabling inference with longer sequence lengths and larger batch sizes than would otherwise be possible.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2404.13557 [pdf, other]

Preconditioned Neural Posterior Estimation for Likelihood-free Inference

Authors: Xiaoyu Wang, Ryan P. Kelly, David J. Warne, Christopher Drovandi

Abstract: Simulation based inference (SBI) methods enable the estimation of posterior distributions when the likelihood function is intractable, but where model simulation is feasible. Popular neural approaches to SBI are the neural posterior estimator (NPE) and its sequential version (SNPE). These methods can outperform statistical SBI approaches such as approximate Bayesian computation (ABC), particularly for relatively small numbers of model simulations. However, we show in this paper that the NPE methods are not guaranteed to be highly accurate, even on problems with low dimension. In such settings the posterior cannot be accurately trained over the prior predictive space, and even the sequential extension remains sub-optimal. To overcome this, we propose preconditioned NPE (PNPE) and its sequential version (PSNPE), which uses a short run of ABC to effectively eliminate regions of parameter space that produce large discrepancy between simulations and data and allow the posterior emulator to be more accurately trained. We present comprehensive empirical evidence that this melding of neural and statistical SBI methods improves performance over a range of examples, including a motivating example involving a complex agent-based model applied to real tumour growth data.

Submitted 21 April, 2024; originally announced April 2024.

Comments: 31 pages, 11 figures

arXiv:2403.11164 [pdf, other]

doi 10.1145/3613904.3642919

The Effects of Generative AI on Design Fixation and Divergent Thinking

Authors: Samangi Wadinambiarachchi, Ryan M. Kelly, Saumya Pareek, Qiushi Zhou, Eduardo Velloso

Abstract: Generative AI systems have been heralded as tools for augmenting human creativity and inspiring divergent thinking, though with little empirical evidence for these claims. This paper explores the effects of exposure to AI-generated images on measures of design fixation and divergent thinking in a visual ideation task. Through a between-participants experiment (N=60), we found that support from an AI image generator during ideation leads to higher fixation on an initial example. Participants who used AI produced fewer ideas, with less variety and lower originality compared to a baseline. Our qualitative analysis suggests that the effectiveness of co-ideation with AI rests on participants' chosen approach to prompt creation and on the strategies used by participants to generate ideas in response to the AI's suggestions. We discuss opportunities for designing generative AI systems for ideation support and incorporating these AI tools into ideation workflows.

Submitted 17 March, 2024; originally announced March 2024.

Comments: Accepted at the CHI Conference on Human Factors in Computing Systems (CHI 24), 18 pages, 15 figures

arXiv:2402.01040 [pdf]

Everyday Uses of Music Listening and Music Technologies by Caregivers and People with Dementia: Survey and Focus Group Study

Authors: Dianna Vidas, Romina Carrasco, Ryan M. Kelly, Jenny Waycott, Jeanette Tamplin, Kate McMahon, Libby M. Flynn, Phoebe A. Stretton-Smith, Tanara Vieira Sousa, Felicity A. Baker

Abstract: Music is a valuable non-pharmacological tool that provides benefits for people with dementia, and there is interest in designing technologies to support music use in dementia care. To ensure music technologies are appropriately designed for supporting caregivers and people living with dementia, there remains a need to better understand how music is currently used in everyday care at home. We aimed to understand how people with dementia and their caregivers use music technologies in everyday caring, as well as challenges they experience using music and technology. This study used a mixed methods design. A survey was completed by 77 caregivers and people with dementia to understand their use of music and technology. Of these, 18 survey respondents (12 family caregivers, 6 people living with dementia) participated in focus groups about their experiences of using music and technology in care. Transcripts were analysed with reflexive thematic analysis. Most survey respondents used music often in their daily lives, reporting a range of music technologies such as CDs, radio, and streaming. Focus groups highlighted benefits and challenges of music technologies in everyday care. Participants used music and music technologies to regulate mood, provide joy, facilitate social connection, encourage reminiscence, provide continuity before and after diagnosis, and to make caregiving easier. Challenges of using music technology in care included difficulties staying up to date with evolving technology, and low self-efficacy for technology use expressed by people living with dementia. Evidently, people living with dementia and their caregivers use music technologies to support their everyday care needs. Results suggest opportunities to design technologies enabling easier access to music and supporting people living with dementia with recreational and therapeutic music listening and music-based activities.

Submitted 1 February, 2024; originally announced February 2024.

arXiv:2401.11628 [pdf]

Older Adults Imagining Future Technologies in Participatory Design Workshops: Supporting Continuity in the Pursuit of Meaningful Activities

Authors: Wei Zhao, Ryan M. Kelly, Melissa J. Rogerson, Jenny Waycott

Abstract: Recent innovations in digital technology offer significant opportunities for older adults to engage in meaningful activities. To investigate older adults' perceptions of using existing and emerging technologies for meaningful activities, we conducted three participatory design workshops and follow-up interviews with adults aged over 65. The workshops encompassed discussions on existing technologies for meaningful activities, demonstrations of emerging technologies such as VR, AR, and AI, and design activities including prototyping and storyboarding. Our findings show that while participants had diverse interpretations of meaningful activities, they sought to use technologies to support continuity in the pursuit of these activities. Specifically, participants highlighted the importance of safe aging at home, which provides a pathway for meaningful activities in later life. We further discuss participants' discerning attitudes when assessing the use of different technologies for meaningful activities and several values and attributes they desire when envisioning future technologies, including simplicity, positivity, proactivity, and integration.

Submitted 23 May, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

arXiv:2309.01670 [pdf, other]

Blind Biological Sequence Denoising with Self-Supervised Set Learning

Authors: Nathan Ng, Ji Won Park, Jae Hyeon Lee, Ryan Lewis Kelly, Stephen Ra, Kyunghyun Cho

Abstract: Biological sequence analysis relies on the ability to denoise the imprecise output of sequencing platforms. We consider a common setting where a short sequence is read out repeatedly using a high-throughput long-read platform to generate multiple subreads, or noisy observations of the same sequence. Denoising these subreads with alignment-based approaches often fails when too few subreads are available or error rates are too high. In this paper, we propose a novel method for blindly denoising sets of sequences without directly observing clean source sequence labels. Our method, Self-Supervised Set Learning (SSSL), gathers subreads together in an embedding space and estimates a single set embedding as the midpoint of the subreads in both the latent and sequence spaces. This set embedding represents the "average" of the subreads and can be decoded into a prediction of the clean sequence. In experiments on simulated long-read DNA data, SSSL methods denoise small reads of $\leq 6$ subreads with 17% fewer errors and large reads of $>6$ subreads with 8% fewer errors compared to the best baseline. On a real dataset of antibody sequences, SSSL improves over baselines on two self-supervised metrics, with a significant improvement on difficult small reads that comprise over 60% of the test set. By accurately denoising these reads, SSSL promises to better realize the potential of high-throughput DNA sequencing data for downstream scientific applications.

Submitted 4 September, 2023; originally announced September 2023.

arXiv:2301.13368 [pdf, other]

Misspecification-robust Sequential Neural Likelihood for Simulation-based Inference

Authors: Ryan P. Kelly, David J. Nott, David T. Frazier, David J. Warne, Chris Drovandi

Abstract: Simulation-based inference techniques are indispensable for parameter estimation of mechanistic and simulable models with intractable likelihoods. While traditional statistical approaches like approximate Bayesian computation and Bayesian synthetic likelihood have been studied under well-specified and misspecified settings, they often suffer from inefficiencies due to wasted model simulations. Neural approaches, such as sequential neural likelihood (SNL), avoid this wastage by utilising all model simulations to train a neural surrogate for the likelihood function. However, the performance of SNL under model misspecification is unreliable and can result in overconfident posteriors centred around an inaccurate parameter estimate. In this paper, we propose a novel SNL method which, through the incorporation of additional adjustment parameters, is robust to model misspecification and capable of identifying features of the data that the model is not able to recover. We demonstrate the efficacy of our approach through several illustrative examples, where our method gives more accurate point estimates and uncertainty quantification than SNL.

Submitted 7 March, 2024; v1 submitted 30 January, 2023; originally announced January 2023.

arXiv:2210.10838 [pdf, other]

A Pareto-optimal compositional energy-based model for sampling and optimization of protein sequences

Authors: Nataša Tagasovska, Nathan C. Frey, Andreas Loukas, Isidro Hötzel, Julien Lafrance-Vanasse, Ryan Lewis Kelly, Yan Wu, Arvind Rajpal, Richard Bonneau, Kyunghyun Cho, Stephen Ra, Vladimir Gligorijević

Abstract: Deep generative models have emerged as a popular machine learning-based approach for inverse design problems in the life sciences. However, these problems often require sampling new designs that satisfy multiple properties of interest in addition to learning the data distribution. This multi-objective optimization becomes more challenging when properties are independent or orthogonal to each other. In this work, we propose a Pareto-compositional energy-based model (pcEBM), a framework that uses multiple gradient descent for sampling new designs that adhere to various constraints in optimizing distinct properties. We demonstrate its ability to learn non-convex Pareto fronts and generate sequences that simultaneously satisfy multiple desired properties across a series of real-world antibody design tasks.

Submitted 19 October, 2022; originally announced October 2022.

arXiv:2104.00870 [pdf, other]

doi 10.1145/3453988

GAVIN: Gaze-Assisted Voice-Based Implicit Note-taking

Authors: Anam Ahmad Khan, Joshua Newn, Ryan Kelly, Namrata Srivastava, James Bailey, Eduardo Velloso

Abstract: Annotation is an effective reading strategy people often undertake while interacting with digital text. It involves highlighting pieces of text and making notes about them. Annotating while reading in a desktop environment is considered trivial but, in a mobile setting where people read while hand-holding devices, the task of highlighting and typing notes on a mobile display is challenging. In this paper, we introduce GAVIN, a gaze-assisted voice note-taking application, which enables readers to seamlessly take voice notes on digital documents by implicitly anchoring them to text passages. We first conducted a contextual enquiry focusing on participants' note-taking practices on digital documents. Using these findings, we propose a method which leverages eye-tracking and machine learning techniques to annotate voice notes with reference text passages. To evaluate our approach, we recruited 32 participants performing voice note-taking. Following this, we trained a classifier on the data collected to predict the text passage where participants made voice notes. Lastly, we employed the classifier to build GAVIN and conducted a user study to demonstrate the feasibility of the system. This research demonstrates the feasibility of using gaze as a resource for implicit anchoring of voice notes, enabling the design of systems that allow users to record voice notes with minimal effort and high accuracy.

Submitted 1 April, 2021; originally announced April 2021.

Comments: In press, ACM Transactions on Computer-Human Interaction

Journal ref: ACM Trans. Comput.-Hum. Interact. 28 (2021) 1-32

arXiv:1911.03028 [pdf, other]

Lock-Free Hopscotch Hashing

Authors: Robert Kelly, Barak A. Pearlmutter, Phil Maguire

Abstract: In this paper we present a lock-free version of Hopscotch Hashing. Hopscotch Hashing is an open addressing algorithm originally proposed by Herlihy, Shavit, and Tzafrir, which is known for fast performance and excellent cache locality. The algorithm allows users of the table to skip or jump over irrelevant entries, allowing quick search, insertion, and removal of entries. Unlike traditional linear probing, Hopscotch Hashing is capable of operating under a high load factor, as probe counts remain small. Our lock-free version improves on the speed, cache locality, and progress guarantees of the original, being a chimera of two concurrent hash tables. We compare our data structure to various other lock-free and blocking hashing algorithms and show that its performance is in many cases superior to existing strategies. The proposed lock-free version overcomes some of the drawbacks associated with the original blocking version, leading to a substantial boost in scalability while maintaining attractive features like physical deletion or probe-chain compression.

Submitted 7 November, 2019; originally announced November 2019.

Comments: 15 pages, to appear in APOCS20

arXiv:1809.04339 [pdf, other]

Concurrent Robin Hood Hashing

Authors: Robert Kelly, Barak A. Pearlmutter, Phil Maguire

Abstract: In this paper we examine the issues involved in adding concurrency to the Robin Hood hash table algorithm. We present a non-blocking obstruction-free K-CAS Robin Hood algorithm which requires only a single-word compare-and-swap primitive, thus making it highly portable. The implementation maintains the attractive properties of the original Robin Hood structure, such as a low expected probe length, capability to operate effectively under a high load factor and good cache locality, all of which are essential for high performance on modern computer architectures. We compare our data structures to various other lock-free and concurrent algorithms, as well as a simple hardware transactional variant, and show that our implementation performs better across a number of contexts.

Submitted 14 November, 2018; v1 submitted 12 September, 2018; originally announced September 2018.

Comments: 16 pages, 12 figures

arXiv:1611.03429 [pdf, ps, other]

Evolving the Incremental λ Calculus into a Model of Forward Automatic Differentiation (AD)

Authors: Robert Kelly, Barak A. Pearlmutter, Jeffrey Mark Siskind

Abstract: Formal transformations somehow resembling the usual derivative are surprisingly common in computer science, with two notable examples being derivatives of regular expressions and derivatives of types. A newcomer to this list is the incremental $λ$-calculus, or ILC, a "theory of changes" that deploys a formal apparatus allowing the automatic generation of efficient update functions which perform incremental computation. The ILC is not only defined, but given a formal machine-understandable definition---accompanied by mechanically verifiable proofs of various properties, including in particular correctness of various sorts. Here, we show how the ILC can be mutated into propagating tangents, thus serving as a model of Forward Accumulation Mode Automatic Differentiation. This mutation is done in several steps. These steps can also be applied to the proofs, resulting in machine-checked proofs of the correctness of this model of forward AD.

Submitted 10 November, 2016; originally announced November 2016.

Comments: Extended abstract presented at the AD 2016 Conference, Sep 2016, Oxford UK

Showing 1–12 of 12 results for author: Kelly, R