
Showing 1–50 of 94 results for author: Dao, T

Searching in archive cs.
  1. arXiv:2406.07887 [pdf, other]

    cs.LG cs.CL

    An Empirical Study of Mamba-based Language Models

    Authors: Roger Waleffe, Wonmin Byeon, Duncan Riach, Brandon Norick, Vijay Korthikanti, Tri Dao, Albert Gu, Ali Hatamizadeh, Sudhakar Singh, Deepak Narayanan, Garvit Kulshreshtha, Vartika Singh, Jared Casper, Jan Kautz, Mohammad Shoeybi, Bryan Catanzaro

    Abstract: Selective state-space models (SSMs) like Mamba overcome some of the shortcomings of Transformers, such as quadratic computational complexity with sequence length and large inference-time memory requirements from the key-value cache. Moreover, recent studies have shown that SSMs can match or exceed the language modeling capabilities of Transformers, making them an attractive alternative. In a contr…

    Submitted 12 June, 2024; originally announced June 2024.

  2. arXiv:2406.03288 [pdf, other]

    cs.LG stat.ML

    Embarrassingly Parallel GFlowNets

    Authors: Tiago da Silva, Luiz Max Carvalho, Amauri Souza, Samuel Kaski, Diego Mesquita

    Abstract: GFlowNets are a promising alternative to MCMC sampling for discrete compositional random variables. Training GFlowNets requires repeated evaluations of the unnormalized target distribution or reward function. However, for large-scale posterior sampling, this may be prohibitive since it incurs traversing the data several times. Moreover, if the data are distributed across clients, employing standar…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted to ICML 2024

  3. arXiv:2405.21060 [pdf, other]

    cs.LG

    Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

    Authors: Tri Dao, Albert Gu

    Abstract: While Transformers have been the main architecture behind deep learning's success in language modeling, state-space models (SSMs) such as Mamba have recently been shown to match or outperform Transformers at small to medium scale. We show that these families of models are actually quite closely related, and develop a rich framework of theoretical connections between SSMs and variants of attention,…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  4. arXiv:2405.20670 [pdf]

    cs.DL

    Twitter should now be referred to as X: How academics, journals and publishers need to make the nomenclatural transition

    Authors: Jaime A. Teixeira da Silva, Serhii Nazarovets

    Abstract: Here, we note how academics, journals and publishers should no longer refer to the social media platform Twitter as such, rather as X. Relying on Google Scholar, we found 16 examples of papers published in the last months of 2023 - essentially during the transition period between Twitter and X - that used Twitter and X, but in different ways. Unlike that transition period in which the binary Twitt…

    Submitted 31 May, 2024; originally announced May 2024.

  5. arXiv:2405.06870 [pdf, other]

    cs.IT

    Noise-Tolerant Codebooks for Semi-Quantitative Group Testing: Application to Spatial Genomics

    Authors: Kok Hao Chen, Duc Tu Dao, Han Mao Kiah, Van Long Phuoc Pham, Eitan Yaakobi

    Abstract: Motivated by applications in spatial genomics, we revisit group testing (Dorfman 1943) and propose the class of $\lambda$-$\mathsf{ADD}$-codes, studying such codes with certain distance $d$ and codelength $n$. When $d$ is constant, we provide explicit code constructions with rates close to $1/2$. When $d$ is proportional to $n$, we provide a GV-type lower bound whose rates are efficiently computable. Upper b…

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: To appear in ISIT 2024 Proceedings

  6. arXiv:2403.18101 [pdf, other]

    cs.AI cs.LG

    Towards Explainable Clustering: A Constrained Declarative based Approach

    Authors: Mathieu Guilbert, Christel Vrain, Thi-Bich-Hanh Dao

    Abstract: The domain of explainable AI is of interest in all Machine Learning fields, and it is all the more important in clustering, an unsupervised task whose result must be validated by a domain expert. We aim at finding a clustering that has high quality in terms of classic clustering criteria and that is explainable, and we argue that these two dimensions must be considered when building the clustering…

    Submitted 26 March, 2024; originally announced March 2024.

  7. arXiv:2403.14709 [pdf, other]

    cs.CY cs.LG

    ClimateQ&A: Bridging the gap between climate scientists and the general public

    Authors: Natalia De La Calzada, Théo Alves Da Costa, Annabelle Blangero, Nicolas Chesneau

    Abstract: This research paper investigates public views on climate change and biodiversity loss by analyzing questions asked to the ClimateQ&A platform. ClimateQ&A is a conversational agent that uses LLMs to respond to queries based on over 14,000 pages of scientific literature from the IPCC and IPBES reports. Launched online in March 2023, the tool has gathered over 30,000 questions, mainly from a French a…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted as a workshop paper at "Tackling Climate Change with Machine Learning", ICLR 2024

  8. arXiv:2403.10304 [pdf, other]

    cs.AI cs.DB

    KIF: A Framework for Virtual Integration of Heterogeneous Knowledge Bases using Wikidata

    Authors: Guilherme Lima, Marcelo Machado, Elton Soares, Sandro R. Fiorini, Raphael Thiago, Leonardo G. Azevedo, Viviane T. da Silva, Renato Cerqueira

    Abstract: We present a knowledge integration framework (called KIF) that uses Wikidata as a lingua franca to integrate heterogeneous knowledge bases. These can be triplestores, relational databases, CSV files, etc., which may or may not use the Wikidata dialect of RDF. KIF leverages Wikidata's data model and vocabulary plus user-defined mappings to expose a unified view of the integrated bases while keeping…

    Submitted 15 March, 2024; originally announced March 2024.

  9. arXiv:2403.03234 [pdf, other]

    q-bio.GN cs.LG

    Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling

    Authors: Yair Schiff, Chia-Hsiang Kao, Aaron Gokaslan, Tri Dao, Albert Gu, Volodymyr Kuleshov

    Abstract: Large-scale sequence modeling has sparked rapid advances that now extend into biology and genomics. However, modeling genomic sequences introduces challenges such as the need to model long-range token interactions, the effects of upstream and downstream regions of the genome, and the reverse complementarity (RC) of DNA. Here, we propose an architecture motivated by these challenges that builds off…

    Submitted 5 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: ICML 2024; Code to reproduce our experiments is available at https://github.com/kuleshov-group/caduceus

  10. arXiv:2402.19173 [pdf, other]

    cs.SE cs.AI

    StarCoder 2 and The Stack v2: The Next Generation

    Authors: Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, Tianyang Liu, Max Tian, Denis Kocetkov, Arthur Zucker, Younes Belkada, Zijian Wang, Qian Liu, Dmitry Abulkhanov, Indraneil Paul, Zhuang Li, Wen-Ding Li, Megan Risdal, Jia Li, Jian Zhu, Terry Yue Zhuo, et al. (41 additional authors not shown)

    Abstract: The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder2. In partnership with Software Heritage (SWH), we build The Stack v2 on top of the digital commons of their source code archive. Alongside the SWH repositories spanning 619 programming languages, we carefully select other high-quality data…

    Submitted 29 February, 2024; originally announced February 2024.

  11. arXiv:2402.14712 [pdf, other]

    cs.IT cs.DM math.CO

    Gilbert-Varshamov Bound for Codes in $L_1$ Metric using Multivariate Analytic Combinatorics

    Authors: Keshav Goyal, Duc Tu Dao, Mladen Kovačević, Han Mao Kiah

    Abstract: Analytic combinatorics in several variables refers to a suite of tools that provide sharp asymptotic estimates for certain combinatorial quantities. In this paper, we apply these tools to determine the Gilbert-Varshamov lower bound on the rate of optimal codes in $L_1$ metric. Several different code spaces are analyzed, including the simplex and the hypercube in $\mathbb{Z}^n$, all of which are i…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: 33 pages, 3 figures, submitted to IEEE Transactions on Information Theory

  12. arXiv:2402.10193 [pdf, other]

    cs.LG cs.CL

    BitDelta: Your Fine-Tune May Only Be Worth One Bit

    Authors: James Liu, Guangxuan Xiao, Kai Li, Jason D. Lee, Song Han, Tri Dao, Tianle Cai

    Abstract: Large Language Models (LLMs) are typically trained in two phases: pre-training on large internet-scale datasets, and fine-tuning for downstream tasks. Given the higher computational demand of pre-training, it's intuitive to assume that fine-tuning adds less new information to the model, and is thus more compressible. We explore this assumption by decomposing the weights of fine-tuned models into t…

    Submitted 27 February, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

  13. arXiv:2401.17824 [pdf, other]

    cs.CL

    A Survey of Pre-trained Language Models for Processing Scientific Text

    Authors: Xanh Ho, Anh Khoa Duong Nguyen, An Tuan Dao, Junfeng Jiang, Yuki Chida, Kaito Sugimoto, Huy Quoc To, Florian Boudin, Akiko Aizawa

    Abstract: The number of Language Models (LMs) dedicated to processing scientific text is on the rise. Keeping pace with the rapid growth of scientific LMs (SciLMs) has become a daunting task for researchers. To date, no comprehensive surveys on SciLMs have been undertaken, leaving this issue unaddressed. Given the constant stream of new SciLMs, appraising the state-of-the-art and how they compare to each ot…

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: Resources are available at https://github.com/Alab-NII/Awesome-SciLM

  14. arXiv:2401.10774 [pdf, other]

    cs.LG cs.CL

    Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

    Authors: Tianle Cai, Yuhong Li, Zhengyang Geng, Hongwu Peng, Jason D. Lee, Deming Chen, Tri Dao

    Abstract: Large Language Models (LLMs) employ auto-regressive decoding that requires sequential computation, with each step reliant on the previous one's output. This creates a bottleneck as each step necessitates moving the full model parameters from High-Bandwidth Memory (HBM) to the accelerator's cache. While methods such as speculative decoding have been suggested to address this issue, their implementa…

    Submitted 14 June, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: The code for this implementation is available at https://github.com/FasterDecoding/Medusa

  15. arXiv:2401.09252 [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    3D Scene Geometry Estimation from 360$^\circ$ Imagery: A Survey

    Authors: Thiago Lopes Trugillo da Silveira, Paulo Gamarra Lessa Pinto, Jeffri Erwin Murrugarra Llerena, Claudio Rosito Jung

    Abstract: This paper provides a comprehensive survey on pioneer and state-of-the-art 3D scene geometry estimation methodologies based on single, two, or multiple images captured under the omnidirectional optics. We first revisit the basic concepts of the spherical camera model, and review the most common acquisition technologies and representation formats suitable for omnidirectional (also called 360…

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: Published in ACM Computing Surveys

    Journal ref: ACM Comput. Surv. 55, 4, Article 68, 2023

  16. arXiv:2312.17205 [pdf, other]

    cs.CV

    EFHQ: Multi-purpose ExtremePose-Face-HQ dataset

    Authors: Trung Tuan Dao, Duc Hong Vu, Cuong Pham, Anh Tran

    Abstract: The existing facial datasets, while having plentiful images at near frontal views, lack images with extreme head poses, leading to the downgraded performance of deep learning models when dealing with profile or pitched faces. This work aims to address this gap by introducing a novel dataset named Extreme Pose Face High-Quality Dataset (EFHQ), which includes a maximum of 450k high-quality images of…

    Submitted 11 April, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

    Comments: Project Page: https://bomcon123456.github.io/efhq/

  17. arXiv:2312.16626 [pdf, other]

    cs.CV cs.AI cs.LG

    Sorting of Smartphone Components for Recycling Through Convolutional Neural Networks

    Authors: Álvaro G. Becker, Marcelo P. Cenci, Thiago L. T. da Silveira, Hugo M. Veit

    Abstract: The recycling of waste electrical and electronic equipment is an essential tool in allowing for a circular economy, presenting the potential for significant environmental and economic gain. However, traditional material separation techniques, based on physical and chemical processes, require substantial investment and do not apply to all cases. In this work, we investigate using an image classific…

    Submitted 27 December, 2023; originally announced December 2023.

  18. arXiv:2312.03046 [pdf, other]

    cs.CV

    Diversified in-domain synthesis with efficient fine-tuning for few-shot classification

    Authors: Victor G. Turrisi da Costa, Nicola Dall'Asen, Yiming Wang, Nicu Sebe, Elisa Ricci

    Abstract: Few-shot image classification aims to learn an image classifier using only a small set of labeled examples per class. A recent research direction for improving few-shot classifiers involves augmenting the labelled samples with synthetic images created by state-of-the-art text-to-image generation models. Following this trend, we propose Diversified In-domain Synthesis with Efficient Fine-tuning (DI…

    Submitted 6 December, 2023; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: 14 pages, 6 figures, 8 tables

  19. arXiv:2312.00752 [pdf, other]

    cs.LG cs.AI

    Mamba: Linear-Time Sequence Modeling with Selective State Spaces

    Authors: Albert Gu, Tri Dao

    Abstract: Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module. Many subquadratic-time architectures such as linear attention, gated convolution and recurrent models, and structured state space models (SSMs) have been developed to address Transformers' computational inefficiency on long…

    Submitted 31 May, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

  20. arXiv:2311.05281 [pdf, other]

    cs.CR cs.SE

    Finding Software Vulnerabilities in Open-Source C Projects via Bounded Model Checking

    Authors: Janislley Oliveira de Sousa, Bruno Carvalho de Farias, Thales Araujo da Silva, Eddie Batista de Lima Filho, Lucas C. Cordeiro

    Abstract: Computer-based systems have solved several domain problems, including industrial, military, education, and wearable. Nevertheless, such arrangements need high-quality software to guarantee security and safety as both are mandatory for modern software products. We advocate that bounded model-checking techniques can efficiently detect vulnerabilities in general software systems. However, such an app…

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: 27 pages, submitted to STTT journal

  21. arXiv:2310.18324 [pdf, ps, other]

    cs.AI cs.CL cs.CY cs.LG

    "A Nova Eletricidade": Aplicações, Riscos e Tendências da IA Moderna -- "The New Electricity": Applications, Risks, and Trends in Current AI

    Authors: Ana L. C. Bazzan, Anderson R. Tavares, André G. Pereira, Cláudio R. Jung, Jacob Scharcanski, Joel Luis Carbonera, Luís C. Lamb, Mariana Recamonde-Mendoza, Thiago L. T. da Silveira, Viviane Moreira

    Abstract: The thought-provoking analogy between AI and electricity, made by computer scientist and entrepreneur Andrew Ng, summarizes the deep transformation that recent advances in Artificial Intelligence (AI) have triggered in the world. This chapter presents an overview of the ever-evolving landscape of AI, written in Portuguese. With no intent to exhaust the subject, we explore the AI applications that…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: In Portuguese

    MSC Class: 68

    ACM Class: I.2

  22. arXiv:2310.17157 [pdf, other]

    cs.LG

    Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time

    Authors: Zichang Liu, Jue Wang, Tri Dao, Tianyi Zhou, Binhang Yuan, Zhao Song, Anshumali Shrivastava, Ce Zhang, Yuandong Tian, Christopher Re, Beidi Chen

    Abstract: Large language models (LLMs) with hundreds of billions of parameters have sparked a new wave of exciting AI applications. However, they are computationally expensive at inference time. Sparsity is a natural approach to reduce this cost, but existing methods either require costly retraining, have to forgo LLM's in-context learning ability, or do not yield wall-clock time speedup on modern hardware.…

    Submitted 26 October, 2023; originally announced October 2023.

    Journal ref: Proceedings of the 40th International Conference on Machine Learning, 2023, 919

  23. arXiv:2309.12032 [pdf, other]

    cs.LG stat.ML

    Human-in-the-Loop Causal Discovery under Latent Confounding using Ancestral GFlowNets

    Authors: Tiago da Silva, Eliezer Silva, Adèle Ribeiro, António Góis, Dominik Heider, Samuel Kaski, Diego Mesquita

    Abstract: Structure learning is the crux of causal inference. Notably, causal discovery (CD) algorithms are brittle when data is scarce, possibly inferring imprecise causal relations that contradict expert knowledge -- especially when considering latent confounders. To aggravate the issue, most CD methods do not provide uncertainty estimates, making it hard for users to interpret results and improve the inf…

    Submitted 21 September, 2023; originally announced September 2023.

  24. arXiv:2308.11763 [pdf, other]

    physics.data-an cs.DM cs.PF math.CO

    Efficient set-theoretic algorithms for computing high-order Forman-Ricci curvature on abstract simplicial complexes

    Authors: Danillo Barros de Souza, Jonatas T. S. da Cunha, Fernando A. N. Santos, Jürgen Jost, Serafim Rodrigues

    Abstract: Forman-Ricci curvature (FRC) is a potent and powerful tool for analysing empirical networks, as the distribution of the curvature values can identify structural information that is not readily detected by other geometrical methods. Crucially, FRC captures higher-order structural information of clique complexes of a graph or Vietoris-Rips complexes, which is not readily accessible to alternative me…

    Submitted 9 May, 2024; v1 submitted 22 August, 2023; originally announced August 2023.

  25. arXiv:2307.08691 [pdf, other]

    cs.LG

    FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

    Authors: Tri Dao

    Abstract: Scaling Transformers to longer sequence lengths has been a major problem in the last several years, promising to improve performance in language modeling and high-resolution image understanding, as well as to unlock new applications in code, audio, and video generation. The attention layer is the main bottleneck in scaling to longer sequences, as its runtime and memory increase quadratically in th…

    Submitted 17 July, 2023; originally announced July 2023.

  26. arXiv:2305.06161 [pdf, other]

    cs.CL cs.AI cs.PL cs.SE

    StarCoder: may the source be with you!

    Authors: Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, Qian Liu, Evgenii Zheltonozhskii, Terry Yue Zhuo, Thomas Wang, Olivier Dehaene, Mishig Davaadorj, Joel Lamy-Poirier, João Monteiro, Oleh Shliazhko, Nicolas Gontier, Nicholas Meade, Armel Zebaze, Ming-Ho Yee, Logesh Kumar Umapathi, Jian Zhu, et al. (42 additional authors not shown)

    Abstract: The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling capabilities and fast large-batch inference enabled by multi-query attention. StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large colle…

    Submitted 13 December, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

  27. Unsupervised out-of-distribution detection for safer robotically guided retinal microsurgery

    Authors: Alain Jungo, Lars Doorenbos, Tommaso Da Col, Maarten Beelen, Martin Zinkernagel, Pablo Márquez-Neila, Raphael Sznitman

    Abstract: Purpose: A fundamental problem in designing safe machine learning systems is identifying when samples presented to a deployed model differ from those observed at training time. Detecting so-called out-of-distribution (OoD) samples is crucial in safety-critical applications such as robotically guided retinal microsurgery, where distances between the instrument and the retina are derived from sequen…

    Submitted 3 May, 2023; v1 submitted 11 April, 2023; originally announced April 2023.

    Comments: Accepted at IPCAI 2023

  28. arXiv:2303.11059 [pdf, other]

    cs.RO eess.SP

    Six-degree-of-freedom Localization Under Multiple Permanent Magnets Actuation

    Authors: Tomas da Veiga, Giovanni Pittiglio, Michael Brockdorff, James H. Chandler, Pietro Valdastri

    Abstract: Localization of magnetically actuated medical robots is essential for accurate actuation, closed loop control and delivery of functionality. Despite extensive progress in the use of magnetic field and inertial measurements for pose estimation, these have been either under single external permanent magnet actuation or coil systems. With the advent of new magnetic actuation systems comprised of mult…

    Submitted 20 March, 2023; originally announced March 2023.

    Comments: Under second round of review at Robotics and Automation Letters

  29. arXiv:2303.09489 [pdf, other]

    cs.LG cs.AI

    Effectively Modeling Time Series with Simple Discrete State Spaces

    Authors: Michael Zhang, Khaled K. Saab, Michael Poli, Tri Dao, Karan Goel, Christopher Ré

    Abstract: Time series modeling is a well-established problem, which often requires that methods (1) expressively represent complicated dependencies, (2) forecast long horizons, and (3) efficiently train over long sequences. State-space models (SSMs) are classical models for time series, and prior works combine SSMs with deep learning layers for efficient sequence modeling. However, we find fundamental limit…

    Submitted 16 March, 2023; originally announced March 2023.

    Comments: 45 pages, 8 figures, 20 tables, ICLR 2023

  30. arXiv:2303.01842 [pdf, ps, other]

    cs.RO

    Independent Control of Two Magnetic Robots using External Permanent Magnets: A Feasibility Study

    Authors: Joshua Davy, Tomas da Veiga, Giovanni Pittiglio, James H. Chandler, Pietro Valdastri

    Abstract: The ability to have multiple magnetic robots operate independently in the same workspace would increase the clinical potential of these systems allowing collaborative operation. In this work, we investigate the feasibility of actuating two magnetic robots operating within the same workspace using external permanent magnets. Unlike actuation systems based on pairs of electromagnetic coils, the use…

    Submitted 3 March, 2023; originally announced March 2023.

    Comments: 7 pages, 6 figures, conference

  31. arXiv:2302.13714 [pdf, other]

    cs.IT math.CO

    On the Design of Codes for DNA Computing: Secondary Structure Avoidance Codes

    Authors: Tuan Thanh Nguyen, Kui Cai, Han Mao Kiah, Duc Tu Dao, Kees A. Schouhamer Immink

    Abstract: In this work, we investigate a challenging problem, which has been considered to be an important criterion in designing codewords for DNA computing purposes, namely secondary structure avoidance in single-stranded DNA molecules. In short, secondary structure refers to the tendency of a single-stranded DNA sequence to fold back upon itself, thus becoming inactive in the computation process. While s…

    Submitted 27 February, 2023; originally announced February 2023.

  32. arXiv:2302.12133 [pdf, other]

    cs.MM

    Practical Analyses of How Common Social Media Platforms and Photo Storage Services Handle Uploaded Images

    Authors: Duc-Tien Dang-Nguyen, Vegard Velle Sjøen, Dinh-Hai Le, Thien-Phu Dao, Anh-Duy Tran, Minh-Triet Tran

    Abstract: The research done in this study has delved deeply into the changes made to digital images that are uploaded to three of the major social media platforms and image storage services in today's society: Facebook, Flickr, and Google Photos. In addition to providing up-to-date data on an ever-changing landscape of different social media networks' digital fingerprints, a deep analysis of the social netw…

    Submitted 23 February, 2023; originally announced February 2023.

  33. arXiv:2302.10866 [pdf, other]

    cs.LG cs.CL

    Hyena Hierarchy: Towards Larger Convolutional Language Models

    Authors: Michael Poli, Stefano Massaroli, Eric Nguyen, Daniel Y. Fu, Tri Dao, Stephen Baccus, Yoshua Bengio, Stefano Ermon, Christopher Ré

    Abstract: Recent advances in deep learning have relied heavily on the use of large Transformers due to their ability to learn at scale. However, the core building block of Transformers, the attention operator, exhibits quadratic cost in sequence length, limiting the amount of context accessible. Existing subquadratic methods based on low-rank and sparse approximations need to be combined with dense attentio…

    Submitted 19 April, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

    Comments: Additional details

  34. arXiv:2302.08905 [pdf, other]

    cs.DL

    GraphLED: A graph-based approach to process and visualise linked engineering documents

    Authors: Vanessa Telles da Silva, Lucas de Angelo Martins Ribeiro, Willian Borges de Lemos, Sílvia Silva da Costa Botelho, Nelson Lopes Duarte Filho, Marcelo Rita Pias

    Abstract: The architecture, engineering and construction (AEC) sector extensively uses documents supporting product and process development. As part of this, organisations should handle big data of hundreds, or even thousands, of technical documents strongly linked together, including CAD design of industrial plants, equipment purchase orders, quality certificates, and part material analysis. However, analy…

    Submitted 15 February, 2023; originally announced February 2023.

  35. arXiv:2302.06646 [pdf, other]

    cs.LG

    Simple Hardware-Efficient Long Convolutions for Sequence Modeling

    Authors: Daniel Y. Fu, Elliot L. Epstein, Eric Nguyen, Armin W. Thomas, Michael Zhang, Tri Dao, Atri Rudra, Christopher Ré

    Abstract: State space models (SSMs) have high performance on long sequence modeling but require sophisticated initialization techniques and specialized implementations for high quality and runtime performance. We study whether a simple alternative can match SSMs in performance and efficiency: directly learning long convolutions over the sequence. We find that a key requirement to achieving high performance…

    Submitted 13 February, 2023; originally announced February 2023.

  36. arXiv:2301.06031 [pdf]

    cs.CR cs.LG

    A Review on the effectiveness of Dimensional Reduction with Computational Forensics: An Application on Malware Analysis

    Authors: Aye Thaw Da Naing, Justin Soh Beng Guan, Yarzar Shwe Win, Jonathan Pan

    Abstract: The Android operating system is pervasively adopted as the operating system platform of choice for smart devices. However, the strong adoption has also resulted in exponential growth in the number of Android based malicious software or malware. To deal with such cyber threats as part of cyber investigation and digital forensics, computational techniques in the form of machine learning algorithms a…

    Submitted 15 January, 2023; originally announced January 2023.

    Comments: 18 pages

  37. arXiv:2301.03322 [pdf, other]

    cs.CV

    Simplifying Open-Set Video Domain Adaptation with Contrastive Learning

    Authors: Giacomo Zara, Victor Guilherme Turrisi da Costa, Subhankar Roy, Paolo Rota, Elisa Ricci

    Abstract: In an effort to reduce annotation costs in action recognition, unsupervised video domain adaptation methods have been proposed that aim to adapt a predictive model from a labelled dataset (i.e., source domain) to an unlabelled dataset (i.e., target domain). In this work we address a more realistic scenario, called open-set video domain adaptation (OUVDA), where the target dataset contains "unknown…

    Submitted 9 January, 2023; originally announced January 2023.

    Comments: Currently under review at Computer Vision and Image Understanding (CVIU) journal

  38. arXiv:2212.14052 [pdf, other]

    cs.LG cs.CL

    Hungry Hungry Hippos: Towards Language Modeling with State Space Models

    Authors: Daniel Y. Fu, Tri Dao, Khaled K. Saab, Armin W. Thomas, Atri Rudra, Christopher Ré

    Abstract: State space models (SSMs) have demonstrated state-of-the-art sequence modeling performance in some modalities, but underperform attention in language modeling. Moreover, despite scaling nearly linearly in sequence length instead of quadratically, SSMs are still slower than Transformers due to poor hardware utilization. In this paper, we make progress on understanding the expressivity gap between S…

    Submitted 28 April, 2023; v1 submitted 28 December, 2022; originally announced December 2022.

    Comments: ICLR 2023 Camera-Ready (Notable-top-25% / Spotlight)

  39. arXiv:2211.14453 [pdf, other]

    cs.LG cs.AI eess.SY

    Transform Once: Efficient Operator Learning in Frequency Domain

    Authors: Michael Poli, Stefano Massaroli, Federico Berto, Jinkyoo Park, Tri Dao, Christopher Ré, Stefano Ermon

    Abstract: Spectral analysis provides one of the most effective paradigms for information-preserving dimensionality reduction, as simple descriptions of naturally occurring signals are often obtained via few terms of periodic basis functions. In this work, we study deep neural networks designed to harness the structure in frequency domain for efficient learning of long-range correlations in space or time: fr…

    Submitted 25 November, 2022; originally announced November 2022.

    Comments: Published at the 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

  40. arXiv:2211.01438 [pdf, other]

    eess.AS cs.CL cs.SD

    Variable Attention Masking for Configurable Transformer Transducer Speech Recognition

    Authors: Pawel Swietojanski, Stefan Braun, Dogan Can, Thiago Fraga da Silva, Arnab Ghoshal, Takaaki Hori, Roger Hsiao, Henry Mason, Erik McDermott, Honza Silovsky, Ruchir Travadi, Xiaodan Zhuang

    Abstract: This work studies the use of attention masking in transformer transducer based speech recognition for building a single configurable model for different deployment scenarios. We present a comprehensive set of experiments comparing fixed masking, where the same attention mask is applied at every frame, with chunked masking, where the attention mask for each frame is determined by chunk boundaries,…

    Submitted 18 April, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

    Comments: To appear in ICASSP 2023

    Journal ref: International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

  41. arXiv:2210.12214  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Optimizing Bilingual Neural Transducer with Synthetic Code-switching Text Generation

    Authors: Thien Nguyen, Nathalie Tran, Liuhui Deng, Thiago Fraga da Silva, Matthew Radzihovsky, Roger Hsiao, Henry Mason, Stefan Braun, Erik McDermott, Dogan Can, Pawel Swietojanski, Lyan Verwimp, Sibel Oyman, Tresi Arvizo, Honza Silovsky, Arnab Ghoshal, Mathieu Martel, Bharat Ram Ambati, Mohamed Ali

    Abstract: Code-switching describes the practice of using more than one language in the same sentence. In this study, we investigate how to optimize a neural transducer based bilingual automatic speech recognition (ASR) model for code-switching speech. Focusing on the scenario where the ASR model is trained without supervised code-switching data, we found that semi-supervised training and synthetic code-swit…

    Submitted 21 October, 2022; originally announced October 2022.

    Comments: 5 pages, 1 figure, submitted to ICASSP 2023, *: equal contributions

  42. arXiv:2210.06583  [pdf, other]

    cs.CV cs.LG eess.IV

    S4ND: Modeling Images and Videos as Multidimensional Signals Using State Spaces

    Authors: Eric Nguyen, Karan Goel, Albert Gu, Gordon W. Downs, Preey Shah, Tri Dao, Stephen A. Baccus, Christopher Ré

    Abstract: Visual data such as images and videos are typically modeled as discretizations of inherently continuous, multidimensional signals. Existing continuous-signal models attempt to exploit this fact by modeling the underlying signals of visual (e.g., image) data directly. However, these models have not yet been able to achieve competitive performance on practical vision tasks such as large-scale image…

    Submitted 13 October, 2022; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: NeurIPS 2022

  43. arXiv:2210.02390  [pdf, other]

    cs.CV cs.AI cs.LG

    Bayesian Prompt Learning for Image-Language Model Generalization

    Authors: Mohammad Mahdi Derakhshani, Enrique Sanchez, Adrian Bulat, Victor Guilherme Turrisi da Costa, Cees G. M. Snoek, Georgios Tzimiropoulos, Brais Martinez

    Abstract: Foundational image-language models have generated considerable interest due to their efficient adaptation to downstream tasks by prompt learning. Prompt learning treats part of the language model input as trainable while freezing the rest, and optimizes an Empirical Risk Minimization objective. However, Empirical Risk Minimization is known to suffer from distributional shifts which hurt generaliza…

    Submitted 20 August, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

    Comments: Accepted at ICCV 2023

  44. arXiv:2209.13774  [pdf, other]

    cs.LG cs.AI stat.ML

    ButterflyFlow: Building Invertible Layers with Butterfly Matrices

    Authors: Chenlin Meng, Linqi Zhou, Kristy Choi, Tri Dao, Stefano Ermon

    Abstract: Normalizing flows model complex probability distributions using maps obtained by composing invertible layers. Special linear layers such as masked and 1x1 convolutions play a key role in existing architectures because they increase expressive power while having tractable Jacobians and inverses. We propose a new family of invertible linear layers based on butterfly layers, which are known to theore…

    Submitted 27 September, 2022; originally announced September 2022.

    Comments: ICML 2022

  45. arXiv:2207.12842  [pdf, other]

    cs.CV

    Unsupervised Domain Adaptation for Video Transformers in Action Recognition

    Authors: Victor G. Turrisi da Costa, Giacomo Zara, Paolo Rota, Thiago Oliveira-Santos, Nicu Sebe, Vittorio Murino, Elisa Ricci

    Abstract: Over the last few years, Unsupervised Domain Adaptation (UDA) techniques have acquired remarkable importance and popularity in computer vision. However, when compared to the extensive literature available for images, the field of videos is still relatively unexplored. On the other hand, the performance of a model in action recognition is heavily affected by domain shift. In this paper, we propose…

    Submitted 26 July, 2022; originally announced July 2022.

    Comments: Accepted at ICPR 2022

  46. arXiv:2207.11638  [pdf, other]

    eess.SP cs.CV cs.MM eess.IV stat.ME

    DCT Approximations Based on Chen's Factorization

    Authors: C. J. Tablada, T. L. T. da Silveira, R. J. Cintra, F. M. Bayer

    Abstract: In this paper, two 8-point multiplication-free DCT approximations based on Chen's factorization are proposed and their fast algorithms are also derived. Both transformations are assessed in terms of computational cost, error energy, and coding gain. Experiments with a JPEG-like image compression scheme are performed and results are compared with competing methods. The proposed low-complexity t…

    Submitted 23 July, 2022; originally announced July 2022.

    Comments: 19 pages, 8 figures, 5 tables

    Journal ref: Signal Processing: Image Communication, Volume 58, October 2017, Pages 14-23

  47. arXiv:2206.14097  [pdf, other]

    cs.IR

    Item Matching using Text Description and Similarity Search

    Authors: Ana Paula Appel, Anderson Luis de Paula Silva, Adriana Reigota Silva, Caique Dutra Santos, Thiago Logo da Silva, Rafael Poggi de Araujo, Luiz Carlos Faray de Aquino

    Abstract: In this paper, we focus on the problem of item matching using only the description. Those specific items not only lack a unique code but also contain short text descriptions, making the item matching process difficult. Our goal is to compare products using only the description provided by the purchase process. Therefore, evaluating other characteristics and differences can uncover possible flaws d…

    Submitted 1 July, 2022; v1 submitted 28 June, 2022; originally announced June 2022.

  48. arXiv:2206.01299  [pdf, other]

    cs.LG cs.DC

    Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees

    Authors: Jue Wang, Binhang Yuan, Luka Rimanic, Yongjun He, Tri Dao, Beidi Chen, Christopher Ré, Ce Zhang

    Abstract: Communication compression is a crucial technique for modern distributed learning systems to alleviate their communication bottlenecks over slower networks. Despite recent intensive studies of gradient compression for data parallel-style training, compressing the activations for models trained with pipeline parallelism is still an open problem. In this paper, we propose AC-SGD, a novel activation c…

    Submitted 6 March, 2023; v1 submitted 2 June, 2022; originally announced June 2022.

  49. arXiv:2206.01288  [pdf, other]

    cs.DC cs.LG

    Decentralized Training of Foundation Models in Heterogeneous Environments

    Authors: Binhang Yuan, Yongjun He, Jared Quincy Davis, Tianyi Zhang, Tri Dao, Beidi Chen, Percy Liang, Christopher Ré, Ce Zhang

    Abstract: Training foundation models, such as GPT-3 and PaLM, can be extremely expensive, often involving tens of thousands of GPUs running continuously for months. These models are typically trained in specialized clusters featuring fast, homogeneous interconnects and using carefully designed software systems that support both data parallelism and model/pipeline parallelism. Such dedicated clusters can be…

    Submitted 21 June, 2023; v1 submitted 2 June, 2022; originally announced June 2022.

  50. arXiv:2206.00122  [pdf, other]

    eess.IV cs.MM eess.SP math.NA stat.ME

    A Class of Low-complexity DCT-like Transforms for Image and Video Coding

    Authors: T. L. T. da Silveira, D. R. Canterle, D. F. G. Coelho, V. A. Coutinho, F. M. Bayer, R. J. Cintra

    Abstract: The discrete cosine transform (DCT) is a relevant tool in signal processing applications, mainly known for its good decorrelation properties. Current image and video coding standards -- such as JPEG and HEVC -- adopt the DCT as a fundamental building block for compression. Recent works have introduced low-complexity approximations for the DCT, which become paramount in applications demanding real-…

    Submitted 8 December, 2022; v1 submitted 31 May, 2022; originally announced June 2022.

    Comments: Corrected a typo in the general expression for the diagonal matrix S(a) (Equation 11, Section 3.1). Manuscript has 20 pages, 8 figures, 9 tables

    MSC Class: 94A08; 65D15

    Journal ref: IEEE Transactions on Circuits and Systems for Video Technology, v. 32, n. 7, July 2022