
Showing 1–23 of 23 results for author: Dangovski, R

Searching in archive cs.
  1. arXiv:2406.06576

    cs.CL cs.AI cs.LG

    OccamLLM: Fast and Exact Language Model Arithmetic in a Single Step

    Authors: Owen Dugan, Donato Manuel Jimenez Beneto, Charlotte Loh, Zhuo Chen, Rumen Dangovski, Marin Soljačić

    Abstract: Despite significant advancements in text generation and reasoning, Large Language Models (LLMs) still face challenges in accurately performing complex arithmetic operations. To achieve accurate calculations, language model systems often enable LLMs to generate code for arithmetic operations. However, this approach compromises speed and security and, if finetuning is involved, risks the language mo…

    Submitted 29 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

  2. arXiv:2406.00132

    cs.LG quant-ph

    QuanTA: Efficient High-Rank Fine-Tuning of LLMs with Quantum-Informed Tensor Adaptation

    Authors: Zhuo Chen, Rumen Dangovski, Charlotte Loh, Owen Dugan, Di Luo, Marin Soljačić

    Abstract: We propose Quantum-informed Tensor Adaptation (QuanTA), a novel, easy-to-implement fine-tuning method with no inference overhead for large-scale pre-trained language models. By leveraging quantum-inspired methods derived from quantum circuit structures, QuanTA enables efficient high-rank fine-tuning, surpassing the limitations of Low-Rank Adaptation (LoRA)--low-rank approximation may fail for com…

    Submitted 31 May, 2024; originally announced June 2024.

  3. arXiv:2312.00111

    cs.LG cond-mat.mtrl-sci

    Multimodal Learning for Materials

    Authors: Viggo Moro, Charlotte Loh, Rumen Dangovski, Ali Ghorashi, Andrew Ma, Zhuo Chen, Samuel Kim, Peter Y. Lu, Thomas Christensen, Marin Soljačić

    Abstract: Artificial intelligence is transforming computational materials science, improving the prediction of material properties, and accelerating the discovery of novel materials. Recently, publicly available material data repositories have grown rapidly. This growth encompasses not only more materials, but also a greater variety and quantity of their associated properties. Existing machine learning effo…

    Submitted 12 April, 2024; v1 submitted 30 November, 2023; originally announced December 2023.

    Comments: 11 pages, 4 figures

  4. arXiv:2304.00601

    cs.CV cs.LG

    Constructive Assimilation: Boosting Contrastive Learning Performance through View Generation Strategies

    Authors: Ligong Han, Seungwook Han, Shivchander Sudalairaj, Charlotte Loh, Rumen Dangovski, Fei Deng, Pulkit Agrawal, Dimitris Metaxas, Leonid Karlinsky, Tsui-Wei Weng, Akash Srivastava

    Abstract: Transformations based on domain expertise (expert transformations), such as random-resized-crop and color-jitter, have proven critical to the success of contrastive learning techniques such as SimCLR. Recently, several attempts have been made to replace such domain-specific, human-designed transformations with generated views that are learned. However, for imagery data, so far none of these view-ge…

    Submitted 8 April, 2023; v1 submitted 2 April, 2023; originally announced April 2023.

    Comments: Accepted at Generative Models for Computer Vision Workshop 2023

  5. arXiv:2303.11277

    cs.LG

    Model Stitching: Looking For Functional Similarity Between Representations

    Authors: Adriano Hernandez, Rumen Dangovski, Peter Y. Lu, Marin Soljacic

    Abstract: Model stitching (Lenc & Vedaldi 2015) is a compelling methodology to compare different neural network representations, because it allows us to measure to what degree they may be interchanged. We expand on a previous work from Bansal, Nakkiran & Barak which used model stitching to compare representations of the same shapes learned by differently seeded and/or trained neural networks of the same arc…

    Submitted 31 August, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

    Comments: 5 pages, 2 figures

  6. arXiv:2303.02484

    cs.LG cs.AI cs.CV

    Multi-Symmetry Ensembles: Improving Diversity and Generalization via Opposing Symmetries

    Authors: Charlotte Loh, Seungwook Han, Shivchander Sudalairaj, Rumen Dangovski, Kai Xu, Florian Wenzel, Marin Soljacic, Akash Srivastava

    Abstract: Deep ensembles (DE) have been successful in improving model performance by learning diverse members via the stochasticity of random initialization. While recent works have attempted to promote further diversity in DE via hyperparameters or regularizing loss functions, these methods primarily still rely on a stochastic approach to explore the hypothesis space. In this work, we present Multi-Symmetr…

    Submitted 19 June, 2023; v1 submitted 4 March, 2023; originally announced March 2023.

    Comments: Camera Ready Revision. ICML 2023

  7. arXiv:2302.12235

    quant-ph cond-mat.dis-nn cond-mat.quant-gas cs.LG physics.comp-ph

    Q-Flow: Generative Modeling for Differential Equations of Open Quantum Dynamics with Normalizing Flows

    Authors: Owen Dugan, Peter Y. Lu, Rumen Dangovski, Di Luo, Marin Soljačić

    Abstract: Studying the dynamics of open quantum systems can enable breakthroughs both in fundamental physics and applications to quantum engineering and quantum computation. Since the density matrix $ρ$, which is the fundamental description for the dynamics of such systems, is high-dimensional, customized deep generative neural networks have been instrumental in modeling $ρ$. However, the complex-valued nat…

    Submitted 6 June, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

    Report number: MIT-CTP/5533

  8. arXiv:2211.01365

    quant-ph cs.AI cs.LG math.OC physics.comp-ph

    QuACK: Accelerating Gradient-Based Quantum Optimization with Koopman Operator Learning

    Authors: Di Luo, Jiayu Shen, Rumen Dangovski, Marin Soljačić

    Abstract: Quantum optimization, a key application of quantum computing, has traditionally been stymied by the linearly increasing complexity of gradient calculations with an increasing number of parameters. This work bridges the gap between Koopman operator theory, which has found utility in applications because it allows for a linear representation of nonlinear dynamical systems, and natural gradient metho…

    Submitted 4 May, 2024; v1 submitted 2 November, 2022; originally announced November 2022.

    Comments: Advances in Neural Information Processing Systems 36 (NeurIPS 2023) spotlight

    Report number: MIT-CTP/5488

  9. arXiv:2210.06171

    cs.LG

    Learning to Optimize Quasi-Newton Methods

    Authors: Isaac Liao, Rumen R. Dangovski, Jakob N. Foerster, Marin Soljačić

    Abstract: Fast gradient-based optimization algorithms have become increasingly essential for the computationally efficient training of machine learning models. One technique is to multiply the gradient by a preconditioner matrix to produce a step, but it is unclear what the best preconditioner matrix is. This paper introduces a novel machine learning optimizer called LODO, which tries to online meta-learn t…

    Submitted 11 September, 2023; v1 submitted 10 October, 2022; originally announced October 2022.

    ACM Class: I.2.6

  10. arXiv:2210.04783

    cs.LG cs.CV physics.app-ph

    On the Importance of Calibration in Semi-supervised Learning

    Authors: Charlotte Loh, Rumen Dangovski, Shivchander Sudalairaj, Seungwook Han, Ligong Han, Leonid Karlinsky, Marin Soljacic, Akash Srivastava

    Abstract: State-of-the-art (SOTA) semi-supervised learning (SSL) methods have been highly successful in leveraging a mix of labeled and unlabeled data by combining techniques of consistency regularization and pseudo-labeling. During pseudo-labeling, the model's predictions on unlabeled data are used for training, and thus model calibration is important in mitigating confirmation bias. Yet, many SOTA methods…

    Submitted 10 October, 2022; originally announced October 2022.

    Comments: 24 pages

  11. arXiv:2210.00563

    cs.SC cs.LG econ.EM

    AI-Assisted Discovery of Quantitative and Formal Models in Social Science

    Authors: Julia Balla, Sihao Huang, Owen Dugan, Rumen Dangovski, Marin Soljacic

    Abstract: In social science, formal and quantitative models, such as ones describing economic growth and collective action, are used to formulate mechanistic explanations, provide predictions, and uncover questions about observed phenomena. Here, we demonstrate the use of a machine learning system to aid the discovery of symbolic models that capture nonlinear and dynamical relationships in social science da…

    Submitted 16 August, 2023; v1 submitted 2 October, 2022; originally announced October 2022.

    Comments: 19 pages, 4 figures

  12. arXiv:2208.14995

    physics.comp-ph cs.LG nlin.CD nlin.SI physics.data-an

    Discovering Conservation Laws using Optimal Transport and Manifold Learning

    Authors: Peter Y. Lu, Rumen Dangovski, Marin Soljačić

    Abstract: Conservation laws are key theoretical and practical tools for understanding, characterizing, and modeling nonlinear dynamical systems. However, for many complex systems, the corresponding conserved quantities are difficult to identify, making it hard to analyze their dynamics and build stable predictive models. Current approaches for discovering conservation laws often depend on detailed dynamical…

    Submitted 22 August, 2023; v1 submitted 31 August, 2022; originally announced August 2022.

    Comments: 30 pages, 15 figures (7 main text, 8 supplemental), 3 tables (supplemental)

    Journal ref: Nat. Commun. 14, 4744 (2023)

  13. arXiv:2204.10298

    cs.CL

    DiffCSE: Difference-based Contrastive Learning for Sentence Embeddings

    Authors: Yung-Sung Chuang, Rumen Dangovski, Hongyin Luo, Yang Zhang, Shiyu Chang, Marin Soljačić, Shang-Wen Li, Wen-tau Yih, Yoon Kim, James Glass

    Abstract: We propose DiffCSE, an unsupervised contrastive learning framework for learning sentence embeddings. DiffCSE learns sentence embeddings that are sensitive to the difference between the original sentence and an edited sentence, where the edited sentence is obtained by stochastically masking out the original sentence and then sampling from a masked language model. We show that DiffCSE is an instance…

    Submitted 21 April, 2022; originally announced April 2022.

    Comments: NAACL 2022 main conference (Long paper). Pretrained models and code are available at https://github.com/voidism/DiffCSE

  14. arXiv:2112.11929

    cs.CV cs.LG

    Meta-Learning and Self-Supervised Pretraining for Real World Image Translation

    Authors: Ileana Rugina, Rumen Dangovski, Mark Veillette, Pooya Khorrami, Brian Cheung, Olga Simek, Marin Soljačić

    Abstract: Recent advances in deep learning, in particular enabled by hardware advances and big data, have provided impressive results across a wide range of computational problems such as computer vision, natural language, or reinforcement learning. Many of these improvements are, however, constrained to problems with large-scale curated datasets which require a lot of human labor to gather. Additionally, th…

    Submitted 22 December, 2021; originally announced December 2021.

    Comments: 10 pages, 8 figures, 2 tables

  15. arXiv:2111.00899

    cs.CV cs.LG eess.IV physics.app-ph

    Equivariant Contrastive Learning

    Authors: Rumen Dangovski, Li Jing, Charlotte Loh, Seungwook Han, Akash Srivastava, Brian Cheung, Pulkit Agrawal, Marin Soljačić

    Abstract: In state-of-the-art self-supervised learning (SSL), pre-training produces semantically good representations by encouraging them to be invariant under meaningful transformations prescribed from human knowledge. In fact, the property of invariance is a trivial instance of a broader class called equivariance, which can be intuitively understood as the property that representations transform according…

    Submitted 14 March, 2022; v1 submitted 28 October, 2021; originally announced November 2021.

    Comments: Camera Ready Revision. ICLR 2022. Discussion: https://openreview.net/forum?id=gKLAAfiytI Code: https://github.com/rdangovs/essl

  16. arXiv:2110.08406

    cs.LG cond-mat.mtrl-sci physics.app-ph physics.optics

    Surrogate- and invariance-boosted contrastive learning for data-scarce applications in science

    Authors: Charlotte Loh, Thomas Christensen, Rumen Dangovski, Samuel Kim, Marin Soljacic

    Abstract: Deep learning techniques have been increasingly applied to the natural sciences, e.g., for property prediction and optimization or material discovery. A fundamental ingredient of such approaches is the vast quantity of labelled data needed to train the model; this poses severe challenges in data-scarce settings where obtaining labels requires substantial computational or labor resources. Here, we…

    Submitted 15 October, 2021; originally announced October 2021.

    Comments: 21 pages, 10 figures

  17. arXiv:2012.02030

    cs.CL

    Data-Informed Global Sparseness in Attention Mechanisms for Deep Neural Networks

    Authors: Ileana Rugina, Rumen Dangovski, Li Jing, Preslav Nakov, Marin Soljačić

    Abstract: Attention mechanisms play a crucial role in the neural revolution of Natural Language Processing (NLP). With the growth of attention-based models, several pruning techniques have been developed to identify and exploit sparseness, making these models more efficient. Most efforts focus on hard-coding attention patterns or pruning attention weights based on training data. We propose Attention Pruning…

    Submitted 17 May, 2024; v1 submitted 20 November, 2020; originally announced December 2020.

    Comments: Presented at LREC-COLING 2024: 12 pages, 4 figures, 11 tables

  18. arXiv:2010.08412

    cs.CL cs.AR

    Vector-Vector-Matrix Architecture: A Novel Hardware-Aware Framework for Low-Latency Inference in NLP Applications

    Authors: Matthew Khoury, Rumen Dangovski, Longwu Ou, Preslav Nakov, Yichen Shen, Li Jing

    Abstract: Deep neural networks have become the standard approach to building reliable Natural Language Processing (NLP) applications, ranging from Neural Machine Translation (NMT) to dialogue systems. However, improving accuracy by increasing the model size requires a large number of hardware computations, which can slow down NLP applications significantly at inference time. To address this issue, we propos…

    Submitted 6 October, 2020; originally announced October 2020.

    Comments: To appear at the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP '20), November 16-20, 2020, NMT, AI accelerators, co-design, TPU, OPU, 10 pages, 3 figures, 4 tables

  19. arXiv:2007.10784

    cs.LG cs.NE stat.ML

    OccamNet: A Fast Neural Model for Symbolic Regression at Scale

    Authors: Owen Dugan, Rumen Dangovski, Allan Costa, Samuel Kim, Pawan Goyal, Joseph Jacobson, Marin Soljačić

    Abstract: Neural networks' expressiveness comes at the cost of complex, black-box models that often extrapolate poorly beyond the domain of the training dataset, conflicting with the goal of finding compact analytic expressions to describe scientific data. We introduce OccamNet, a neural network model that finds interpretable, compact, and sparse symbolic fits to data, à la Occam's razor. Our model defines…

    Submitted 27 November, 2023; v1 submitted 16 July, 2020; originally announced July 2020.

  20. arXiv:2007.10143

    cs.LG cs.CV cs.NE stat.ML

    Contextualizing Enhances Gradient Based Meta Learning

    Authors: Evan Vogelbaum, Rumen Dangovski, Li Jing, Marin Soljačić

    Abstract: Meta learning methods have found success when applied to few-shot classification problems, in which they quickly adapt to a small number of labeled examples. Prototypical representations, each representing a particular class, have been of particular importance in this setting, as they provide a compact form to convey information learned from the labeled examples. However, these prototypes are just…

    Submitted 17 July, 2020; originally announced July 2020.

  21. arXiv:2007.09456

    cs.CL cs.LG stat.ML

    On a Novel Application of Wasserstein-Procrustes for Unsupervised Cross-Lingual Learning

    Authors: Guillem Ramírez, Rumen Dangovski, Preslav Nakov, Marin Soljačić

    Abstract: The emergence of unsupervised word embeddings, pre-trained on very large monolingual text corpora, is at the core of the ongoing neural revolution in Natural Language Processing (NLP). Initially introduced for English, such pre-trained word embeddings quickly emerged for a number of other languages. Subsequently, there have been a number of attempts to align the embedding spaces across languages,…

    Submitted 16 June, 2024; v1 submitted 18 July, 2020; originally announced July 2020.

    Journal ref: Proceedings of the 17th Workshop on Building and Using Comparable Corpora (BUCC) at LREC-COLING 2024

  22. arXiv:1811.11644

    cs.LG cs.CV stat.ML

    WaveletNet: Logarithmic Scale Efficient Convolutional Neural Networks for Edge Devices

    Authors: Li Jing, Rumen Dangovski, Marin Soljacic

    Abstract: We present a logarithmic-scale efficient convolutional neural network architecture for edge devices, named WaveletNet. Our model is based on the well-known depthwise convolution, and on two new layers, which we introduce in this work: a wavelet convolution and a depthwise fast wavelet transform. By breaking the symmetry in channel dimensions and applying a fast algorithm, WaveletNet shrinks the co…

    Submitted 28 November, 2018; originally announced November 2018.

    Comments: 10 pages, 5 figures

  23. arXiv:1710.09537

    cs.LG cs.NE stat.ML

    Rotational Unit of Memory

    Authors: Rumen Dangovski, Li Jing, Marin Soljacic

    Abstract: The concepts of unitary evolution matrices and associative memory have boosted the field of Recurrent Neural Networks (RNN) to state-of-the-art performance in a variety of sequential tasks. However, RNNs still have a limited capacity to manipulate long-term memory. To bypass this weakness, the most successful applications of RNNs use external techniques such as attention mechanisms. In this paper we…

    Submitted 26 October, 2017; originally announced October 2017.
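Entry 2 (QuanTA) motivates its method by the limits of Low-Rank Adaptation (LoRA). In LoRA the weight update is the product of a d_out × r matrix and an r × d_in matrix, so it cannot have rank above r. The sketch below is a minimal pure-Python illustration of that bound for a generic LoRA-style update. It is not QuanTA. The helper names (`lora_update`, `rank`) are hypothetical, and the small integer matrices were chosen so that the rank can be computed exactly.

```python
import random
from fractions import Fraction


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rank(m):
    """Matrix rank via exact Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in m]
    r = 0
    for c in range(len(rows[0])):
        # Find a pivot row with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate column c from every other row.
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def lora_update(d_out, d_in, r, seed=0):
    """Hypothetical LoRA-style update: Delta W = B @ A, B is d_out x r, A is r x d_in."""
    rng = random.Random(seed)
    B = [[rng.randint(-3, 3) for _ in range(r)] for _ in range(d_out)]
    A = [[rng.randint(-3, 3) for _ in range(d_in)] for _ in range(r)]
    return matmul(B, A)


delta_w = lora_update(d_out=6, d_in=6, r=2)
print(rank(delta_w))  # never exceeds r = 2, even though delta_w is 6 x 6
```

A full-rank target update such as the 6 × 6 identity has rank 6, so no rank-2 product can reproduce it exactly. This is the kind of case the abstract has in mind when it notes that low-rank approximation may fail.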
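Entry 6 (Multi-Symmetry Ensembles) starts from deep ensembles, whose members differ through the randomness of their initialization. The sketch below shows the standard deep-ensemble prediction rule, which averages the members' softmax probabilities. It does not implement the Multi-Symmetry method, and the member logits are made-up numbers.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(member_logits):
    """Deep-ensemble prediction: average the members' class probabilities."""
    probs = [softmax(logits) for logits in member_logits]
    n_classes = len(probs[0])
    return [sum(p[k] for p in probs) / len(probs) for k in range(n_classes)]


# Hypothetical logits from three members that differ only in their random initialization.
members = [
    [2.0, 0.5, -1.0],
    [1.5, 1.0, -0.5],
    [0.2, 2.2, -1.0],
]
avg = ensemble_predict(members)
predicted_class = avg.index(max(avg))
print(avg, predicted_class)
```

In this example the third member disagrees with the first two. Averaging probabilities, rather than trusting any single member, is how the ensemble turns member diversity into a combined prediction.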
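Entry 9 (LODO) describes a common technique: multiply the gradient by a preconditioner matrix to produce a step. Below is a minimal pure-Python sketch of such steps on an ill-conditioned quadratic. The objective, the learning rates, and the fixed inverse-Hessian preconditioner are all assumptions for illustration. LODO meta-learns its preconditioner online, and this sketch does not attempt that.

```python
def grad(x):
    """Gradient of the ill-conditioned quadratic f(x) = 0.5 * (100 * x0**2 + x1**2)."""
    return [100.0 * x[0], 1.0 * x[1]]


def step(x, precond, lr):
    """One preconditioned step: x <- x - lr * P @ grad(x)."""
    g = grad(x)
    pg = [sum(p * gi for p, gi in zip(row, g)) for row in precond]
    return [xi - lr * pgi for xi, pgi in zip(x, pg)]


def run(precond, lr, n_steps=50):
    """Apply n_steps preconditioned steps starting from (1, 1)."""
    x = [1.0, 1.0]
    for _ in range(n_steps):
        x = step(x, precond, lr)
    return x


identity = [[1.0, 0.0], [0.0, 1.0]]
inv_hessian = [[1.0 / 100.0, 0.0], [0.0, 1.0]]  # hand-picked P, not learned

plain = run(identity, lr=0.009)     # plain gradient descent needs lr < 2/100 to stay stable
precond = run(inv_hessian, lr=0.9)  # preconditioning equalizes the curvature of both coordinates
print(plain, precond)
```

With the identity as preconditioner, the learning rate is capped by the steep coordinate, so the flat coordinate still sits near 0.64 after 50 steps. The inverse-Hessian preconditioner shrinks both coordinates by the same factor of 0.1 per step. Choosing a good P is exactly the open question the abstract raises.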
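Entry 10 (calibration in semi-supervised learning) notes that the model's own predictions on unlabeled data are used for training. The sketch below shows one common form of pseudo-labeling, in which only predictions whose top-class probability clears a confidence threshold are kept. The threshold value of 0.95, the function name `pseudo_label`, and the probabilities are hypothetical. This is not the paper's proposed method.

```python
def pseudo_label(probs, threshold=0.95):
    """Return (index, label) pairs for unlabeled examples whose top-class
    probability reaches the confidence threshold; the rest are skipped."""
    selected = []
    for i, p in enumerate(probs):
        conf = max(p)
        if conf >= threshold:
            selected.append((i, p.index(conf)))
    return selected


# Hypothetical predicted class probabilities for four unlabeled examples.
unlabeled_probs = [
    [0.97, 0.02, 0.01],
    [0.60, 0.30, 0.10],
    [0.01, 0.98, 0.01],
    [0.40, 0.05, 0.55],
]
print(pseudo_label(unlabeled_probs))  # [(0, 0), (2, 1)]
```

The filter trusts the reported confidence. An overconfident (poorly calibrated) model therefore lets more wrong labels through, and those labels are then reinforced during training. This is the confirmation-bias concern that the abstract ties to calibration.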
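Entry 21 (Wasserstein-Procrustes) concerns aligning embedding spaces across languages. The Procrustes part of that idea is finding the orthogonal map that best matches paired points. The sketch below solves only a 2D, rotation-only version in closed form, and it assumes the pairing between source and target points is already known. The Wasserstein part, which handles unknown correspondences, is omitted. Real embeddings are high-dimensional, where the usual solution comes from an SVD. The toy points and function names are hypothetical.

```python
import math


def rotate(points, theta):
    """Rotate 2D points counter-clockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def procrustes_angle_2d(source, target):
    """Least-squares rotation angle mapping paired source points onto target points.

    Minimizing sum_i ||target_i - R(theta) source_i||^2 is equivalent to maximizing
    sum_i <target_i, R(theta) source_i> = a*cos(theta) + b*sin(theta),
    whose maximizer is theta = atan2(b, a).
    """
    a = sum(x * xt + y * yt for (x, y), (xt, yt) in zip(source, target))
    b = sum(x * yt - y * xt for (x, y), (xt, yt) in zip(source, target))
    return math.atan2(b, a)


# Hypothetical paired "embeddings": row i of tgt is the translation of row i of src.
src = [(1.0, 0.0), (0.0, 2.0), (-1.5, 0.5), (0.3, -1.2)]
tgt = rotate(src, 0.7)
theta = procrustes_angle_2d(src, tgt)
aligned = rotate(src, theta)
print(theta)
```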
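Entry 22 (WaveletNet) introduces a depthwise fast wavelet transform layer. The abstract does not say which wavelet is used, so the sketch below assumes the Haar wavelet, the simplest case. It shows one level of the transform on a single 1D channel. A depthwise version would apply the same transform to each channel independently. Each level costs O(n) and is exactly invertible.

```python
import math

SQRT_HALF = 1 / math.sqrt(2)


def haar_forward(signal):
    """One level of the orthonormal Haar wavelet transform.

    Splits an even-length signal into scaled pairwise sums (approximation) and
    scaled pairwise differences (detail). Cost is O(n).
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    approx = [(signal[i] + signal[i + 1]) * SQRT_HALF for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * SQRT_HALF for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward (exact up to floating-point rounding)."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * SQRT_HALF, (a - d) * SQRT_HALF])
    return out


x = [4.0, 2.0, 5.0, 5.0, 1.0, 3.0]
approx, detail = haar_forward(x)
restored = haar_inverse(approx, detail)
print(approx, detail)
```

Because the transform is orthonormal, it preserves the signal's energy (the sum of squares). A run of equal neighbors, such as the pair 5.0, 5.0, produces a zero detail coefficient.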