Showing 1–25 of 25 results for author: Shaham, U

  1. arXiv:2406.06634

    eess.AS cs.LG cs.SD

    Sparse Binarization for Fast Keyword Spotting

    Authors: Jonathan Svirsky, Uri Shaham, Ofir Lindenbaum

    Abstract: With the increasing prevalence of voice-activated devices and applications, keyword spotting (KWS) models enable users to interact with technology hands-free, enhancing convenience and accessibility in various contexts. Deploying KWS models on edge devices, such as smartphones and embedded systems, offers significant benefits for real-time applications, privacy, and bandwidth efficiency. However,…

    Submitted 9 June, 2024; originally announced June 2024.

  2. arXiv:2401.01854

    cs.CL cs.AI cs.LG

    Multilingual Instruction Tuning With Just a Pinch of Multilinguality

    Authors: Uri Shaham, Jonathan Herzig, Roee Aharoni, Idan Szpektor, Reut Tsarfaty, Matan Eyal

    Abstract: As instruction-tuned large language models (LLMs) gain global adoption, their ability to follow instructions in multiple languages becomes increasingly crucial. In this work, we investigate how multilinguality during instruction tuning of a multilingual LLM affects instruction-following across languages from the pre-training corpus. We first show that many languages transfer some instruction-follo…

    Submitted 21 May, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

    Comments: Findings of ACL 2024

  3. arXiv:2305.14196

    cs.CL cs.AI cs.LG stat.ML

    ZeroSCROLLS: A Zero-Shot Benchmark for Long Text Understanding

    Authors: Uri Shaham, Maor Ivgi, Avia Efrat, Jonathan Berant, Omer Levy

    Abstract: We introduce ZeroSCROLLS, a zero-shot benchmark for natural language understanding over long texts, which contains only test and small validation sets, without training data. We adapt six tasks from the SCROLLS benchmark, and add four new datasets, including two novel information fusing tasks, such as aggregating the percentage of positive reviews. Using ZeroSCROLLS, we conduct a comprehensive eva…

    Submitted 17 December, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Findings of EMNLP 2023

  4. arXiv:2212.07530

    cs.CL cs.AI cs.LG

    Causes and Cures for Interference in Multilingual Translation

    Authors: Uri Shaham, Maha Elbayad, Vedanuj Goswami, Omer Levy, Shruti Bhosale

    Abstract: Multilingual machine translation models can benefit from synergy between different language pairs, but also suffer from interference. While there is a growing number of sophisticated methods that aim to eliminate interference, our understanding of interference as a phenomenon is still limited. This work identifies the main factors that contribute to interference in multilingual machine translation…

    Submitted 19 May, 2023; v1 submitted 14 December, 2022; originally announced December 2022.

    Comments: ACL 2023

  5. arXiv:2208.00748

    cs.CL cs.AI cs.LG

    Efficient Long-Text Understanding with Short-Text Models

    Authors: Maor Ivgi, Uri Shaham, Jonathan Berant

    Abstract: Transformer-based pretrained language models (LMs) are ubiquitous across natural language understanding, but cannot be applied to long sequences such as stories, scientific articles and long documents, due to their quadratic complexity. While a myriad of efficient transformer variants have been proposed, they are typically based on custom implementations that require expensive pretraining from scr…

    Submitted 27 December, 2022; v1 submitted 1 August, 2022; originally announced August 2022.

    Comments: Accepted for publication in Transactions of the Association for Computational Linguistics (TACL), 2023. Authors' final version (pre-MIT)

  6. arXiv:2206.04615

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  7. arXiv:2205.10782

    cs.CL

    Instruction Induction: From Few Examples to Natural Language Task Descriptions

    Authors: Or Honovich, Uri Shaham, Samuel R. Bowman, Omer Levy

    Abstract: Large language models are able to perform a task by conditioning on a few input-output demonstrations - a paradigm known as in-context learning. We show that language models can explicitly infer an underlying task from a few demonstrations by prompting them to generate a natural language instruction that fits the examples. To explore this ability, we introduce the instruction induction challenge,…

    Submitted 22 May, 2022; originally announced May 2022.

  8. arXiv:2201.03533

    cs.CL cs.AI cs.LG stat.ML

    SCROLLS: Standardized CompaRison Over Long Language Sequences

    Authors: Uri Shaham, Elad Segal, Maor Ivgi, Avia Efrat, Ori Yoran, Adi Haviv, Ankit Gupta, Wenhan Xiong, Mor Geva, Jonathan Berant, Omer Levy

    Abstract: NLP benchmarks have largely focused on short texts, such as sentences and paragraphs, even though long texts comprise a considerable amount of natural language in the wild. We introduce SCROLLS, a suite of tasks that require reasoning over long texts. We examine existing long-text datasets, and handpick ones where the text is naturally long, while prioritizing tasks that involve synthesizing infor…

    Submitted 11 October, 2022; v1 submitted 10 January, 2022; originally announced January 2022.

    Comments: EMNLP 2022

  9. arXiv:2110.05887

    stat.ML cs.LG

    Discovery of Single Independent Latent Variable

    Authors: Uri Shaham, Jonathan Svirsky, Ori Katz, Ronen Talmon

    Abstract: Latent variable discovery is a central problem in data analysis with a broad range of applications in applied science. In this work, we consider data given as an invertible mixture of two statistically independent components and assume that one of the components is observed while the other is hidden. Our goal is to recover the hidden component. For this purpose, we propose an autoencoder equipped…

    Submitted 7 March, 2023; v1 submitted 12 October, 2021; originally announced October 2021.

    Comments: Published as a conference paper at NeurIPS 2022. In the current version the proof of the lemma is modified

    Journal ref: Advances in Neural Information Processing Systems 2022

  10. arXiv:2110.05306

    stat.ML cs.AI cs.LG

    Deep Unsupervised Feature Selection by Discarding Nuisance and Correlated Features

    Authors: Uri Shaham, Ofir Lindenbaum, Jonathan Svirsky, Yuval Kluger

    Abstract: Modern datasets often contain large subsets of correlated features and nuisance features, which are not or loosely related to the main underlying structures of the data. Nuisance features can be identified using the Laplacian score criterion, which evaluates the importance of a given feature via its consistency with the Graph Laplacians' leading eigenvectors. We demonstrate that in the presence of…

    Submitted 11 October, 2021; originally announced October 2021.

  11. arXiv:2107.09729

    cs.CL cs.AI cs.LG

    What Do You Get When You Cross Beam Search with Nucleus Sampling?

    Authors: Uri Shaham, Omer Levy

    Abstract: We combine beam search with the probabilistic pruning technique of nucleus sampling to create two deterministic nucleus search algorithms for natural language generation. The first algorithm, p-exact search, locally prunes the next-token distribution and performs an exact search over the remaining space. The second algorithm, dynamic beam search, shrinks and expands the beam size according to the…

    Submitted 2 May, 2022; v1 submitted 20 July, 2021; originally announced July 2021.

    Comments: The Third Workshop on Insights from Negative Results in NLP

  12. arXiv:2103.01242

    cs.CL cs.AI cs.LG stat.ML

    Cryptonite: A Cryptic Crossword Benchmark for Extreme Ambiguity in Language

    Authors: Avia Efrat, Uri Shaham, Dan Kilman, Omer Levy

    Abstract: Current NLP datasets targeting ambiguity can be solved by a native speaker with relative ease. We present Cryptonite, a large-scale dataset based on cryptic crosswords, which is both linguistically complex and naturally sourced. Each example in Cryptonite is a cryptic clue, a short phrase or sentence with a misleading surface reading, whose solving requires disambiguating semantic, syntactic, and…

    Submitted 1 November, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

    Comments: EMNLP 2021

  13. arXiv:2011.07607

    stat.ML cs.LG

    Deep Ordinal Regression using Optimal Transport Loss and Unimodal Output Probabilities

    Authors: Uri Shaham, Igal Zaidman, Jonathan Svirsky

    Abstract: It is often desired that ordinal regression models yield unimodal predictions. However, in many recent works this characteristic is either absent, or implemented using soft targets, which do not guarantee unimodal outputs at inference. In addition, we argue that the standard maximum likelihood objective is not suitable for ordinal regression problems, and that optimal transport is better suited fo…

    Submitted 18 November, 2021; v1 submitted 15 November, 2020; originally announced November 2020.

  14. arXiv:2008.09396

    cs.CL cs.LG stat.ML

    Neural Machine Translation without Embeddings

    Authors: Uri Shaham, Omer Levy

    Abstract: Many NLP models operate over sequences of subword tokens produced by hand-crafted tokenization rules and heuristic subword induction algorithms. A simple universal alternative is to represent every computerized text as a sequence of bytes via UTF-8, obviating the need for an embedding layer since there are fewer token types (256) than dimensions. Surprisingly, replacing the ubiquitous embedding la…

    Submitted 12 April, 2021; v1 submitted 21 August, 2020; originally announced August 2020.

    Comments: NAACL 2021

  15. arXiv:2007.04728

    cs.LG stat.ML

    Differentiable Unsupervised Feature Selection based on a Gated Laplacian

    Authors: Ofir Lindenbaum, Uri Shaham, Jonathan Svirsky, Erez Peterfreund, Yuval Kluger

    Abstract: Scientific observations may consist of a large number of variables (features). Identifying a subset of meaningful features is often ignored in unsupervised learning, despite its potential for unraveling clear patterns hidden in the ambient space. In this paper, we present a method for unsupervised feature selection, and we demonstrate its use for the task of clustering. We propose a differentiable…

    Submitted 9 November, 2020; v1 submitted 9 July, 2020; originally announced July 2020.

  16. arXiv:2004.00994

    cs.LG cs.AI stat.ML

    Learning to Ask Medical Questions using Reinforcement Learning

    Authors: Uri Shaham, Tom Zahavy, Cesar Caraballo, Shiwani Mahajan, Daisy Massey, Harlan Krumholz

    Abstract: We propose a novel reinforcement learning-based approach for adaptive and iterative feature selection. Given a masked vector of input features, a reinforcement learning agent iteratively selects certain features to be unmasked, and uses them to predict an outcome when it is sufficiently confident. The algorithm makes use of a novel environment setting, corresponding to a non-stationary Markov Deci…

    Submitted 25 May, 2020; v1 submitted 31 March, 2020; originally announced April 2020.

  17. arXiv:1807.10597

    cs.CV

    Automated Characterization of Stenosis in Invasive Coronary Angiography Images with Convolutional Neural Networks

    Authors: Benjamin Au, Uri Shaham, Sanket Dhruva, Georgios Bouras, Ecaterina Cristea, Alexandra Lansky MD, Andreas Coppi, Fred Warner, Shu-Xia Li, Harlan Krumholz

    Abstract: The determination of a coronary stenosis and its severity in current clinical workflow is typically accomplished manually via physician visual assessment (PVA) during invasive coronary angiography. While PVA has shown large inter-rater variability, the more reliable and accurate alternative of Quantitative Coronary Angiography (QCA) is challenging to perform in real-time due to the busy workflow i…

    Submitted 19 July, 2018; originally announced July 2018.

  18. arXiv:1803.10840

    stat.ML cs.LG

    Defending against Adversarial Images using Basis Functions Transformations

    Authors: Uri Shaham, James Garritano, Yutaro Yamada, Ethan Weinberger, Alex Cloninger, Xiuyuan Cheng, Kelly Stanton, Yuval Kluger

    Abstract: We study the effectiveness of various approaches that defend against adversarial attacks on deep networks via manipulations based on basis function representations of images. Specifically, we experiment with low-pass filtering, PCA, JPEG compression, low resolution wavelet approximation, and soft-thresholding. We evaluate these defense techniques using three types of popular attacks in black, gray…

    Submitted 16 April, 2018; v1 submitted 28 March, 2018; originally announced March 2018.

    Comments: added link to GitHub repository

  19. arXiv:1801.01587

    stat.ML cs.LG

    SpectralNet: Spectral Clustering using Deep Neural Networks

    Authors: Uri Shaham, Kelly Stanton, Henry Li, Boaz Nadler, Ronen Basri, Yuval Kluger

    Abstract: Spectral clustering is a leading and popular technique in unsupervised data analysis. Two of its major limitations are scalability and generalization of the spectral embedding (i.e., out-of-sample-extension). In this paper we introduce a deep learning approach to spectral clustering that overcomes the above shortcomings. Our network, which we call SpectralNet, learns a map that embeds input data p…

    Submitted 4 April, 2018; v1 submitted 4 January, 2018; originally announced January 2018.

    Comments: Added citations. Accepted to ICLR 2018

  20. DeepSurv: Personalized Treatment Recommender System Using A Cox Proportional Hazards Deep Neural Network

    Authors: Jared Katzman, Uri Shaham, Jonathan Bates, Alexander Cloninger, Tingting Jiang, Yuval Kluger

    Abstract: Medical practitioners use survival models to explore and understand the relationships between patients' covariates (e.g. clinical and genetic features) and the effectiveness of various treatment options. Standard survival models like the linear Cox proportional hazards model require extensive feature engineering or prior medical knowledge to model treatment interaction at an individual level. Whil…

    Submitted 8 August, 2017; v1 submitted 2 June, 2016; originally announced June 2016.

    Comments: Presented at the International Conference on Machine Learning Computational Biology Workshop 2016

  21. arXiv:1602.02285

    stat.ML cs.LG

    A Deep Learning Approach to Unsupervised Ensemble Learning

    Authors: Uri Shaham, Xiuyuan Cheng, Omer Dror, Ariel Jaffe, Boaz Nadler, Joseph Chang, Yuval Kluger

    Abstract: We show how deep learning methods can be applied in the context of crowdsourcing and unsupervised ensemble learning. First, we prove that the popular model of Dawid and Skene, which assumes that all classifiers are conditionally independent, is equivalent to a Restricted Boltzmann Machine (RBM) with a single hidden node. Hence, under this model, the posterior probabilities of the true labels…

    Submitted 6 February, 2016; originally announced February 2016.

    Report number: PMLR 48:30-39

  22. arXiv:1512.08806

    stat.ML cs.LG cs.NE

    Common Variable Learning and Invariant Representation Learning using Siamese Neural Networks

    Authors: Uri Shaham, Roy Lederman

    Abstract: We consider the statistical problem of learning common source of variability in data which are synchronously captured by multiple sensors, and demonstrate that Siamese neural networks can be naturally applied to this problem. This approach is useful in particular in exploratory, data-driven applications, where neither a model nor label information is available. In recent years, many researchers ha…

    Submitted 11 May, 2016; v1 submitted 29 December, 2015; originally announced December 2015.

  23. Understanding Adversarial Training: Increasing Local Stability of Neural Nets through Robust Optimization

    Authors: Uri Shaham, Yutaro Yamada, Sahand Negahban

    Abstract: We propose a general framework for increasing local stability of Artificial Neural Nets (ANNs) using Robust Optimization (RO). We achieve this through an alternating minimization-maximization procedure, in which the loss of the network is minimized over perturbed examples that are generated at each parameter update. We show that adversarial training of ANNs is in fact robustification of the networ…

    Submitted 16 January, 2016; v1 submitted 17 November, 2015; originally announced November 2015.

  24. arXiv:1509.07385

    stat.ML cs.LG cs.NE

    Provable approximation properties for deep neural networks

    Authors: Uri Shaham, Alexander Cloninger, Ronald R. Coifman

    Abstract: We discuss approximation of functions using deep neural nets. Given a function $f$ on a $d$-dimensional manifold $\Gamma \subset \mathbb{R}^m$, we construct a sparsely-connected depth-4 neural network and bound its error in approximating $f$. The size of the network depends on dimension and curvature of the manifold $\Gamma$, the complexity of $f$, in terms of its wavelet description, and only weakly on the…

    Submitted 28 March, 2016; v1 submitted 24 September, 2015; originally announced September 2015.

    Comments: accepted for publication in Applied and Computational Harmonic Analysis

  25. arXiv:1506.07840

    stat.ML cs.LG math.CA

    Diffusion Nets

    Authors: Gal Mishne, Uri Shaham, Alexander Cloninger, Israel Cohen

    Abstract: Non-linear manifold learning enables high-dimensional data analysis, but requires out-of-sample-extension methods to process new data points. In this paper, we propose a manifold learning algorithm based on deep learning to create an encoder, which maps a high-dimensional dataset and its low-dimensional embedding, and a decoder, which takes the embedded data back to the high-dimensional space. Sta…

    Submitted 25 June, 2015; originally announced June 2015.

    Comments: 24 pages, 12 figures