Showing 1–50 of 112 results for author: Schwartz, R

Searching in archive cs.
  1. arXiv:2406.15936

    cs.CY cs.AI cs.DB cs.LG

    An Automated SQL Query Grading System Using An Attention-Based Convolutional Neural Network

    Authors: Donald R. Schwartz, Pablo Rivas

    Abstract: Grading SQL queries can be a time-consuming, tedious and challenging task, especially as the number of student submissions increases. Several systems have been introduced in an attempt to mitigate these challenges, but those systems have their own limitations. This paper describes our novel approach to automating the process of grading SQL queries. Unlike previous approaches, we employ a unique co…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: 12 pages, 8 figures, paper accepted at "The 18th International Conference on Frontiers in Education: Computer Science and Computer Engineering"

    ACM Class: I.2.6; H.2.3; K.3.2

  2. arXiv:2406.06386

    cs.CV

    FPN-IAIA-BL: A Multi-Scale Interpretable Deep Learning Model for Classification of Mass Margins in Digital Mammography

    Authors: Julia Yang, Alina Jade Barnett, Jon Donnelly, Satvik Kishore, Jerry Fang, Fides Regina Schwartz, Chaofan Chen, Joseph Y. Lo, Cynthia Rudin

    Abstract: Digital mammography is essential to breast cancer detection, and deep learning offers promising tools for faster and more accurate mammogram analysis. In radiology and other high-stakes environments, uninterpretable ("black box") deep learning models are unsuitable and there is a call in these fields to make interpretable models. Recent work in interpretable computer vision provides transparency t…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 8 pages, 6 figures, Accepted for oral presentation at the 2024 CVPR Workshop on Domain adaptation, Explainability, Fairness in AI for Medical Image Analysis (DEF-AI-MIA)

  3. arXiv:2405.06563

    cs.CL

    What Can Natural Language Processing Do for Peer Review?

    Authors: Ilia Kuznetsov, Osama Mohammed Afzal, Koen Dercksen, Nils Dycke, Alexander Goldberg, Tom Hope, Dirk Hovy, Jonathan K. Kummerfeld, Anne Lauscher, Kevin Leyton-Brown, Sheng Lu, Mausam, Margot Mieskes, Aurélie Névéol, Danish Pruthi, Lizhen Qu, Roy Schwartz, Noah A. Smith, Thamar Solorio, Jingyan Wang, Xiaodan Zhu, Anna Rogers, Nihar B. Shah, Iryna Gurevych

    Abstract: The number of scientific articles produced every year is growing rapidly. Providing quality control over them is crucial for scientists and, ultimately, for the public good. In modern science, this process is largely delegated to peer review -- a distributed procedure in which each submission is evaluated by several independent experts in the field. Peer review is widely used, yet it is hard, time…

    Submitted 10 May, 2024; originally announced May 2024.

  4. arXiv:2405.04304

    cs.CL

    Dynamic Speculation Lookahead Accelerates Speculative Decoding of Large Language Models

    Authors: Jonathan Mamou, Oren Pereg, Daniel Korat, Moshe Berchansky, Nadav Timor, Moshe Wasserblat, Roy Schwartz

    Abstract: Speculative decoding is commonly used for reducing the inference latency of large language models. Its effectiveness depends highly on the speculation lookahead (SL), the number of tokens generated by the draft model at each iteration. In this work we show that the common practice of using the same SL for all iterations (static SL) is suboptimal. We introduce DISCO (DynamIc SpeCulation lookahead Op…

    Submitted 23 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  5. arXiv:2405.02743

    cs.CL

    Beyond Performance: Quantifying and Mitigating Label Bias in LLMs

    Authors: Yuval Reif, Roy Schwartz

    Abstract: Large language models (LLMs) have shown remarkable adaptability to diverse tasks by leveraging context prompts containing instructions or minimal input-output examples. However, recent work revealed they also exhibit label bias -- an undesirable preference toward predicting certain answers over others. Still, detecting and measuring this bias reliably and at scale has remained relatively unexplo…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: NAACL 2024

  6. arXiv:2404.00725

    cs.SE cs.AI cs.CL cs.LG

    The Larger the Better? Improved LLM Code-Generation via Budget Reallocation

    Authors: Michael Hassid, Tal Remez, Jonas Gehring, Roy Schwartz, Yossi Adi

    Abstract: It is a common belief that large language models (LLMs) are better than smaller-sized ones. However, larger models also require significantly more time and compute during inference. This raises the question: what happens when both models operate under the same budget (e.g., compute, run-time)? To address this question, we analyze code generation LLMs of various sizes and make comparisons such as ru…

    Submitted 31 March, 2024; originally announced April 2024.

  7. arXiv:2401.06104

    cs.CL

    Transformers are Multi-State RNNs

    Authors: Matanel Oren, Michael Hassid, Nir Yarden, Yossi Adi, Roy Schwartz

    Abstract: Transformers are considered conceptually different from the previous generation of state-of-the-art NLP models: recurrent neural networks (RNNs). In this work, we demonstrate that decoder-only transformers can in fact be conceptualized as unbounded multi-state RNNs, an RNN variant with unlimited hidden state size. We further show that transformers can be converted into $\textit{bounded}$ multi-s…

    Submitted 18 June, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

    Comments: preprint

  8. arXiv:2311.15639

    cs.DS

    On Approximating Cutwidth and Pathwidth

    Authors: Nikhil Bansal, Dor Katzelnick, Roy Schwartz

    Abstract: We study graph ordering problems with a min-max objective. A classical problem of this type is cutwidth, where given a graph we want to order its vertices such that the number of edges crossing any point is minimized. We give a $\log^{1+o(1)}(n)$ approximation for the problem, substantially improving upon the previous poly-logarithmic guarantees based on the standard recursive balanced partitioni…

    Submitted 12 April, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

  9. arXiv:2311.02251

    cs.LG cs.AI eess.SP

    The Potential of Wearable Sensors for Assessing Patient Acuity in Intensive Care Unit (ICU)

    Authors: Jessica Sena, Mohammad Tahsin Mostafiz, Jiaqing Zhang, Andrea Davidson, Sabyasachi Bandyopadhyay, Ren Yuanfang, Tezcan Ozrazgat-Baslanti, Benjamin Shickel, Tyler Loftus, William Robson Schwartz, Azra Bihorac, Parisa Rashidi

    Abstract: Acuity assessments are vital in critical care settings to provide timely interventions and fair resource allocation. Traditional acuity scores rely on manual assessments and documentation of physiological states, which can be time-consuming, intermittent, and difficult to use for healthcare providers. Furthermore, such scores do not incorporate granular information such as patients' mobility level…

    Submitted 3 November, 2023; originally announced November 2023.

  10. arXiv:2311.00400

    cs.CV

    Open-Set Face Recognition with Maximal Entropy and Objectosphere Loss

    Authors: Rafael Henrique Vareto, Yu Linghu, Terrance E. Boult, William Robson Schwartz, Manuel Günther

    Abstract: Open-set face recognition characterizes a scenario where unknown individuals, unseen during the training and enrollment stages, appear at operation time. This work concentrates on watchlists, an open-set task that is expected to operate at a low False Positive Identification Rate and generally includes only a few enrollment samples per identity. We introduce a compact adapter network that benefits…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: Accepted for publication in Image and Vision Computing 2023

  11. arXiv:2310.18877

    cs.CL cs.LG cs.SD eess.AS

    Pre-trained Speech Processing Models Contain Human-Like Biases that Propagate to Speech Emotion Recognition

    Authors: Isaac Slaughter, Craig Greenberg, Reva Schwartz, Aylin Caliskan

    Abstract: Previous work has established that a person's demographics and speech style affect how well speech processing models perform for them. But where does this bias come from? In this work, we present the Speech Embedding Association Test (SpEAT), a method for detecting bias in one type of model used for many speech tasks: pre-trained models. The SpEAT is inspired by word embedding association tests in…

    Submitted 28 October, 2023; originally announced October 2023.

  12. arXiv:2309.05088

    cs.CY q-bio.OT

    Towards Trustworthy Artificial Intelligence for Equitable Global Health

    Authors: Hong Qin, Jude Kong, Wandi Ding, Ramneek Ahluwalia, Christo El Morr, Zeynep Engin, Jake Okechukwu Effoduh, Rebecca Hwa, Serena Jingchuan Guo, Laleh Seyyed-Kalantari, Sylvia Kiwuwa Muyingo, Candace Makeda Moore, Ravi Parikh, Reva Schwartz, Dongxiao Zhu, Xiaoqian Wang, Yiye Zhang

    Abstract: Artificial intelligence (AI) can potentially transform global health, but algorithmic bias can exacerbate social inequities and disparity. Trustworthy AI entails the intentional design to ensure equity and mitigate potential biases. To advance trustworthy AI in global health, we convened a workshop on Fairness in Machine Intelligence for Global Health (FairMI4GH). The event brought together a glob…

    Submitted 10 September, 2023; originally announced September 2023.

    Comments: 7 pages

  13. arXiv:2308.12371

    cs.CV cs.AI cs.LG

    Open-set Face Recognition with Neural Ensemble, Maximal Entropy Loss and Feature Augmentation

    Authors: Rafael Henrique Vareto, Manuel Günther, William Robson Schwartz

    Abstract: Open-set face recognition refers to a scenario in which biometric systems have incomplete knowledge of all existing subjects. Therefore, they are expected to prevent face samples of unregistered subjects from being identified as previously enrolled identities. This watchlist context adds an arduous requirement that calls for the dismissal of irrelevant faces by focusing mainly on subjects of inter…

    Submitted 23 August, 2023; originally announced August 2023.

    Journal ref: 36th Conference on Graphics, Patterns and Images (SIBGRAPI 2023)

  14. arXiv:2308.07746

    cs.DS

    A Tight Competitive Ratio for Online Submodular Welfare Maximization

    Authors: Amit Ganz, Pranav Nuti, Roy Schwartz

    Abstract: In this paper we consider the online Submodular Welfare (SW) problem. In this problem we are given $n$ bidders each equipped with a general (not necessarily monotone) submodular utility and $m$ items that arrive online. The goal is to assign each item, once it arrives, to a bidder or discard it, while maximizing the sum of utilities. When an adversary determines the items' arrival order we present…

    Submitted 15 August, 2023; originally announced August 2023.

  15. Open-set Face Recognition using Ensembles trained on Clustered Data

    Authors: Rafael Henrique Vareto, William Robson Schwartz

    Abstract: Open-set face recognition describes a scenario where unknown subjects, unseen during the training stage, appear at test time. Not only does it require methods that accurately identify individuals of interest, but it also demands approaches that effectively deal with unfamiliar faces. This work details a scalable open-set face identification approach to galleries composed of hundreds and thousands of subj…

    Submitted 14 August, 2023; originally announced August 2023.

    Comments: [Original paper title: Unconstrained Face Identification using Ensembles trained on Clustered Data] [2020 IEEE International Joint Conference on Biometrics (IJCB)] [https://ieeexplore.ieee.org/document/9304882]

  16. arXiv:2308.03516

    cs.DS

    An Improved Approximation Algorithm for the Max-$3$-Section Problem

    Authors: Dor Katzelnick, Aditya Pillai, Roy Schwartz, Mohit Singh

    Abstract: We consider the Max-$3$-Section problem, where we are given an undirected graph $G=(V,E)$ equipped with non-negative edge weights $w: E\rightarrow \mathbb{R}_+$ and the goal is to find a partition of $V$ into three equisized parts while maximizing the total weight of edges crossing between different parts. Max-$3$-Section is closely related to other well-studied graph partitioning problems, e.g.,…

    Submitted 7 August, 2023; originally announced August 2023.

  17. arXiv:2307.04532

    cs.CV cs.AI cs.CL eess.AS

    Read, Look or Listen? What's Needed for Solving a Multimodal Dataset

    Authors: Netta Madvil, Yonatan Bitton, Roy Schwartz

    Abstract: The prevalence of large-scale multimodal datasets presents unique challenges in assessing dataset quality. We propose a two-step method to analyze multimodal datasets, which leverages a small seed of human annotation to map each multimodal instance to the modalities required to process it. Our method sheds light on the importance of different modalities in datasets, as well as the relationship bet…

    Submitted 6 July, 2023; originally announced July 2023.

  18. arXiv:2306.16900

    cs.CL

    Surveying (Dis)Parities and Concerns of Compute Hungry NLP Research

    Authors: Ji-Ung Lee, Haritz Puerto, Betty van Aken, Yuki Arase, Jessica Zosa Forde, Leon Derczynski, Andreas Rücklé, Iryna Gurevych, Roy Schwartz, Emma Strubell, Jesse Dodge

    Abstract: Many recent improvements in NLP stem from the development and use of large pre-trained language models (PLMs) with billions of parameters. Large model sizes make computational cost one of the main limiting factors for training and evaluating such models, and have raised severe concerns about the sustainability, reproducibility, and inclusiveness of research on PLMs. These concerns are often based…

    Submitted 9 November, 2023; v1 submitted 29 June, 2023; originally announced June 2023.

  19. Morphosyntactic probing of multilingual BERT models

    Authors: Judit Acs, Endre Hamerlik, Roy Schwartz, Noah A. Smith, Andras Kornai

    Abstract: We introduce an extensive dataset for multilingual probing of morphological information in language models (247 tasks across 42 languages from 10 families), each consisting of a sentence with a target word and a morphological tag as the desired label, derived from the Universal Dependencies treebanks. We find that pre-trained Transformer models (mBERT and XLM-RoBERTa) learn features that attain st…

    Submitted 9 June, 2023; originally announced June 2023.

    Comments: to appear in the Journal of Natural Language Engineering

  20. arXiv:2306.02307

    cs.CL cs.AI cs.LG

    Finding the SWEET Spot: Analysis and Improvement of Adaptive Inference in Low Resource Settings

    Authors: Daniel Rotem, Michael Hassid, Jonathan Mamou, Roy Schwartz

    Abstract: Adaptive inference is a simple method for reducing inference costs. The method works by maintaining multiple classifiers of different capacities, and allocating resources to each test instance according to its difficulty. In this work, we compare the two main approaches for adaptive inference, Early-Exit and Multi-Model, when training data is limited. First, we observe that for models with the sam…

    Submitted 4 June, 2023; originally announced June 2023.

    Comments: Proceedings of ACL 2023

  21. arXiv:2305.18917

    cs.CL

    Fighting Bias with Bias: Promoting Model Robustness by Amplifying Dataset Biases

    Authors: Yuval Reif, Roy Schwartz

    Abstract: NLP models often rely on superficial cues known as dataset biases to achieve impressive performance, and can fail on examples where these biases do not hold. Recent work sought to develop robust, unbiased models by filtering biased examples from training sets. In this work, we argue that such filtering can obscure the true capabilities of models to overcome biases, which might never be removed in…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: Findings of ACL 2023

  22. Super-Resolution of License Plate Images Using Attention Modules and Sub-Pixel Convolution Layers

    Authors: Valfride Nascimento, Rayson Laroca, Jorge de A. Lambert, William Robson Schwartz, David Menotti

    Abstract: Recent years have seen significant developments in the field of License Plate Recognition (LPR) through the integration of deep learning techniques and the increasing availability of training data. Nevertheless, reconstructing license plates (LPs) from low-resolution (LR) surveillance footage remains challenging. To address this issue, we introduce a Single-Image Super-Resolution (SISR) approach t…

    Submitted 26 May, 2023; originally announced May 2023.

    Journal ref: Computers & Graphics, vol. 113, pp. 69-76, 2023

  23. arXiv:2305.13009

    cs.CL cs.LG cs.SD eess.AS

    Textually Pretrained Speech Language Models

    Authors: Michael Hassid, Tal Remez, Tu Anh Nguyen, Itai Gat, Alexis Conneau, Felix Kreuk, Jade Copet, Alexandre Defossez, Gabriel Synnaeve, Emmanuel Dupoux, Roy Schwartz, Yossi Adi

    Abstract: Speech language models (SpeechLMs) process and generate acoustic data only, without textual supervision. In this work, we propose TWIST, a method for training SpeechLMs using a warm-start from pretrained textual language models. We show using both automatic and human evaluations that TWIST outperforms a cold-start SpeechLM across the board. We empirically analyze the effect of different model de…

    Submitted 30 January, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

  24. arXiv:2303.07274

    cs.CV cs.AI cs.CL

    Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images

    Authors: Nitzan Bitton-Guetta, Yonatan Bitton, Jack Hessel, Ludwig Schmidt, Yuval Elovici, Gabriel Stanovsky, Roy Schwartz

    Abstract: Weird, unusual, and uncanny images pique the curiosity of observers because they challenge common sense. For example, an image released during the 2022 World Cup depicts the famous soccer stars Lionel Messi and Cristiano Ronaldo playing chess, which playfully violates our expectation that their competition should occur on the football field. Humans can easily recognize and interpret these unconvent…

    Submitted 12 August, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

    Comments: Accepted to ICCV 2023. Website: whoops-benchmark.github.io

  25. arXiv:2212.04542

    cs.CV cs.AI cs.CL

    VASR: Visual Analogies of Situation Recognition

    Authors: Yonatan Bitton, Ron Yosef, Eli Strugo, Dafna Shahaf, Roy Schwartz, Gabriel Stanovsky

    Abstract: A core process in human cognition is analogical mapping: the ability to identify a similar relational structure between different situations. We introduce a novel task, Visual Analogies of Situation Recognition, adapting the classical word-analogy task into the visual domain. Given a triplet of images, the task is to select an image candidate B' that completes the analogy (A to A' is like B to wha…

    Submitted 8 December, 2022; originally announced December 2022.

    Comments: Accepted to AAAI 2023. Website: https://vasr-dataset.github.io/

  26. arXiv:2211.03495

    cs.CL cs.LG

    How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers

    Authors: Michael Hassid, Hao Peng, Daniel Rotem, Jungo Kasai, Ivan Montero, Noah A. Smith, Roy Schwartz

    Abstract: The attention mechanism is considered the backbone of the widely-used Transformer architecture. It contextualizes the input by computing input-specific attention matrices. We find that this mechanism, while powerful and elegant, is not as important as typically thought for pretrained language models. We introduce PAPA, a new probing method that replaces the input-dependent attention matrices with…

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: Findings of EMNLP 2022

  27. Combining Attention Module and Pixel Shuffle for License Plate Super-Resolution

    Authors: Valfride Nascimento, Rayson Laroca, Jorge de A. Lambert, William Robson Schwartz, David Menotti

    Abstract: The License Plate Recognition (LPR) field has made impressive advances in the last decade due to novel deep learning approaches combined with the increased availability of training data. However, it still has some open issues, especially when the data come from low-resolution (LR) and low-quality images/videos, as in surveillance systems. This work focuses on license plate (LP) reconstruction in L…

    Submitted 30 October, 2022; originally announced October 2022.

    Comments: Accepted for presentation at the Conference on Graphics, Patterns and Images (SIBGRAPI) 2022

  28. arXiv:2209.00099

    cs.CL

    Efficient Methods for Natural Language Processing: A Survey

    Authors: Marcos Treviso, Ji-Ung Lee, Tianchu Ji, Betty van Aken, Qingqing Cao, Manuel R. Ciosici, Michael Hassid, Kenneth Heafield, Sara Hooker, Colin Raffel, Pedro H. Martins, André F. T. Martins, Jessica Zosa Forde, Peter Milder, Edwin Simpson, Noam Slonim, Jesse Dodge, Emma Strubell, Niranjan Balasubramanian, Leon Derczynski, Iryna Gurevych, Roy Schwartz

    Abstract: Recent work in natural language processing (NLP) has yielded appealing results from scaling model parameters and training data; however, using only scale to improve performance means that resource consumption also grows. Such resources include data, time, storage, or energy, all of which are naturally limited and unevenly distributed. This motivates research into efficient methods that require few…

    Submitted 24 March, 2023; v1 submitted 31 August, 2022; originally announced September 2022.

    Comments: Accepted at TACL, pre-publication version

  29. arXiv:2207.12576

    cs.CL cs.AI cs.CV cs.HC

    WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models

    Authors: Yonatan Bitton, Nitzan Bitton Guetta, Ron Yosef, Yuval Elovici, Mohit Bansal, Gabriel Stanovsky, Roy Schwartz

    Abstract: While vision-and-language models perform well on tasks such as visual question answering, they struggle when it comes to basic human commonsense reasoning skills. In this work, we introduce WinoGAViL: an online game of vision-and-language associations (e.g., between werewolves and a full moon), used as a dynamic evaluation benchmark. Inspired by the popular card game Codenames, a spymaster gives a…

    Submitted 11 October, 2022; v1 submitted 25 July, 2022; originally announced July 2022.

    Comments: Accepted to NeurIPS 2022, Datasets and Benchmarks. Website: https://winogavil.github.io/

  30. arXiv:2206.09860

    cs.CL

    Fewer Errors, but More Stereotypes? The Effect of Model Size on Gender Bias

    Authors: Yarden Tal, Inbal Magar, Roy Schwartz

    Abstract: The size of pretrained models is increasing, and so is their performance on a variety of NLP tasks. However, as their memorization capacity grows, they might pick up more social biases. In this work, we examine the connection between model size and its gender bias (specifically, occupational gender bias). We measure bias in three masked language model families (RoBERTa, DeBERTa, and T5) in two set…

    Submitted 20 June, 2022; originally announced June 2022.

  31. arXiv:2206.05229

    cs.LG

    Measuring the Carbon Intensity of AI in Cloud Instances

    Authors: Jesse Dodge, Taylor Prewitt, Remi Tachet Des Combes, Erika Odmark, Roy Schwartz, Emma Strubell, Alexandra Sasha Luccioni, Noah A. Smith, Nicole DeCario, Will Buchanan

    Abstract: By providing unprecedented access to computational resources, cloud computing has enabled rapid growth in technologies such as machine learning, the computational demands of which incur a high energy cost and a commensurate carbon footprint. As a result, recent scholarship has called for better estimates of the greenhouse gas impact of AI: data scientists today do not have easy or reliable access…

    Submitted 10 June, 2022; originally announced June 2022.

    Comments: In ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT) 2022

  32. arXiv:2204.12708

    cs.CL

    On the Limitations of Dataset Balancing: The Lost Battle Against Spurious Correlations

    Authors: Roy Schwartz, Gabriel Stanovsky

    Abstract: Recent work has shown that deep learning models in NLP are highly sensitive to low-level correlations between simple features and specific output labels, leading to overfitting and lack of generalization. To mitigate this problem, a common practice is to balance datasets by adding new instances or by filtering out "easy" instances (Sakaguchi et al., 2020), culminating in a recent proposal to elimi…

    Submitted 27 April, 2022; originally announced April 2022.

    Comments: Findings of NAACL 2022

  33. arXiv:2204.06271

    cs.CL cs.AI

    TangoBERT: Reducing Inference Cost by using Cascaded Architecture

    Authors: Jonathan Mamou, Oren Pereg, Moshe Wasserblat, Roy Schwartz

    Abstract: The remarkable success of large transformer-based models such as BERT, RoBERTa and XLNet in many NLP tasks comes with a large increase in monetary and environmental cost due to their high computational load and energy consumption. In order to reduce this computational load at inference time, we present TangoBERT, a cascaded model architecture in which instances are first processed by an efficient…

    Submitted 13 April, 2022; originally announced April 2022.

  34. arXiv:2204.02406

    eess.IV cs.AI cs.CV

    A deep learning framework for the detection and quantification of drusen and reticular pseudodrusen on optical coherence tomography

    Authors: Roy Schwartz, Hagar Khalid, Sandra Liakopoulos, Yanling Ouyang, Coen de Vente, Cristina González-Gonzalo, Aaron Y. Lee, Robyn Guymer, Emily Y. Chew, Catherine Egan, Zhichao Wu, Himeesh Kumar, Joseph Farrington, Clara I. Sánchez, Adnan Tufail

    Abstract: Purpose: To develop and validate a deep learning (DL) framework for the detection and quantification of drusen and reticular pseudodrusen (RPD) on optical coherence tomography scans. Design: Development and validation of deep learning models for classification and feature segmentation. Methods: A DL framework was developed consisting of a classification model and an out-of-distribution (OOD…

    Submitted 5 April, 2022; originally announced April 2022.

    Comments: 26 pages, 7 figures

  35. arXiv:2203.08242

    cs.CL cs.LG

    Data Contamination: From Memorization to Exploitation

    Authors: Inbal Magar, Roy Schwartz

    Abstract: Pretrained language models are typically trained on massive web-based datasets, which are often "contaminated" with downstream test sets. It is not clear to what extent models exploit the contaminated data for downstream tasks. We present a principled method to study this question. We pretrain BERT models on joint corpora of Wikipedia and labeled downstream datasets, and fine-tune them on the rele…

    Submitted 15 March, 2022; originally announced March 2022.

    Comments: Accepted to ACL 2022

  36. arXiv:2112.12297

    cs.LG physics.optics

    High Throughput Multi-Channel Parallelized Diffraction Convolutional Neural Network Accelerator

    Authors: Zibo Hu, Shurui Li, Russell L. T. Schwartz, Maria Solyanik-Gorgone, Mario Miscuglio, Puneet Gupta, Volker J. Sorger

    Abstract: Convolutional neural networks are paramount in image and signal processing, including the relevant classification and training tasks alike, and constitute the majority of machine learning compute demand today. With convolution operations being computationally intensive, next-generation hardware accelerators need to offer parallelization and algorithmic-hardware homomorphism. Fortunately, diffrac…

    Submitted 7 July, 2022; v1 submitted 22 December, 2021; originally announced December 2021.

    Comments: 13 pages, 4 figures

  37. arXiv:2110.02488

    cs.CL

    ABC: Attention with Bounded-memory Control

    Authors: Hao Peng, Jungo Kasai, Nikolaos Pappas, Dani Yogatama, Zhaofeng Wu, Lingpeng Kong, Roy Schwartz, Noah A. Smith

    Abstract: Transformer architectures have achieved state-of-the-art results on a variety of sequence modeling tasks. However, their attention mechanism comes with a quadratic complexity in sequence length, making the computational overhead prohibitive, especially for long sequences. Attention context can be seen as a random-access memory with each token taking a slot. Under this perspective, the memory size…

    Submitted 1 June, 2022; v1 submitted 5 October, 2021; originally announced October 2021.

  38. arXiv:2110.00613

    cs.CL

    Expected Validation Performance and Estimation of a Random Variable's Maximum

    Authors: Jesse Dodge, Suchin Gururangan, Dallas Card, Roy Schwartz, Noah A. Smith

    Abstract: Research in NLP is often supported by experimental results, and improved reporting of such results can lead to better understanding and more reproducible science. In this paper we analyze three statistical estimators for expected validation performance, a tool used for reporting performance (e.g., accuracy) as a function of computational budget (e.g., number of hyperparameter tuning experiments).…

    Submitted 1 October, 2021; originally announced October 2021.

  39. arXiv:2109.02040

    cs.CL cs.CV cs.LG

    Data Efficient Masked Language Modeling for Vision and Language

    Authors: Yonatan Bitton, Gabriel Stanovsky, Michael Elhadad, Roy Schwartz

    Abstract: Masked language modeling (MLM) is one of the key sub-tasks in vision-language pretraining. In the cross-modal setting, tokens in the sentence are masked at random, and the model predicts the masked tokens given the image and the text. In this paper, we observe several key disadvantages of MLM in this setting. First, as captions tend to be short, in a third of the sentences no token is sampled. Sec…

    Submitted 5 September, 2021; originally announced September 2021.

    Comments: Accepted to Findings of EMNLP 2021

  40. arXiv:2107.05605

    cs.CV cs.LG

    Interpretable Mammographic Image Classification using Case-Based Reasoning and Deep Learning

    Authors: Alina Jade Barnett, Fides Regina Schwartz, Chaofan Tao, Chaofan Chen, Yinhao Ren, Joseph Y. Lo, Cynthia Rudin

    Abstract: When we deploy machine learning models in high-stakes medical settings, we must ensure these models make accurate predictions that are consistent with known medical science. Inherently interpretable networks address this need by explaining the rationale behind each decision while maintaining equal or higher accuracy compared to black-box models. In this work, we present a novel interpretable neura…

    Submitted 4 October, 2021; v1 submitted 12 July, 2021; originally announced July 2021.

    Comments: 10 pages, 6 figures, accepted for oral presentation at the IJCAI-21 Workshop on Deep Learning, Case-Based Reasoning, and AutoML: Present and Future Synergies. arXiv admin note: substantial text overlap with arXiv:2103.12308

    ACM Class: I.2.6; I.4.9; I.2.10

  41. arXiv:2106.05939  [pdf, other]

    cs.DS

    Graph Balancing with Orientation Costs

    Authors: Roy Schwartz, Ran Yeheskel

    Abstract: Motivated by the classic Generalized Assignment Problem, we consider the Graph Balancing problem in the presence of orientation costs: given an undirected multi-graph G = (V,E) equipped with edge weights and orientation costs on the edges, the goal is to find an orientation of the edges that minimizes both the maximum weight of edges oriented toward any vertex (makespan) and total orientation cost…

    Submitted 10 June, 2021; originally announced June 2021.

    Journal ref: ESA 2019: 82:1-82:15

  42. arXiv:2105.07238  [pdf]

    cs.IT

    Using Ethnographic Methods to Classify the Human Experience in Medicine: A Case Study of the Presence Ontology

    Authors: Amrapali Maitra, Maulik R. Kamdar, Donna M. Zulman, Marie C. Haverfield, Cati Brown-Johnson, Rachel Schwartz, Sonoo Thadaney Israni, Abraham Verghese, Mark A. Musen

    Abstract: Objective: Although social and environmental factors are central to provider-patient interactions, the data that reflect these factors can be incomplete, vague, and subjective. We sought to create a conceptual framework to describe and classify data about presence, the domain of interpersonal connection in medicine. Methods: Our top-down approach for ontology development based on the concept of re…

    Submitted 15 May, 2021; originally announced May 2021.

    Comments: 15 pages, 4 figures, 57 references

  43. Face Attributes as Cues for Deep Face Recognition Understanding

    Authors: Matheus Alves Diniz, William Robson Schwartz

    Abstract: Deeply learned representations are the state-of-the-art descriptors for face recognition methods. These representations encode latent features that are difficult to explain, compromising the confidence and interpretability of their predictions. Most attempts to explain deep features are visualization techniques that are often open to interpretation. Instead of relying only on visualizations, we us…

    Submitted 14 May, 2021; originally announced May 2021.

    Comments: 7 pages, 5 figures, published at Automatic Face and Gesture Recognition 2020

  44. Open-set Face Recognition for Small Galleries Using Siamese Networks

    Authors: Gabriel Salomon, Alceu Britto, Rafael H. Vareto, William R. Schwartz, David Menotti

    Abstract: Face recognition has been one of the most relevant and explored fields of Biometrics. In real-world applications, face recognition methods usually must deal with scenarios where not all probe individuals were seen during the training phase (open-set scenarios). Therefore, open-set face recognition is a subject of increasing interest as it deals with identifying individuals in a space where not all…

    Submitted 14 May, 2021; originally announced May 2021.

    Journal ref: 2020 International Conference on Systems, Signals and Image Processing (IWSSIP), 2020, pp. 161-166

  45. arXiv:2105.01138  [pdf, other]

    cs.DS

    Fault Tolerant Max-Cut

    Authors: Keren Censor-Hillel, Noa Marelly, Roy Schwartz, Tigran Tonoyan

    Abstract: In this work, we initiate the study of fault tolerant Max Cut, where given an edge-weighted undirected graph $G=(V,E)$, the goal is to find a cut $S\subseteq V$ that maximizes the total weight of edges that cross $S$ even after an adversary removes $k$ vertices from $G$. We consider two types of adversaries: an adaptive adversary that sees the outcome of the random coin tosses used by the algorith…

    Submitted 3 May, 2021; originally announced May 2021.

    Comments: 29 pages, 4 figures, conference: ICALP '21

  46. The Metric Relaxation for $0$-Extension Admits an $\Omega(\log^{2/3}{k})$ Gap

    Authors: Roy Schwartz, Nitzan Tur

    Abstract: We consider the $0$-Extension problem, where we are given an undirected graph $\mathcal{G}=(V,E)$ equipped with non-negative edge weights $w:E\rightarrow \mathbb{R}^+$, a collection $ T=\{ t_1,\ldots,t_k\}\subseteq V$ of $k$ special vertices called terminals, and a semi-metric $D$ over $T$. The goal is to assign every non-terminal vertex to a terminal while minimizing the sum over all edges of the…

    Submitted 23 April, 2021; originally announced April 2021.

    Comments: 27 pages, 3 figures, will appear in STOC 2021

  47. arXiv:2104.10809  [pdf, other]

    cs.CL

    Provable Limitations of Acquiring Meaning from Ungrounded Form: What Will Future Language Models Understand?

    Authors: William Merrill, Yoav Goldberg, Roy Schwartz, Noah A. Smith

    Abstract: Language models trained on billions of tokens have recently led to unprecedented results on many NLP tasks. This success raises the question of whether, in principle, a system can ever "understand" raw text without access to some form of grounding. We formally investigate the abilities of ungrounded systems to acquire meaning. Our analysis focuses on the role of "assertions": textual contexts…

    Submitted 22 June, 2021; v1 submitted 21 April, 2021; originally announced April 2021.

    Comments: Updated 06/22/21 with substantive changes. Accepted at TACL; pre-MIT Press publication version

  48. arXiv:2103.12308  [pdf, other]

    cs.LG cs.AI cs.CV

    IAIA-BL: A Case-based Interpretable Deep Learning Model for Classification of Mass Lesions in Digital Mammography

    Authors: Alina Jade Barnett, Fides Regina Schwartz, Chaofan Tao, Chaofan Chen, Yinhao Ren, Joseph Y. Lo, Cynthia Rudin

    Abstract: Interpretability in machine learning models is important in high-stakes decisions, such as whether to order a biopsy based on a mammographic exam. Mammography poses important challenges that are not present in other computer vision tasks: datasets are small, confounding information is present, and it can be difficult even for a radiologist to decide between watchful waiting and biopsy based on a m…

    Submitted 23 March, 2021; originally announced March 2021.

    Comments: 24 pages, 5 figures, 2 tables

    ACM Class: I.2.6; I.4.9; I.2.10

  49. arXiv:2103.09591  [pdf, other]

    cs.CL cs.CV

    Automatic Generation of Contrast Sets from Scene Graphs: Probing the Compositional Consistency of GQA

    Authors: Yonatan Bitton, Gabriel Stanovsky, Roy Schwartz, Michael Elhadad

    Abstract: Recent works have shown that supervised models often exploit data artifacts to achieve good test scores while their performance severely degrades on samples outside their training distribution. Contrast sets (Gardner et al., 2020) quantify this phenomenon by perturbing test samples in a minimal way such that the output label is modified. While most contrast sets were created manually, requiring int…

    Submitted 17 March, 2021; originally announced March 2021.

    Comments: Accepted to NAACL 2021

  50. arXiv:2103.02143  [pdf, other]

    cs.CL

    Random Feature Attention

    Authors: Hao Peng, Nikolaos Pappas, Dani Yogatama, Roy Schwartz, Noah A. Smith, Lingpeng Kong

    Abstract: Transformers are state-of-the-art models for a variety of sequence modeling tasks. At their core is an attention function which models pairwise interactions between the inputs at every timestep. While attention is powerful, it does not scale efficiently to long sequences due to its quadratic time and space complexity in the sequence length. We propose RFA, a linear time and space attention that us…

    Submitted 19 March, 2021; v1 submitted 2 March, 2021; originally announced March 2021.

    Comments: ICLR 2021