
Showing 1–50 of 415 results for author: Roth, D

  1. arXiv:2406.19237  [pdf, other]

    cs.CL cs.CV cs.IR cs.LG

    FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts

    Authors: Shubhankar Singh, Purvi Chaurasia, Yerram Varun, Pranshu Pandya, Vatsal Gupta, Vivek Gupta, Dan Roth

    Abstract: Existing benchmarks for visual question answering lack in visual grounding and complexity, particularly in evaluating spatial reasoning skills. We introduce FlowVQA, a novel benchmark aimed at assessing the capabilities of visual question-answering multimodal language models in reasoning with flowcharts as visual contexts. FlowVQA comprises 2,272 carefully generated and human-verified flowchart im…

    Submitted 28 June, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: Accepted in ACL 2024 (Findings), 21 pages, 7 figures, 9 Tables

  2. arXiv:2406.11243  [pdf, other]

    cs.CL cs.AI

    FamiCom: Further Demystifying Prompts for Language Models with Task-Agnostic Performance Estimation

    Authors: Bangzheng Li, Ben Zhou, Xingyu Fu, Fei Wang, Dan Roth, Muhao Chen

    Abstract: Language models have shown impressive in-context-learning capabilities, which allow them to benefit from input prompts and perform better on downstream end tasks. Existing works investigate the mechanisms behind this observation, and propose label-agnostic prompt metrics that can better estimate end-task performances. One popular approach is using perplexity as a way to measure models' familiarity…

    Submitted 17 June, 2024; originally announced June 2024.

  3. arXiv:2406.11050  [pdf, other]

    cs.CL cs.AI

    A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners

    Authors: Bowen Jiang, Yangxinyu Xie, Zhuoqun Hao, Xiaomeng Wang, Tanwi Mallick, Weijie J. Su, Camillo J. Taylor, Dan Roth

    Abstract: This study introduces a hypothesis-testing framework to assess whether large language models (LLMs) possess genuine reasoning abilities or primarily depend on token bias. We go beyond evaluating LLMs on accuracy; rather, we aim to investigate their token bias in solving logical reasoning tasks. Specifically, we develop carefully controlled synthetic datasets, featuring conjunction fallacy and syll…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Codes are open-sourced at https://github.com/bowen-upenn/llm_token_bias

  4. arXiv:2406.09411  [pdf, other]

    cs.CV cs.AI cs.CL

    MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

    Authors: Fei Wang, Xingyu Fu, James Y. Huang, Zekun Li, Qin Liu, Xiaogeng Liu, Mingyu Derek Ma, Nan Xu, Wenxuan Zhou, Kai Zhang, Tianyi Lorena Yan, Wenjie Jacky Mo, Hsiang-Hui Liu, Pan Lu, Chunyuan Li, Chaowei Xiao, Kai-Wei Chang, Dan Roth, Sheng Zhang, Hoifung Poon, Muhao Chen

    Abstract: We introduce MuirBench, a comprehensive benchmark that focuses on robust multi-image understanding capabilities of multimodal LLMs. MuirBench consists of 12 diverse multi-image tasks (e.g., scene understanding, ordering) that involve 10 categories of multi-image relations (e.g., multiview, temporal relations). Comprising 11,264 images and 2,600 multiple-choice questions, MuirBench is created in a…

    Submitted 1 July, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: typos corrected, references added, Project Page: https://muirbench.github.io/

  5. arXiv:2406.09403  [pdf, other]

    cs.CV cs.CL

    Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models

    Authors: Yushi Hu, Weijia Shi, Xingyu Fu, Dan Roth, Mari Ostendorf, Luke Zettlemoyer, Noah A Smith, Ranjay Krishna

    Abstract: Humans draw to facilitate reasoning: we draw auxiliary lines when solving geometry problems; we mark and circle when reasoning on maps; we use sketches to amplify our ideas and relieve our limited-capacity working memory. However, such actions are missing in current multimodal language models (LMs). Current chain-of-thought and tool-use paradigms only use text as intermediate reasoning steps. In t…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 26 pages

  6. arXiv:2406.07546  [pdf, other]

    cs.CV cs.AI cs.CL

    Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?

    Authors: Xingyu Fu, Muyu He, Yujie Lu, William Yang Wang, Dan Roth

    Abstract: We present a novel task and benchmark for evaluating the ability of text-to-image (T2I) generation models to produce images that fit commonsense in real life, which we call Commonsense-T2I. Given two adversarial text prompts containing an identical set of action words with minor differences, such as "a lightbulb without electricity" vs. "a lightbulb with electricity", we evaluate whether T2I model…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Text-to-Image Generation, Commonsense, Project Url: https://zeyofu.github.io/CommonsenseT2I/

  7. arXiv:2405.18638  [pdf, other]

    cs.CL cs.AI

    ConSiDERS-The-Human Evaluation Framework: Rethinking Human Evaluation for Generative Large Language Models

    Authors: Aparna Elangovan, Ling Liu, Lei Xu, Sravan Bodapati, Dan Roth

    Abstract: In this position paper, we argue that human evaluation of generative large language models (LLMs) should be a multidisciplinary undertaking that draws upon insights from disciplines such as user experience research and human behavioral psychology to ensure that the experimental design and results are reliable. The conclusions from these evaluations, thus, must consider factors such as usability, a…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Accepted in ACL 2024

  8. arXiv:2405.16334  [pdf, other]

    cs.AI

    Devil's Advocate: Anticipatory Reflection for LLM Agents

    Authors: Haoyu Wang, Tao Li, Zhiwei Deng, Dan Roth, Yang Li

    Abstract: In this work, we introduce a novel approach that equips LLM agents with introspection, enhancing consistency and adaptability in solving complex tasks. Our approach prompts LLM agents to decompose a given task into manageable subtasks (i.e., to make a plan), and to continuously introspect upon the suitability and results of their actions. %; and when necessary, to explore ``the road not taken.'' W…

    Submitted 20 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

    Comments: 13 pages, 6 figures

  9. arXiv:2404.12494  [pdf, other]

    cs.CL

    BIRD: A Trustworthy Bayesian Inference Framework for Large Language Models

    Authors: Yu Feng, Ben Zhou, Weidong Lin, Dan Roth

    Abstract: Large language models primarily rely on inductive reasoning for decision making. This results in unreliable decisions when applied to real-world tasks that often present incomplete contexts and conditions. Thus, accurate probability estimation and appropriate interpretations are required to enhance decision-making reliability. In this paper, we propose a Bayesian inference framework called BIRD fo…

    Submitted 18 April, 2024; originally announced April 2024.

  10. arXiv:2404.12390  [pdf, other]

    cs.CV cs.AI cs.CL

    BLINK: Multimodal Large Language Models Can See but Not Perceive

    Authors: Xingyu Fu, Yushi Hu, Bangzheng Li, Yu Feng, Haoyu Wang, Xudong Lin, Dan Roth, Noah A. Smith, Wei-Chiu Ma, Ranjay Krishna

    Abstract: We introduce Blink, a new benchmark for multimodal language models (LLMs) that focuses on core visual perception abilities not found in other evaluations. Most of the Blink tasks can be solved by humans "within a blink" (e.g., relative depth estimation, visual correspondence, forensics detection, and multi-view reasoning). However, we find these perception-demanding tasks cast significant challeng…

    Submitted 3 July, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: Multimodal Benchmark, Project Url: https://zeyofu.github.io/blink/, ECCV 2024

  11. arXiv:2404.10830  [pdf, other]

    cs.CL cs.AI cs.LG

    Fewer Truncations Improve Language Modeling

    Authors: Hantian Ding, Zijian Wang, Giovanni Paolini, Varun Kumar, Anoop Deoras, Dan Roth, Stefano Soatto

    Abstract: In large language model training, input documents are typically concatenated together and then split into sequences of equal length to avoid padding tokens. Despite its efficiency, the concatenation approach compromises data integrity -- it inevitably breaks many documents into incomplete pieces, leading to excessive truncations that hinder the model from learning to compose logically coherent and…

    Submitted 2 May, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: ICML 2024

  12. arXiv:2404.09889  [pdf, other]

    cs.IR cs.AI cs.CL

    Is Table Retrieval a Solved Problem? Exploring Join-Aware Multi-Table Retrieval

    Authors: Peter Baile Chen, Yi Zhang, Dan Roth

    Abstract: Retrieving relevant tables containing the necessary information to accurately answer a given question over tables is critical to open-domain question-answering (QA) systems. Previous methods assume the answer to such a question can be found either in a single table or multiple tables identified through question decomposition or rewriting. However, neither of these approaches is sufficient, as many…

    Submitted 5 June, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: ACL 2024 camera ready

  13. arXiv:2404.00205  [pdf, other]

    cs.CL

    Conceptual and Unbiased Reasoning in Language Models

    Authors: Ben Zhou, Hongming Zhang, Sihao Chen, Dian Yu, Hongwei Wang, Baolin Peng, Dan Roth, Dong Yu

    Abstract: Conceptual reasoning, the ability to reason in abstract and high-level perspectives, is key to generalization in human cognition. However, limited study has been done on large language models' capability to perform conceptual reasoning. In this work, we bridge this gap and propose a novel conceptualization framework that forces models to perform conceptual reasoning on abstract questions and gener…

    Submitted 29 March, 2024; originally announced April 2024.

    Comments: Preprint under review

  14. arXiv:2403.16400  [pdf, other]

    cs.CV cs.RO

    ASDF: Assembly State Detection Utilizing Late Fusion by Integrating 6D Pose Estimation

    Authors: Hannah Schieber, Shiyu Li, Niklas Corell, Philipp Beckerle, Julian Kreimeier, Daniel Roth

    Abstract: In medical and industrial domains, providing guidance for assembly processes is critical to ensure efficiency and safety. Errors in assembly can lead to significant consequences such as extended surgery times, and prolonged manufacturing or maintenance times in industry. Assembly scenarios can benefit from in-situ AR visualization to provide guidance, reduce assembly times and minimize errors. To…

    Submitted 11 April, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

  15. arXiv:2403.14783  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.MA

    Multi-Agent VQA: Exploring Multi-Agent Foundation Models in Zero-Shot Visual Question Answering

    Authors: Bowen Jiang, Zhijun Zhuang, Shreyas S. Shivakumar, Dan Roth, Camillo J. Taylor

    Abstract: This work explores the zero-shot capabilities of foundation models in Visual Question Answering (VQA) tasks. We propose an adaptive multi-agent system, named Multi-Agent VQA, to overcome the limitations of foundation models in object detection and counting by using specialized agents as tools. Unlike existing approaches, our study focuses on the system's performance without fine-tuning it on speci…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: A full version of the paper will be released soon. The codes are available at https://github.com/bowen-upenn/Multi-Agent-VQA

  16. arXiv:2403.06326  [pdf, other]

    cs.CL cs.AI cs.LG

    From Instructions to Constraints: Language Model Alignment with Automatic Constraint Verification

    Authors: Fei Wang, Chao Shang, Sarthak Jain, Shuai Wang, Qiang Ning, Bonan Min, Vittorio Castelli, Yassine Benajiba, Dan Roth

    Abstract: User alignment is crucial for adapting general-purpose language models (LMs) to downstream tasks, but human annotations are often not available for all types of instructions, especially those with customized constraints. We observe that user instructions typically contain constraints. While assessing response quality in terms of the whole instruction is often costly, efficiently evaluating the sat…

    Submitted 10 March, 2024; originally announced March 2024.

  17. arXiv:2402.11194  [pdf, other]

    cs.CL

    Evaluating LLMs' Mathematical Reasoning in Financial Document Question Answering

    Authors: Pragya Srivastava, Manuj Malik, Vivek Gupta, Tanuja Ganu, Dan Roth

    Abstract: Large Language Models (LLMs) excel in natural language understanding, but their capability for complex mathematical reasoning with an amalgamation of structured tables and unstructured text is uncertain. This study explores LLMs' mathematical reasoning on four financial tabular question-answering datasets: TATQA, FinQA, ConvFinQA, and Multihiertt. Through extensive experiments with various models…

    Submitted 29 February, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

    Comments: 25 pages, 17 figures

  18. arXiv:2402.07677  [pdf, other]

    cs.CV

    GBOT: Graph-Based 3D Object Tracking for Augmented Reality-Assisted Assembly Guidance

    Authors: Shiyu Li, Hannah Schieber, Niklas Corell, Bernhard Egger, Julian Kreimeier, Daniel Roth

    Abstract: Guidance for assemblable parts is a promising field for augmented reality. Augmented reality assembly guidance requires 6D object poses of target objects in real time. Especially in time-critical medical or industrial settings, continuous and markerless tracking of individual parts is essential to visualize instructions superimposed on or next to the target object parts. In this regard, occlusions…

    Submitted 12 February, 2024; originally announced February 2024.

    Comments: 9 pages

  19. arXiv:2402.06147  [pdf, other]

    cs.AI cs.CL

    DeAL: Decoding-time Alignment for Large Language Models

    Authors: James Y. Huang, Sailik Sengupta, Daniele Bonadiman, Yi-an Lai, Arshit Gupta, Nikolaos Pappas, Saab Mansour, Katrin Kirchhoff, Dan Roth

    Abstract: Large Language Models (LLMs) are nowadays expected to generate content aligned with human preferences. Current work focuses on alignment at model training time, through techniques such as Reinforcement Learning with Human Feedback (RLHF). However, it is unclear if such methods are an effective choice to teach alignment objectives to the model. First, the inability to incorporate multiple, custom r…

    Submitted 20 February, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: The appendix contains data that is offensive / disturbing in nature

  20. arXiv:2402.01935  [pdf, other]

    cs.CL

    Code Representation Learning At Scale

    Authors: Dejiao Zhang, Wasi Ahmad, Ming Tan, Hantian Ding, Ramesh Nallapati, Dan Roth, Xiaofei Ma, Bing Xiang

    Abstract: Recent studies have shown that code language models at scale demonstrate significant performance gains on downstream tasks, i.e., code generation. However, most of the existing works on code representation learning train models at a hundred million parameter scale using very limited pretraining corpora. In this work, we fuel code representation learning with a vast amount of code data via a two-st…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: 10 pages

    Journal ref: ICLR 2024

  21. arXiv:2401.11437  [pdf, other]

    cs.LG cs.RO

    Open the Black Box: Step-based Policy Updates for Temporally-Correlated Episodic Reinforcement Learning

    Authors: Ge Li, Hongyi Zhou, Dominik Roth, Serge Thilges, Fabian Otto, Rudolf Lioutikov, Gerhard Neumann

    Abstract: Current advancements in reinforcement learning (RL) have predominantly focused on learning step-based policies that generate actions for each perceived state. While these methods efficiently leverage step information from environmental interaction, they often ignore the temporal correlation between actions, resulting in inefficient exploration and unsmooth trajectories that are challenging to impl…

    Submitted 21 January, 2024; originally announced January 2024.

    Comments: Codebase, see: https://github.com/BruceGeLi/TCE_RL

  22. arXiv:2311.09702  [pdf, other]

    cs.CL cs.AI

    Deceptive Semantic Shortcuts on Reasoning Chains: How Far Can Models Go without Hallucination?

    Authors: Bangzheng Li, Ben Zhou, Fei Wang, Xingyu Fu, Dan Roth, Muhao Chen

    Abstract: Despite the recent advancement in large language models (LLMs) and their high performances across numerous benchmarks, recent research has unveiled that LLMs suffer from hallucinations and unfaithful reasoning. This work studies a specific type of hallucination induced by semantic associations. Specifically, we investigate to what extent LLMs take shortcuts from certain keyword/entity biases in th…

    Submitted 5 April, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

    Comments: Work accepted by NAACL 2024

  23. arXiv:2311.09558  [pdf, other]

    cs.CL

    What if you said that differently?: How Explanation Formats Affect Human Feedback Efficacy and User Perception

    Authors: Chaitanya Malaviya, Subin Lee, Dan Roth, Mark Yatskar

    Abstract: Eliciting feedback from end users of NLP models can be beneficial for improving models. However, how should we present model responses to users so they are most amenable to be corrected from user feedback? Further, what properties do users value to understand and trust responses? We answer these questions by analyzing the effect of rationales (or explanations) generated by QA models to support the…

    Submitted 1 April, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

    Comments: Accepted to NAACL 2024. Code & data available at https://github.com/chaitanyamalaviya/rationale_formats

  24. arXiv:2311.08669  [pdf, other]

    cs.CL cs.LG

    On the Calibration of Multilingual Question Answering LLMs

    Authors: Yahan Yang, Soham Dan, Dan Roth, Insup Lee

    Abstract: Multilingual pre-trained Large Language Models (LLMs) are incredibly effective at Question Answering (QA), a core task in Natural Language Understanding, achieving high accuracies on several multilingual benchmarks. However, little is known about how well their confidences are calibrated. In this paper, we comprehensively benchmark the calibration of several multilingual LLMs (MLLMs) on a variety…

    Submitted 15 April, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: Preprint. Under Submission

  25. arXiv:2311.08662  [pdf, other]

    cs.CL cs.AI cs.IR

    Multi-Set Inoculation: Assessing Model Robustness Across Multiple Challenge Sets

    Authors: Vatsal Gupta, Pranshu Pandya, Tushar Kataria, Vivek Gupta, Dan Roth

    Abstract: Language models, given their black-box nature, often exhibit sensitivity to input perturbations, leading to trust issues due to hallucinations. To bolster trust, it's essential to understand these models' failure modes and devise strategies to enhance their performance. In this study, we propose a framework to study the effect of input perturbations on language models of different scales, from pre…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: 13 pages, 2 Figure, 12 Tables

  26. arXiv:2311.04335  [pdf, other]

    cs.CL cs.AI

    Sub-Sentence Encoder: Contrastive Learning of Propositional Semantic Representations

    Authors: Sihao Chen, Hongming Zhang, Tong Chen, Ben Zhou, Wenhao Yu, Dian Yu, Baolin Peng, Hongwei Wang, Dan Roth, Dong Yu

    Abstract: We introduce sub-sentence encoder, a contrastively-learned contextual embedding model for fine-grained semantic representation of text. In contrast to the standard practice with sentence embeddings, where the meaning of an entire sequence of text is encoded into a fixed-length vector, the sub-sentence encoder learns to produce distinct contextual embeddings corresponding to different atomic propos…

    Submitted 7 November, 2023; originally announced November 2023.

  27. arXiv:2310.12516  [pdf, other]

    cs.CL cs.AI cs.LG

    ReEval: Automatic Hallucination Evaluation for Retrieval-Augmented Large Language Models via Transferable Adversarial Attacks

    Authors: Xiaodong Yu, Hao Cheng, Xiaodong Liu, Dan Roth, Jianfeng Gao

    Abstract: Despite remarkable advancements in mitigating hallucinations in large language models (LLMs) by retrieval augmentation, it remains challenging to measure the reliability of LLMs using static question-answering (QA) data. Specifically, given the potential of data contamination (e.g., leading to memorization), good static benchmark performance does not ensure that model can reliably use the provided…

    Submitted 31 May, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: NAACL 2024 Findings

  28. arXiv:2310.11248  [pdf, other]

    cs.LG cs.CL cs.SE

    CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion

    Authors: Yangruibo Ding, Zijian Wang, Wasi Uddin Ahmad, Hantian Ding, Ming Tan, Nihal Jain, Murali Krishna Ramanathan, Ramesh Nallapati, Parminder Bhatia, Dan Roth, Bing Xiang

    Abstract: Code completion models have made significant progress in recent years, yet current popular evaluation datasets, such as HumanEval and MBPP, predominantly focus on code completion tasks within a single file. This over-simplified setting falls short of representing the real-world software development scenario where repositories span multiple files with numerous cross-file dependencies, and accessing…

    Submitted 16 November, 2023; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: To appear at NeurIPS 2023 (Datasets and Benchmarks Track)

  29. arXiv:2310.00074  [pdf, other]

    cs.CL cs.AI

    SocREval: Large Language Models with the Socratic Method for Reference-Free Reasoning Evaluation

    Authors: Hangfeng He, Hongming Zhang, Dan Roth

    Abstract: To comprehensively gauge the capacity of current models for complex reasoning, it is crucial to assess their step-by-step reasoning in a scalable manner. Established reference-based evaluation metrics rely on human-annotated reasoning chains as references to assess the model-derived chains. However, such "gold-standard" human-written reasoning chains may not be unique and their acquisition is ofte…

    Submitted 18 April, 2024; v1 submitted 29 September, 2023; originally announced October 2023.

  30. arXiv:2309.08927  [pdf, other]

    cs.CV

    DynaMoN: Motion-Aware Fast and Robust Camera Localization for Dynamic Neural Radiance Fields

    Authors: Nicolas Schischka, Hannah Schieber, Mert Asim Karaoglu, Melih Görgülü, Florian Grötzner, Alexander Ladikos, Daniel Roth, Nassir Navab, Benjamin Busam

    Abstract: The accurate reconstruction of dynamic scenes with neural radiance fields is significantly dependent on the estimation of camera poses. Widely used structure-from-motion pipelines encounter difficulties in accurately tracking the camera trajectory when faced with separate dynamics of the scene content and the camera movement. To address this challenge, we propose DynaMoN. DynaMoN utilizes semantic…

    Submitted 17 March, 2024; v1 submitted 16 September, 2023; originally announced September 2023.

    Comments: 6 pages, 4 figures

  31. arXiv:2309.07852  [pdf, other]

    cs.CL cs.AI

    ExpertQA: Expert-Curated Questions and Attributed Answers

    Authors: Chaitanya Malaviya, Subin Lee, Sihao Chen, Elizabeth Sieber, Mark Yatskar, Dan Roth

    Abstract: As language models are adopted by a more sophisticated and diverse set of users, the importance of guaranteeing that they provide factually correct information supported by verifiable sources is critical across fields of study. This is especially the case for high-stakes fields, such as medicine and law, where the risk of propagating false information is high and can lead to undesirable societal c…

    Submitted 1 April, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: Accepted to NAACL 2024. Dataset & code is available at https://github.com/chaitanyamalaviya/expertqa

  32. arXiv:2309.04516  [pdf, ps, other]

    eess.AS cs.LG cs.SD

    End-to-End Speech Recognition and Disfluency Removal with Acoustic Language Model Pretraining

    Authors: Saksham Bassi, Giulio Duregon, Siddhartha Jalagam, David Roth

    Abstract: The SOTA in transcription of disfluent and conversational speech has in recent years favored two-stage models, with separate transcription and cleaning stages. We believe that previous attempts at end-to-end disfluency removal have fallen short because of the representational advantage that large-scale language model pretraining has given to lexical models. Until recently, the high dimensionality…

    Submitted 8 September, 2023; originally announced September 2023.

  33. arXiv:2309.04433  [pdf, other]

    cs.LG cs.AI

    Variations and Relaxations of Normalizing Flows

    Authors: Keegan Kelly, Lorena Piedras, Sukrit Rao, David Roth

    Abstract: Normalizing Flows (NFs) describe a class of models that express a complex target distribution as the composition of a series of bijective transformations over a simpler base distribution. By limiting the space of candidate transformations to diffeomorphisms, NFs enjoy efficient, exact sampling and density evaluation, enabling NFs to flexibly behave as both discriminative and generative models. The…

    Submitted 8 September, 2023; originally announced September 2023.

  34. A search for rare $B \rightarrow D μ^+ μ^-$ decays

    Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, B. Adeva, M. Adinolfi, P. Adlarson, H. Afsharnia, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, A. Alfonso Albero, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, et al. (1038 additional authors not shown)

    Abstract: A search for rare $B \rightarrow D μ^+ μ^-$ decays is performed using proton-proton collision data collected by the LHCb experiment, corresponding to an integrated luminosity of 9 fb$^{-1}$. No significant signals are observed in the non-resonant $μ^+μ^-$ modes, and upper limits of $\mathcal{B}(B^0 \rightarrow \overline{D}^0 μ^+ μ^-) < 5.1 \times 10^{-8}$,…

    Submitted 26 February, 2024; v1 submitted 11 August, 2023; originally announced August 2023.

    Comments: All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2022-048.html (LHCb public pages)

    Report number: LHCb-PAPER-2022-048, CERN-EP-2023-121

    Journal ref: J. High Energ. Phys. 2024, 32 (2024)

  35. arXiv:2308.05317  [pdf, other]

    cs.CL

    Few-Shot Data-to-Text Generation via Unified Representation and Multi-Source Learning

    Authors: Alexander Hanbo Li, Mingyue Shang, Evangelia Spiliopoulou, Jie Ma, Patrick Ng, Zhiguo Wang, Bonan Min, William Wang, Kathleen McKeown, Vittorio Castelli, Dan Roth, Bing Xiang

    Abstract: We present a novel approach for structured data-to-text generation that addresses the limitations of existing methods that primarily focus on specific types of structured data. Our proposed method aims to improve performance in multi-task training, zero-shot and few-shot scenarios by providing a unified representation that can handle various forms of structured data such as tables, knowledge graph…

    Submitted 9 August, 2023; originally announced August 2023.

  36. arXiv:2308.04756  [pdf, other]

    cs.CL cs.IR

    Building Interpretable and Reliable Open Information Retriever for New Domains Overnight

    Authors: Xiaodong Yu, Ben Zhou, Dan Roth

    Abstract: Information retrieval (IR) or knowledge retrieval, is a critical component for many down-stream tasks such as open-domain question answering (QA). It is also very challenging, as it requires succinctness, completeness, and correctness. In recent works, dense retrieval models have achieved state-of-the-art (SOTA) performance on in-domain IR and QA benchmarks by representing queries and knowledge pa…

    Submitted 9 August, 2023; originally announced August 2023.

    Comments: Submission of ACL 2023. Rejected

  37. arXiv:2307.03886  [pdf, other]

    cs.LG stat.ML

    On Regularization and Inference with Label Constraints

    Authors: Kaifu Wang, Hangfeng He, Tin D. Nguyen, Piyush Kumar, Dan Roth

    Abstract: Prior knowledge and symbolic rules in machine learning are often expressed in the form of label constraints, especially in structured prediction problems. In this work, we compare two common strategies for encoding label constraints in a machine learning pipeline, regularization with constraints and constrained inference, by quantifying their impact on model performance. For regularization, we sho…

    Submitted 7 July, 2023; originally announced July 2023.

  38. arXiv:2307.00171  [pdf, other]

    cs.AI cs.CL cs.LG

    The Integer Linear Programming Inference Cookbook

    Authors: Vivek Srikumar, Dan Roth

    Abstract: Over the years, integer linear programs have been employed to model inference in many natural language processing problems. This survey is meant to guide the reader through the process of framing a new inference problem as an instance of an integer linear program and is structured as a collection of recipes. At the end, we will see two worked examples to illustrate the use of these recipes.

    Submitted 30 June, 2023; originally announced July 2023.

  39. arXiv:2306.17290  [pdf, other]

    cs.CL

    Towards Open-Domain Topic Classification

    Authors: Hantian Ding, Jinrui Yang, Yuqian Deng, Hongming Zhang, Dan Roth

    Abstract: We introduce an open-domain topic classification system that accepts user-defined taxonomy in real time. Users will be able to classify a text snippet with respect to any candidate labels they want, and get instant response from our web interface. To obtain such flexibility, we build the backend model in a zero-shot way. By training on a new dataset constructed from Wikipedia, our label-aware text…

    Submitted 29 June, 2023; originally announced June 2023.

    Comments: Accepted by NAACL 2022 System Demonstrations

  40. arXiv:2306.13986  [pdf, other]

    cs.CL

    Large Language Models as Sous Chefs: Revising Recipes with GPT-3

    Authors: Alyssa Hwang, Bryan Li, Zhaoyi Hou, Dan Roth

    Abstract: With their remarkably improved text generation and prompting capabilities, large language models can adapt existing written information into forms that are easier to use and understand. In our work, we focus on recipes as an example of complex, diverse, and widely used instructions. We develop a prompt grounded in the original recipe and ingredients list that breaks recipes down into simpler steps…

    Submitted 24 June, 2023; originally announced June 2023.

  41. arXiv:2306.13796  [pdf, ps, other]

    cs.LG stat.ML

    On Learning Latent Models with Multi-Instance Weak Supervision

    Authors: Kaifu Wang, Efi Tsamoura, Dan Roth

    Abstract: We consider a weakly supervised learning scenario where the supervision signal is generated by a transition function $σ$ of labels associated with multiple input instances. We formulate this problem as \emph{multi-instance Partial Label Learning (multi-instance PLL)}, which is an extension to the standard PLL problem. Our problem is met in different fields, including latent structural learning and…

    Submitted 23 June, 2023; originally announced June 2023.

  42. arXiv:2306.03203  [pdf, other]

    cs.CL cs.SE

    A Static Evaluation of Code Completion by Large Language Models

    Authors: Hantian Ding, Varun Kumar, Yuchen Tian, Zijian Wang, Rob Kwiatkowski, Xiaopeng Li, Murali Krishna Ramanathan, Baishakhi Ray, Parminder Bhatia, Sudipta Sengupta, Dan Roth, Bing Xiang

    Abstract: Large language models trained on code have shown great potential to increase productivity of software developers. Several execution-based benchmarks have been proposed to evaluate functional correctness of model-generated code on simple programming problems. Nevertheless, it is expensive to perform the same evaluation on complex real-world projects considering the execution cost. On the contrary,…

    Submitted 5 June, 2023; originally announced June 2023.

    Comments: Accepted by ACL 2023 industry track

  43. arXiv:2305.18842  [pdf, other]

    cs.CL cs.AI cs.CV

    Generate then Select: Open-ended Visual Question Answering Guided by World Knowledge

    Authors: Xingyu Fu, Sheng Zhang, Gukyeong Kwon, Pramuditha Perera, Henghui Zhu, Yuhao Zhang, Alexander Hanbo Li, William Yang Wang, Zhiguo Wang, Vittorio Castelli, Patrick Ng, Dan Roth, Bing Xiang

    Abstract: The open-ended Visual Question Answering (VQA) task requires AI models to jointly reason over visual and natural language inputs using world knowledge. Recently, pre-trained Language Models (PLM) such as GPT-3 have been applied to the task and shown to be powerful world knowledge sources. However, these methods suffer from low knowledge coverage caused by PLM bias -- the tendency to generate certa…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL 2023 Findings

  44. arXiv:2305.17127  [pdf, other]

    cs.CL

    Characterizing and Measuring Linguistic Dataset Drift

    Authors: Tyler A. Chang, Kishaloy Halder, Neha Anna John, Yogarshi Vyas, Yassine Benajiba, Miguel Ballesteros, Dan Roth

    Abstract: NLP models often degrade in performance when real world data distributions differ markedly from training data. However, existing dataset drift metrics in NLP have generally not considered specific dimensions of linguistic drift that affect model performance, and they have not been validated in their ability to predict model performance at the individual example level, where such metrics are often…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL 2023

  45. Associated production of prompt $J/ψ$ and $\mathitΥ$ mesons in $pp$ collisions at $\sqrt{s}=13\,\mathrm{TeV}$

    Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, B. Adeva, M. Adinolfi, P. Adlarson, H. Afsharnia, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, A. Alfonso Albero, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey , et al. (1037 additional authors not shown)

    Abstract: The associated production of prompt $J/ψ$ and $\mathitΥ$ mesons in $pp$ collisions at a centre-of-mass energy of $\sqrt{s}=13\,\mathrm{TeV}$ is studied using LHCb data, corresponding to an integrated luminosity of $4\,\mathrm{fb}^{-1}$. The measurement is performed for $J/ψ$ ($\mathitΥ$) mesons with a transverse momentum $p_{\mathrm{T}}<10\,(30)\,\mathrm{GeV}/c$ in the rapidity range…

    Submitted 29 August, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2022-047.html (LHCb public pages)

    Report number: LHCb-PAPER-2022-047, CERN-EP-2023-078

    Journal ref: J. High Energ. Phys. 2023, 93 (2023)

  46. arXiv:2305.14882  [pdf, other]

    cs.CL cs.AI cs.CV

    Dynamic Clue Bottlenecks: Towards Interpretable-by-Design Visual Question Answering

    Authors: Xingyu Fu, Ben Zhou, Sihao Chen, Mark Yatskar, Dan Roth

    Abstract: Recent advances in multimodal large language models (LLMs) have shown extreme effectiveness in visual question answering (VQA). However, the design nature of these end-to-end models prevents them from being interpretable to humans, undermining trust and applicability in critical domains. While post-hoc rationales offer certain insight into understanding model behavior, these explanations are not g…

    Submitted 13 April, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Multimodal, Visual Question Answering, Vision and Language

  47. arXiv:2305.13191  [pdf, other]

    cs.CL cs.AI cs.LG

    Taxonomy Expansion for Named Entity Recognition

    Authors: Karthikeyan K, Yogarshi Vyas, Jie Ma, Giovanni Paolini, Neha Anna John, Shuai Wang, Yassine Benajiba, Vittorio Castelli, Dan Roth, Miguel Ballesteros

    Abstract: Training a Named Entity Recognition (NER) model often involves fixing a taxonomy of entity types. However, requirements evolve and we might need the NER model to recognize additional entity types. A simple approach is to re-annotate entire dataset with both existing and additional entity types and then train the model on the re-annotated dataset. However, this is an extremely laborious task. To re…

    Submitted 22 May, 2023; originally announced May 2023.

  48. arXiv:2305.12835  [pdf, other]

    cs.CL cs.AI

    Open-Domain Event Graph Induction for Mitigating Framing Bias

    Authors: Siyi Liu, Hongming Zhang, Hongwei Wang, Kaiqiang Song, Dan Roth, Dong Yu

    Abstract: Researchers have proposed various information extraction (IE) techniques to convert news articles into structured knowledge for news understanding. However, none of the existing methods have explicitly addressed the issue of framing bias that is inherent in news articles. We argue that studying and identifying framing bias is a crucial step towards trustworthy event understanding. We propose a nov…

    Submitted 22 May, 2023; originally announced May 2023.

  49. arXiv:2305.11242  [pdf, other]

    cs.CL

    Comparing Biases and the Impact of Multilingual Training across Multiple Languages

    Authors: Sharon Levy, Neha Anna John, Ling Liu, Yogarshi Vyas, Jie Ma, Yoshinari Fujinuma, Miguel Ballesteros, Vittorio Castelli, Dan Roth

    Abstract: Studies in bias and fairness in natural language processing have primarily examined social biases within a single language and/or across few attributes (e.g. gender, race). However, biases can manifest differently across various languages for individual attributes. As a result, it is critical to examine biases within each language and attribute. Of equal importance is to study how these biases com…

    Submitted 18 May, 2023; originally announced May 2023.

  50. arXiv:2305.10515  [pdf, other]

    hep-ex physics.ins-det

    The LHCb upgrade I

    Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, C. Achard, T. Ackernley, B. Adeva, M. Adinolfi, P. Adlarson, H. Afsharnia, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, A. Alfonso Albero, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato , et al. (1298 additional authors not shown)

    Abstract: The LHCb upgrade represents a major change of the experiment. The detectors have been almost completely renewed to allow running at an instantaneous luminosity five times larger than that of the previous running periods. Readout of all detectors into an all-software trigger is central to the new design, facilitating the reconstruction of events at the maximum LHC interaction rate, and their select…

    Submitted 17 May, 2023; originally announced May 2023.

    Comments: All figures and tables, along with any supplementary material and additional information, are available at http://lhcbproject.web.cern.ch/lhcbproject/Publications/LHCbProjectPublic/LHCb-DP-2022-002.html (LHCb public pages)

    Report number: LHCb-DP-2022-002