Showing 1–9 of 9 results for author: Bikel, D

  1. arXiv:2404.01295  [pdf, other]

    cs.CL cs.AI

    Towards Safety and Helpfulness Balanced Responses via Controllable Large Language Models

    Authors: Yi-Lin Tuan, Xilun Chen, Eric Michael Smith, Louis Martin, Soumya Batra, Asli Celikyilmaz, William Yang Wang, Daniel M. Bikel

    Abstract: As large language models (LLMs) become easily accessible nowadays, the trade-off between safety and helpfulness can significantly impact user experience. A model that prioritizes safety will cause users to feel less engaged and assisted while prioritizing helpfulness will potentially cause harm. Possible harms include teaching people how to build a bomb, exposing youth to inappropriate content, an…

    Submitted 1 April, 2024; originally announced April 2024.

  2. arXiv:2311.06513  [pdf, other]

    cs.CL cs.AI

    Step by Step to Fairness: Attributing Societal Bias in Task-oriented Dialogue Systems

    Authors: Hsuan Su, Rebecca Qian, Chinnadhurai Sankar, Shahin Shayandeh, Shang-Tse Chen, Hung-yi Lee, Daniel M. Bikel

    Abstract: Recent works have shown considerable improvements in task-oriented dialogue (TOD) systems by utilizing pretrained large language models (LLMs) in an end-to-end manner. However, the biased behavior of each component in a TOD system and the error propagation issue in the end-to-end framework can lead to seriously biased TOD responses. Existing works of fairness only focus on the total bias of a syst…

    Submitted 14 November, 2023; v1 submitted 11 November, 2023; originally announced November 2023.

  3. arXiv:2311.02772  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency

    Authors: Sungho Jeon, Ching-Feng Yeh, Hakan Inan, Wei-Ning Hsu, Rashi Rungta, Yashar Mehdad, Daniel Bikel

    Abstract: In this paper, we show that a simple self-supervised pre-trained audio model can achieve comparable inference efficiency to more complicated pre-trained models with speech transformer encoders. These speech transformers rely on mixing convolutional modules with self-attention modules. They achieve state-of-the-art performance on ASR with top efficiency. We first show that employing these speech tr…

    Submitted 8 February, 2024; v1 submitted 5 November, 2023; originally announced November 2023.

    Comments: 5 pages; accepted to Self-supervision in Audio, Speech and Beyond (SASB) workshop in ICASSP24

  4. arXiv:2307.09288  [pdf, other]

    cs.CL cs.AI

    Llama 2: Open Foundation and Fine-Tuned Chat Models

    Authors: Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, et al. (43 additional authors not shown)

    Abstract: In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be…

    Submitted 19 July, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

  5. arXiv:2204.07120  [pdf, other]

    cs.CL cs.IR cs.LG

    Exploring Dual Encoder Architectures for Question Answering

    Authors: Zhe Dong, Jianmo Ni, Daniel M. Bikel, Enrique Alfonseca, Yuan Wang, Chen Qu, Imed Zitouni

    Abstract: Dual encoders have been used for question-answering (QA) and information retrieval (IR) tasks with good results. Previous research focuses on two major types of dual encoders, Siamese Dual Encoder (SDE), with parameters shared across two encoders, and Asymmetric Dual Encoder (ADE), with two distinctly parameterized encoders. In this work, we explore different ways in which the dual encoder can be…

    Submitted 15 November, 2022; v1 submitted 14 April, 2022; originally announced April 2022.

    Comments: Published in EMNLP 2022

  6. arXiv:2106.07352  [pdf, other]

    cs.IR cs.CL cs.LG cs.SI

    MOLEMAN: Mention-Only Linking of Entities with a Mention Annotation Network

    Authors: Nicholas FitzGerald, Jan A. Botha, Daniel Gillick, Daniel M. Bikel, Tom Kwiatkowski, Andrew McCallum

    Abstract: We present an instance-based nearest neighbor approach to entity linking. In contrast to most prior entity retrieval systems which represent each entity with a single vector, we build a contextualized mention-encoder that learns to place similar mentions of the same entity closer in vector space than mentions of different entities. This approach allows all mentions of an entity to serve as "class…

    Submitted 22 July, 2022; v1 submitted 2 June, 2021; originally announced June 2021.

    Comments: Accepted to ACL 2021, edit to add missing Turkish results in Tables 2 and 7

  7. arXiv:2004.03555  [pdf, other]

    cs.CL

    Entity Linking via Dual and Cross-Attention Encoders

    Authors: Oshin Agarwal, Daniel M. Bikel

    Abstract: Entity Linking has two main open areas of research: 1) generate candidate entities without using alias tables and 2) generate more contextual representations for both mentions and entities. Recently, a solution has been proposed for the former as a dual-encoder entity retrieval system (Gillick et al., 2019) that learns mention and entity representations in the same space, and performs linking by s…

    Submitted 7 April, 2020; originally announced April 2020.

  8. arXiv:1210.8440  [pdf, other]

    cs.CL

    Large Scale Language Modeling in Automatic Speech Recognition

    Authors: Ciprian Chelba, Dan Bikel, Maria Shugrina, Patrick Nguyen, Shankar Kumar

    Abstract: Large language models have been proven quite beneficial for a variety of automatic speech recognition tasks in Google. We summarize results on Voice Search and a few YouTube speech transcription tasks to highlight the impact that one can expect from increasing both the amount of training data, and the size of the language model estimated from such data. Depending on the task, availability and amou…

    Submitted 31 October, 2012; originally announced October 2012.

  9. arXiv:cmp-lg/9803003  [pdf, ps]

    cs.CL

    Nymble: a High-Performance Learning Name-finder

    Authors: Daniel M. Bikel, Scott Miller, Richard Schwartz, Ralph Weischedel

    Abstract: This paper presents a statistical, learned approach to finding names and other non-recursive entities in text (as per the MUC-6 definition of the NE task), using a variant of the standard hidden Markov model. We present our justification for the problem and our approach, a detailed discussion of the model itself and finally the successful results of this new approach.

    Submitted 27 March, 1998; originally announced March 1998.

    Comments: Postscript only, 8 pages

    Journal ref: Proceedings of the Fifth Conference on Applied Natural Language Processing, 1997, pp. 194-201