Showing 1–12 of 12 results for author: Thai, K

Searching in archive cs.
  1. arXiv:2406.16264  [pdf, other]

    cs.CL cs.AI

    One Thousand and One Pairs: A "novel" challenge for long-context language models

    Authors: Marzena Karpinska, Katherine Thai, Kyle Lo, Tanya Goyal, Mohit Iyyer

    Abstract: Synthetic long-context LLM benchmarks (e.g., "needle-in-the-haystack") test only surface-level retrieval capabilities, but how well can long-context LLMs retrieve, synthesize, and reason over information across book-length inputs? We address this question by creating NoCha, a dataset of 1,001 minimally different pairs of true and false claims about 67 recently-published English fictional books, wr…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: preprint, 29 pages

  2. arXiv:2404.13784  [pdf, other]

    cs.CR cs.CL cs.CV

    Iteratively Prompting Multimodal LLMs to Reproduce Natural and AI-Generated Images

    Authors: Ali Naseh, Katherine Thai, Mohit Iyyer, Amir Houmansadr

    Abstract: With the digital imagery landscape rapidly evolving, image stocks and AI-generated image marketplaces have become central to visual media. Traditional stock images now exist alongside innovative platforms that trade in prompts for AI-generated visuals, driven by sophisticated APIs like DALL-E 3 and Midjourney. This paper studies the possibility of employing multi-modal models with enhanced visual…

    Submitted 21 April, 2024; originally announced April 2024.

  3. arXiv:2210.14250  [pdf, other]

    cs.CL

    Exploring Document-Level Literary Machine Translation with Parallel Paragraphs from World Literature

    Authors: Katherine Thai, Marzena Karpinska, Kalpesh Krishna, Bill Ray, Moira Inghilleri, John Wieting, Mohit Iyyer

    Abstract: Literary translation is a culturally significant task, but it is bottlenecked by the small number of qualified literary translators relative to the many untranslated works published around the world. Machine translation (MT) holds potential to complement the work of human translators by improving both training procedures and their overall efficiency. Literary translation is less constrained than m…

    Submitted 25 October, 2022; originally announced October 2022.

    Comments: EMNLP 2022

  4. arXiv:2210.13746  [pdf, other]

    cs.CL

    DEMETR: Diagnosing Evaluation Metrics for Translation

    Authors: Marzena Karpinska, Nishant Raj, Katherine Thai, Yixiao Song, Ankita Gupta, Mohit Iyyer

    Abstract: While machine translation evaluation metrics based on string overlap (e.g., BLEU) have their limitations, their computations are transparent: the BLEU score assigned to a particular candidate translation can be traced back to the presence or absence of certain words. The operations of newer learned metrics (e.g., BLEURT, COMET), which leverage pretrained language models to achieve higher correlati…

    Submitted 24 October, 2022; originally announced October 2022.

    Comments: 22 pages, EMNLP 2022 (camera ready)

  5. arXiv:2204.10878  [pdf, other]

    cs.CL

    ChapterBreak: A Challenge Dataset for Long-Range Language Models

    Authors: Simeng Sun, Katherine Thai, Mohit Iyyer

    Abstract: While numerous architectures for long-range language models (LRLMs) have recently been proposed, a meaningful evaluation of their discourse-level language understanding capabilities has not yet followed. To this end, we introduce ChapterBreak, a challenge dataset that provides an LRLM with a long segment from a narrative that ends at a chapter boundary and asks it to distinguish the beginning of t…

    Submitted 22 April, 2022; originally announced April 2022.

  6. arXiv:2203.10053  [pdf, other]

    cs.CL

    RELIC: Retrieving Evidence for Literary Claims

    Authors: Katherine Thai, Yapei Chang, Kalpesh Krishna, Mohit Iyyer

    Abstract: Humanities scholars commonly provide evidence for claims that they make about a work of literature (e.g., a novel) in the form of quotations from the work. We collect a large-scale dataset (RELiC) of 78K literary quotations and surrounding critical analysis and use it to formulate the novel task of literary evidence retrieval, in which models are given an excerpt of literary analysis surrounding a…

    Submitted 18 March, 2022; originally announced March 2022.

    Comments: ACL 2022 camera ready (19 pages)

  7. Maximum Covering Subtrees for Phylogenetic Networks

    Authors: Nathan Davidov, Amanda Hernandez, Justin Jian, Patrick McKenna, K. A. Medlin, Roadra Mojumder, Megan Owen, Andrew Quijano, Amanda Rodriguez, Katherine St. John, Katherine Thai, Meliza Uraga

    Abstract: Tree-based phylogenetic networks, which may be roughly defined as leaf-labeled networks built by adding arcs only between the original tree edges, have elegant properties for modeling evolutionary histories. We answer an open question of Francis, Semple, and Steel about the complexity of determining how far a phylogenetic network is from being tree-based, including non-binary phylogenetic networks…

    Submitted 24 November, 2020; v1 submitted 25 September, 2020; originally announced September 2020.

  8. arXiv:1910.06078  [pdf, other]

    cs.CY stat.ML

    MUTLA: A Large-Scale Dataset for Multimodal Teaching and Learning Analytics

    Authors: Fangli Xu, Lingfei Wu, KP Thai, Carol Hsu, Wei Wang, Richard Tong

    Abstract: Automatic analysis of teacher and student interactions could be very important to improve the quality of teaching and student engagement. However, despite some recent progress in utilizing multimodal data for teaching and learning analytics, a thorough analysis of a rich multimodal dataset coming from a complex real learning environment has yet to be done. To bridge this gap, we present a large-sca…

    Submitted 6 December, 2022; v1 submitted 4 October, 2019; originally announced October 2019.

    Comments: 3 pages, 1 figure, 2 tables, workshop paper

  9. arXiv:1910.03225  [pdf, other]

    cs.LG stat.ML

    NGBoost: Natural Gradient Boosting for Probabilistic Prediction

    Authors: Tony Duan, Anand Avati, Daisy Yi Ding, Khanh K. Thai, Sanjay Basu, Andrew Y. Ng, Alejandro Schuler

    Abstract: We present Natural Gradient Boosting (NGBoost), an algorithm for generic probabilistic prediction via gradient boosting. Typical regression models return a point estimate, conditional on covariates, but probabilistic regression models output a full probability distribution over the outcome space, conditional on the covariates. This allows for predictive uncertainty estimation -- crucial in applica…

    Submitted 9 June, 2020; v1 submitted 8 October, 2019; originally announced October 2019.

    Comments: Accepted for ICML 2020

  10. arXiv:1907.03627  [pdf, other]

    cs.CR

    HyperPubSub: Blockchain based Publish/Subscribe

    Authors: Gewu Bu, Thanh Son Lam Nguyen, Maria Potop-Butucaru, Kim Thai

    Abstract: In this paper we describe the architecture and the implementation of a broker-based publish/subscribe system where the broker role is played by a private blockchain, Hyperledger Fabric. We show the effectiveness of our architecture by implementing and deploying a photo trading platform. Interestingly, our architecture is generic enough to be adapted to any digital asset trading.

    Submitted 8 July, 2019; originally announced July 2019.

  11. arXiv:1903.08856  [pdf, other]

    cs.PF cs.CR cs.NI

    Impact of network delays on Hyperledger Fabric

    Authors: Thanh Son Lam Nguyen, Guillaume Jourjon, Maria Potop-Butucaru, Kim Thai

    Abstract: Blockchain has become one of the most attractive technologies for applications, with a large range of deployments such as production, economy, or banking. Under the hood, Blockchain technology is a type of distributed database that supports untrusted parties. In this paper we focus on Hyperledger Fabric, the first blockchain in the market tailored for a private environment, allowing businesses to cre…

    Submitted 21 March, 2019; originally announced March 2019.

  12. arXiv:1901.10268  [pdf]

    cs.CY cs.HC

    Performance comparison of an AI-based Adaptive Learning System in China

    Authors: Wei Cui, Zhen Xue, Khanh-Phuong Thai

    Abstract: Adaptive learning systems stand apart from traditional learning systems by offering a personalized learning experience to students according to their different knowledge states. Adaptive systems collect and analyse students' behavior data, update learner profiles, then accordingly provide timely individualized feedback to each student. Such interactions between the learning system and students can…

    Submitted 29 January, 2019; originally announced January 2019.