Showing 1–9 of 9 results for author: Siu, A

  1. arXiv:2405.17602

    cs.IR

    Augmenting Textual Generation via Topology Aware Retrieval

    Authors: Yu Wang, Nedim Lipka, Ruiyi Zhang, Alexa Siu, Yuying Zhao, Bo Ni, Xin Wang, Ryan Rossi, Tyler Derr

    Abstract: Despite the impressive advancements of Large Language Models (LLMs) in generating text, they are often limited by the knowledge contained in the input and prone to producing inaccurate or hallucinated content. To tackle these issues, Retrieval-augmented Generation (RAG) is employed as an effective strategy to enhance the available knowledge base and anchor the responses in reality by pulling addit…

    Submitted 27 May, 2024; originally announced May 2024.

  2. arXiv:2405.01501

    cs.HC

    Supporting Business Document Workflows via Collection-Centric Information Foraging with Large Language Models

    Authors: Raymond Fok, Nedim Lipka, Tong Sun, Alexa Siu

    Abstract: Knowledge workers often need to extract and analyze information from a collection of documents to solve complex information tasks in the workplace, e.g., hiring managers reviewing resumes or analysts assessing risk in contracts. However, foraging for relevant information can become tedious and repetitive over many documents and criteria of interest. We introduce Marco, a mixed-initiative workspace…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 20 pages, 10 figures, 4 tables. Published at CHI 2024

  3. arXiv:2403.00553

    cs.CL

    Standardizing the Measurement of Text Diversity: A Tool and a Comparative Analysis of Scores

    Authors: Chantal Shaib, Joe Barrow, Jiuding Sun, Alexa F. Siu, Byron C. Wallace, Ani Nenkova

    Abstract: The diversity across outputs generated by large language models shapes the perception of their quality and utility. Prompt leaks, templated answer structure, and canned responses across different interactions are readily noticed by people, but there is no standard score to measure this aspect of model behavior. In this work we empirically investigate diversity scores on English texts. We find that…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: Preprint

  4. arXiv:2402.18756

    cs.CL

    How Much Annotation is Needed to Compare Summarization Models?

    Authors: Chantal Shaib, Joe Barrow, Alexa F. Siu, Byron C. Wallace, Ani Nenkova

    Abstract: Modern instruction-tuned models have become highly capable in text generation tasks such as summarization, and are expected to be released at a steady pace. In practice one may now wish to choose confidently, but with minimal effort, the best performing summarization model when applied to a new domain or purpose. In this work, we empirically investigate the test sample size necessary to select a p…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: Preprint

  5. arXiv:2309.08872

    cs.CL cs.AI cs.LG

    PDFTriage: Question Answering over Long, Structured Documents

    Authors: Jon Saad-Falcon, Joe Barrow, Alexa Siu, Ani Nenkova, David Seunghyun Yoon, Ryan A. Rossi, Franck Dernoncourt

    Abstract: Large Language Models (LLMs) have issues with document question answering (QA) in situations where the document is unable to fit in the small context length of an LLM. To overcome this issue, most existing works focus on retrieving the relevant context from the document, representing them as plain text. However, documents such as PDFs, web pages, and presentations are naturally structured with dif…

    Submitted 8 November, 2023; v1 submitted 16 September, 2023; originally announced September 2023.

  6. TaleStream: Supporting Story Ideation with Trope Knowledge

    Authors: Jean-Peïc Chou, Alexa F. Siu, Nedim Lipka, Ryan Rossi, Franck Dernoncourt, Maneesh Agrawala

    Abstract: Story ideation is a critical part of the story-writing process. It is challenging to support computationally due to its exploratory and subjective nature. Tropes, which are recurring narrative elements across stories, are essential in stories as they shape the structure of narratives and our understanding of them. In this paper, we propose to use tropes as an intermediate representation of stories…

    Submitted 7 September, 2023; originally announced September 2023.

    Comments: 12 pages, 6 figures, 3 tables

    ACM Class: D.2.2; H.1.2; H.5.2

  7. arXiv:2308.11730

    cs.CL cs.AI cs.IR cs.LG

    Knowledge Graph Prompting for Multi-Document Question Answering

    Authors: Yu Wang, Nedim Lipka, Ryan A. Rossi, Alexa Siu, Ruiyi Zhang, Tyler Derr

    Abstract: The 'pre-train, prompt, predict' paradigm of large language models (LLMs) has achieved remarkable success in open-domain question answering (OD-QA). However, few works explore this paradigm in the scenario of multi-document question answering (MD-QA), a task demanding a thorough understanding of the logical associations among the contents and structures of different documents. To fill this crucial…

    Submitted 25 December, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

  8. A Graph-based Stratified Sampling Methodology for the Analysis of (Underground) Forums

    Authors: Giorgio Di Tizio, Gilberto Atondo Siu, Alice Hutchings, Fabio Massacci

    Abstract: [Context] Researchers analyze underground forums to study abuse and cybercrime activities. Due to the size of the forums and the domain expertise required to identify criminal discussions, most approaches employ supervised machine learning techniques to automatically classify the posts of interest. [Goal] Human annotation is costly. How to select samples to annotate that account for the structure…

    Submitted 18 August, 2023; originally announced August 2023.

    Journal ref: IEEE Transactions on Information Forensics and Security, 2023

  9. Experts prefer text but videos help novices: an analysis of the utility of multi-media content

    Authors: Hayeong Song, Jennifer Healey, Alexa Siu, Curtis Wigington, John Stasko

    Abstract: Multi-media increases engagement and is increasingly prevalent in online content including news, web blogs, and social media, however, it may not always be beneficial to users. To determine what types of media users actually wanted, we conducted an exploratory study where users got to choose their own media augmentation. Our findings showed that users desired different amounts and types of media d…

    Submitted 23 April, 2023; originally announced April 2023.

    Comments: in CHI'23 Extended Abstracts on Human Factors in Computing Systems, 2023