
Showing 1–5 of 5 results for author: Hou, A B

  1. arXiv:2406.17186  [pdf, other]

    cs.CL cs.CY

    CLERC: A Dataset for Legal Case Retrieval and Retrieval-Augmented Analysis Generation

    Authors: Abe Bohan Hou, Orion Weller, Guanghui Qin, Eugene Yang, Dawn Lawrie, Nils Holzenberger, Andrew Blair-Stanek, Benjamin Van Durme

    Abstract: Legal professionals need to write analyses that rely on citations to relevant precedents, i.e., previous case decisions. Intelligent systems assisting legal professionals in writing such documents provide great benefits but are challenging to design. Such systems need to help locate, summarize, and reason over salient precedents in order to be useful. To enable systems for such tasks, we work with…

    Submitted 27 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

  2. arXiv:2402.11638  [pdf, other]

    cs.CL

    Stumbling Blocks: Stress Testing the Robustness of Machine-Generated Text Detectors Under Attacks

    Authors: Yichen Wang, Shangbin Feng, Abe Bohan Hou, Xiao Pu, Chao Shen, Xiaoming Liu, Yulia Tsvetkov, Tianxing He

    Abstract: The widespread use of large language models (LLMs) is increasing the demand for methods that detect machine-generated text to prevent misuse. The goal of our study is to stress test the detectors' robustness to malicious attacks under realistic scenarios. We comprehensively study the robustness of popular machine-generated text detectors under attacks from diverse categories: editing, paraphrasing…

    Submitted 18 February, 2024; originally announced February 2024.

  3. arXiv:2402.11399  [pdf, other]

    cs.CL cs.CR cs.CY cs.LG

    k-SemStamp: A Clustering-Based Semantic Watermark for Detection of Machine-Generated Text

    Authors: Abe Bohan Hou, Jingyu Zhang, Yichen Wang, Daniel Khashabi, Tianxing He

    Abstract: Recent watermarked generation algorithms inject detectable signatures during language generation to facilitate post-hoc detection. While token-level watermarks are vulnerable to paraphrase attacks, SemStamp (Hou et al., 2023) applies watermark on the semantic representation of sentences and demonstrates promising robustness. SemStamp employs locality-sensitive hashing (LSH) to partition the semant…

    Submitted 8 June, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

    Comments: Accepted to ACL 24 Findings

  4. arXiv:2310.03991  [pdf, other]

    cs.CL

    SemStamp: A Semantic Watermark with Paraphrastic Robustness for Text Generation

    Authors: Abe Bohan Hou, Jingyu Zhang, Tianxing He, Yichen Wang, Yung-Sung Chuang, Hongwei Wang, Lingfeng Shen, Benjamin Van Durme, Daniel Khashabi, Yulia Tsvetkov

    Abstract: Existing watermarking algorithms are vulnerable to paraphrase attacks because of their token-level design. To address this issue, we propose SemStamp, a robust sentence-level semantic watermarking algorithm based on locality-sensitive hashing (LSH), which partitions the semantic space of sentences. The algorithm encodes and LSH-hashes a candidate sentence generated by an LLM, and conducts sentence…

    Submitted 22 April, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

    Comments: Accepted to NAACL 24 Main

  5. arXiv:2208.14564  [pdf, other]

    physics.geo-ph

    QuakeFlow: A Scalable Machine-learning-based Earthquake Monitoring Workflow with Cloud Computing

    Authors: Weiqiang Zhu, Alvin Brian Hou, Robert Yang, Avoy Datta, S. Mostafa Mousavi, William L. Ellsworth, Gregory C. Beroza

    Abstract: Earthquake monitoring workflows are designed to detect earthquake signals and to determine source characteristics from continuous waveform data. Recent developments in deep learning seismology have been used to improve tasks within earthquake monitoring workflows that allow the fast and accurate detection of up to orders of magnitude more small events than are present in conventional catalogs. To…

    Submitted 30 August, 2022; originally announced August 2022.