Showing 1–50 of 76 results for author: Lam, M

Searching in archive cs.

  1. arXiv:2406.00888  [pdf, other]

    cs.CL cs.HC

    Show, Don't Tell: Aligning Language Models with Demonstrated Feedback

    Authors: Omar Shaikh, Michelle Lam, Joey Hejna, Yijia Shao, Michael Bernstein, Diyi Yang

    Abstract: Language models are aligned to emulate the collective voice of many, resulting in outputs that align with no one in particular. Steering LLMs away from generic output is possible through supervised finetuning or RLHF, but requires prohibitively large datasets for new ad-hoc tasks. We argue that it is instead possible to align an LLM to a specific setting by leveraging a very small number ($<10$) o…

    Submitted 2 June, 2024; originally announced June 2024.

  2. arXiv:2406.00562  [pdf, other]

    cs.CL

    SPAGHETTI: Open-Domain Question Answering from Heterogeneous Data Sources with Retrieval and Semantic Parsing

    Authors: Heidi C. Zhang, Sina J. Semnani, Farhad Ghassemi, Jialiang Xu, Shicheng Liu, Monica S. Lam

    Abstract: We introduce SPAGHETTI: Semantic Parsing Augmented Generation for Hybrid English information from Text Tables and Infoboxes, a hybrid question-answering (QA) pipeline that utilizes information from heterogeneous knowledge sources, including knowledge base, text, tables, and infoboxes. Our LLM-augmented approach achieves state-of-the-art performance on the Compmix dataset, the most comprehensive he…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: ACL Findings 2024

  3. arXiv:2406.00284  [pdf, other]

    cs.CL

    A Closer Look at Logical Reasoning with LLMs: The Choice of Tool Matters

    Authors: Long Hei Matthew Lam, Ehsan Shareghi

    Abstract: Logical reasoning serves as a cornerstone for human cognition. Recently, the emergence of Large Language Models (LLMs) has demonstrated promising progress in solving logical reasoning tasks effectively. To improve this capability, recent studies have delved into integrating LLMs with various symbolic solvers using diverse techniques and methodologies. While some combinations excel on specific data…

    Submitted 31 May, 2024; originally announced June 2024.

    Comments: Code and data are publicly available at: https://github.com/Mattylam/Logic_Symbolic_Solvers_Experiment

  4. arXiv:2405.17840  [pdf, other]

    cs.CL

    Benchmarks Underestimate the Readiness of Multi-lingual Dialogue Agents

    Authors: Andrew H. Lee, Sina J. Semnani, Galo Castillo-López, Gaël de Chalendar, Monojit Choudhury, Ashna Dua, Kapil Rajesh Kavitha, Sungkyun Kim, Prashant Kodali, Ponnurangam Kumaraguru, Alexis Lombard, Mehrad Moradshahi, Gihyun Park, Nasredine Semmar, Jiwon Seo, Tianhao Shen, Manish Shrivastava, Deyi Xiong, Monica S. Lam

    Abstract: Creating multilingual task-oriented dialogue (TOD) agents is challenging due to the high cost of training data acquisition. Following the research trend of improving training data efficiency, we show for the first time, that in-context learning is sufficient to tackle multilingual TOD. To handle the challenging dialogue state tracking (DST) subtask, we break it down to simpler steps that are mor…

    Submitted 16 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

  5. Concept Induction: Analyzing Unstructured Text with High-Level Concepts Using LLooM

    Authors: Michelle S. Lam, Janice Teoh, James Landay, Jeffrey Heer, Michael S. Bernstein

    Abstract: Data analysts have long sought to turn unstructured text data into meaningful concepts. Though common, topic modeling and clustering focus on lower-level keywords and require significant interpretative work. We introduce concept induction, a computational process that instead produces high-level concepts, defined by explicit inclusion criteria, from unstructured text. For a dataset of toxic online…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: To appear at CHI 2024

  6. arXiv:2404.01620  [pdf]

    cs.SD cs.AI cs.CY eess.AS

    Voice EHR: Introducing Multimodal Audio Data for Health

    Authors: James Anibal, Hannah Huth, Ming Li, Lindsey Hazen, Yen Minh Lam, Hang Nguyen, Phuc Hong, Michael Kleinman, Shelley Ost, Christopher Jackson, Laura Sprabery, Cheran Elangovan, Balaji Krishnaiah, Lee Akst, Ioan Lina, Iqbal Elyazar, Lenny Ekwati, Stefan Jansen, Richard Nduwayezu, Charisse Garcia, Jeffrey Plum, Jacqueline Brenner, Miranda Song, Emily Ricotta, David Clifton, et al. (3 additional authors not shown)

    Abstract: Large AI models trained on audio data may have the potential to rapidly classify patients, enhancing medical decision-making and potentially improving outcomes through early detection. Existing technologies depend on limited datasets using expensive recording equipment in high-income, English-speaking countries. This challenges deployment in resource-constrained, high-volume settings where audio d…

    Submitted 1 June, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 19 pages, 2 figures, 7 tables

  7. arXiv:2403.11807  [pdf, other]

    cs.AI cs.CL

    How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments

    Authors: Jen-tse Huang, Eric John Li, Man Ho Lam, Tian Liang, Wenxuan Wang, Youliang Yuan, Wenxiang Jiao, Xing Wang, Zhaopeng Tu, Michael R. Lyu

    Abstract: Decision-making, a complicated task requiring various types of abilities, presents an excellent framework for assessing Large Language Models (LLMs). Our research investigates LLMs' decision-making capabilities through the lens of a well-established field, Game Theory. We focus specifically on games that support the participation of more than two agents simultaneously. Subsequently, we introduce o…

    Submitted 25 April, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: 16 pages of main text. 11 pages of appendices. 15 figures, 9 tables. Updated scoring scheme

  8. arXiv:2402.14207  [pdf, other]

    cs.CL cs.AI

    Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models

    Authors: Yijia Shao, Yucheng Jiang, Theodore A. Kanell, Peter Xu, Omar Khattab, Monica S. Lam

    Abstract: We study how to apply large language models to write grounded and organized long-form articles from scratch, with comparable breadth and depth to Wikipedia pages. This underexplored problem poses new challenges at the pre-writing stage, including how to research the topic and prepare an outline prior to writing. We propose STORM, a writing system for the Synthesis of Topic Outlines through Retriev…

    Submitted 8 April, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: 27 pages, NAACL 2024 Main Conference

  9. arXiv:2402.03715  [pdf, other]

    cs.LG cs.AI cs.CL

    Clarify: Improving Model Robustness With Natural Language Corrections

    Authors: Yoonho Lee, Michelle S. Lam, Helena Vasconcelos, Michael S. Bernstein, Chelsea Finn

    Abstract: In supervised learning, models are trained to extract correlations from a static dataset. This often leads to models that rely on high-level misconceptions. To prevent such misconceptions, we must necessarily provide additional information beyond the training data. Existing methods incorporate forms of additional instance-level supervision, such as labels for spurious features or additional labele…

    Submitted 6 February, 2024; originally announced February 2024.

  10. arXiv:2312.11681  [pdf, other]

    cs.HC cs.AI cs.CL

    Designing LLM Chains by Adapting Techniques from Crowdsourcing Workflows

    Authors: Madeleine Grunde-McLaughlin, Michelle S. Lam, Ranjay Krishna, Daniel S. Weld, Jeffrey Heer

    Abstract: LLM chains enable complex tasks by decomposing work into a sequence of subtasks. Similarly, the more established techniques of crowdsourcing workflows decompose complex tasks into smaller tasks for human crowdworkers. Chains address LLM errors analogously to the way crowdsourcing workflows address human error. To characterize opportunities for LLM chaining, we survey 107 papers across the crowdsou…

    Submitted 6 May, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  11. arXiv:2312.09922  [pdf, other]

    cs.CV cs.AI

    A Unifying Tensor View for Lightweight CNNs

    Authors: Jason Chun Lok Li, Rui Lin, Jiajun Zhou, Edmund Yin Mun Lam, Ngai Wong

    Abstract: Despite the decomposition of convolutional kernels for lightweight CNNs being well studied, existing works that rely on tensor network diagrams or hyperdimensional abstraction lack geometry intuition. This work devises a new perspective by linking a 3D-reshaped kernel tensor to its various slice-wise and rank-1 decompositions, permitting a straightforward connection between various tensor approxim…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: 4 pages, 3 figures, accepted in 2023 IEEE 15th International Conference on ASIC (ASICON 2023)

  12. arXiv:2311.09818  [pdf, other]

    cs.CL cs.PL

    SUQL: Conversational Search over Structured and Unstructured Data with Large Language Models

    Authors: Shicheng Liu, Jialiang Xu, Wesley Tjangnaka, Sina J. Semnani, Chen Jie Yu, Monica S. Lam

    Abstract: While most conversational agents are grounded on either free-text or structured knowledge, many knowledge corpora consist of hybrid sources. This paper presents the first conversational agent that supports the full generality of hybrid data access for large knowledge corpora, through a language we developed called SUQL (Structured and Unstructured Query Language). Specifically, SUQL extends SQL wi…

    Submitted 13 March, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

  13. arXiv:2310.08778  [pdf, other]

    cs.RO eess.SY

    3D Self-Localization of Drones using a Single Millimeter-Wave Anchor

    Authors: Maisy Lam, Laura Dodds, Aline Eid, Jimmy Hester, Fadel Adib

    Abstract: We present the design, implementation, and evaluation of MiFly, a self-localization system for autonomous drones that works across indoor and outdoor environments, including low-visibility, dark, and GPS-denied settings. MiFly performs 6DoF self-localization by leveraging a single millimeter-wave (mmWave) anchor in its vicinity - even if that anchor is visually occluded. MmWave signals are used in…

    Submitted 12 October, 2023; originally announced October 2023.

  14. arXiv:2310.01386  [pdf, other]

    cs.CL

    Who is ChatGPT? Benchmarking LLMs' Psychological Portrayal Using PsychoBench

    Authors: Jen-tse Huang, Wenxuan Wang, Eric John Li, Man Ho Lam, Shujie Ren, Youliang Yuan, Wenxiang Jiao, Zhaopeng Tu, Michael R. Lyu

    Abstract: Large Language Models (LLMs) have recently showcased their remarkable capacities, not only in natural language processing tasks but also across diverse domains such as clinical medicine, legal consultation, and education. LLMs become more than mere applications, evolving into assistants capable of addressing diverse user requests. This narrows the distinction between human beings and artificial in…

    Submitted 22 January, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: Accepted for ICLR 2024 Oral Presentation. 15 pages (main text) and 5 pages (appendix)

  15. arXiv:2308.15768  [pdf, other]

    cs.HC cs.CY

    Sociotechnical Audits: Broadening the Algorithm Auditing Lens to Investigate Targeted Advertising

    Authors: Michelle S. Lam, Ayush Pandit, Colin H. Kalicki, Rachit Gupta, Poonam Sahoo, Danaë Metaxa

    Abstract: Algorithm audits are powerful tools for studying black-box systems. While very effective in examining technical components, the method stops short of a sociotechnical frame, which would also consider users as an integral and dynamic part of the system. Addressing this gap, we propose the concept of sociotechnical auditing: auditing methods that evaluate algorithmic systems at the sociotechnical le…

    Submitted 30 August, 2023; originally announced August 2023.

    Comments: To appear at CSCW 2023

  16. arXiv:2308.03656  [pdf, other]

    cs.CL

    Emotionally Numb or Empathetic? Evaluating How LLMs Feel Using EmotionBench

    Authors: Jen-tse Huang, Man Ho Lam, Eric John Li, Shujie Ren, Wenxuan Wang, Wenxiang Jiao, Zhaopeng Tu, Michael R. Lyu

    Abstract: Evaluating Large Language Models' (LLMs) anthropomorphic capabilities has become increasingly important in contemporary discourse. Utilizing the emotion appraisal theory from psychology, we propose to evaluate the empathy ability of LLMs, i.e., how their feelings change when presented with specific situations. After a careful and comprehensive survey, we collect a dataset containing over 400 situa…

    Submitted 24 April, 2024; v1 submitted 7 August, 2023; originally announced August 2023.

    Comments: 12 pages of main text; 9 pages of appendices

  17. arXiv:2307.15569  [pdf, other]

    cs.CV

    Point Clouds Are Specialized Images: A Knowledge Transfer Approach for 3D Understanding

    Authors: Jiachen Kang, Wenjing Jia, Xiangjian He, Kin Man Lam

    Abstract: Self-supervised representation learning (SSRL) has gained increasing attention in point cloud understanding, in addressing the challenges posed by 3D data scarcity and high annotation costs. This paper presents PCExpert, a novel SSRL approach that reinterprets point clouds as "specialized images". This conceptual shift allows PCExpert to leverage knowledge derived from large-scale image modality i…

    Submitted 23 April, 2024; v1 submitted 28 July, 2023; originally announced July 2023.

  18. arXiv:2307.13912  [pdf, other]

    cs.HC cs.AI

    Embedding Democratic Values into Social Media AIs via Societal Objective Functions

    Authors: Chenyan Jia, Michelle S. Lam, Minh Chau Mai, Jeff Hancock, Michael S. Bernstein

    Abstract: Can we design artificial intelligence (AI) systems that rank our social media feeds to consider democratic values such as mitigating partisan animosity as part of their objective functions? We introduce a method for translating established, vetted social scientific constructs into AI objective functions, which we term societal objective functions, and demonstrate the method with application to the…

    Submitted 14 February, 2024; v1 submitted 25 July, 2023; originally announced July 2023.

    Comments: This paper has been accepted to CSCW 2024 and will be published in Proc. ACM Hum.-Comput. Interact. 8, CSCW1, Article 163 (April 2024)

    Journal ref: Proceedings of the ACM: Human-Computer Interaction, 8, CSCW1, Article 163 (2024)

  19. arXiv:2306.17674  [pdf, other]

    cs.CL

    X-RiSAWOZ: High-Quality End-to-End Multilingual Dialogue Datasets and Few-shot Agents

    Authors: Mehrad Moradshahi, Tianhao Shen, Kalika Bali, Monojit Choudhury, Gaël de Chalendar, Anmol Goel, Sungkyun Kim, Prashant Kodali, Ponnurangam Kumaraguru, Nasredine Semmar, Sina J. Semnani, Jiwon Seo, Vivek Seshadri, Manish Shrivastava, Michael Sun, Aditya Yadavalli, Chaobin You, Deyi Xiong, Monica S. Lam

    Abstract: Task-oriented dialogue research has mainly focused on a few popular languages like English and Chinese, due to the high dataset creation cost for a new language. To reduce the cost, we apply manual editing to automatically translated data. We create a new multilingual benchmark, X-RiSAWOZ, by translating the Chinese RiSAWOZ to 4 languages: English, French, Hindi, Korean; and a code-mixed English-H…

    Submitted 30 June, 2023; originally announced June 2023.

    Comments: Accepted by ACL 2023 Findings

  20. ReactGenie: A Development Framework for Complex Multimodal Interactions Using Large Language Models

    Authors: Jackie Junrui Yang, Yingtian Shi, Yuhan Zhang, Karina Li, Daniel Wan Rosli, Anisha Jain, Shuning Zhang, Tianshi Li, James A. Landay, Monica S. Lam

    Abstract: By combining voice and touch interactions, multimodal interfaces can surpass the efficiency of either modality alone. Traditional multimodal frameworks require laborious developer work to support rich multimodal commands where the user's multimodal command involves possibly exponential combinations of actions/function invocations. This paper presents ReactGenie, a programming framework that better…

    Submitted 2 May, 2024; v1 submitted 16 June, 2023; originally announced June 2023.

  21. arXiv:2305.19926  [pdf, other]

    cs.CL

    Revisiting the Reliability of Psychological Scales on Large Language Models

    Authors: Jen-tse Huang, Wenxuan Wang, Man Ho Lam, Eric John Li, Wenxiang Jiao, Michael R. Lyu

    Abstract: Recent research has extended beyond assessing the performance of Large Language Models (LLMs) to examining their characteristics from a psychological standpoint, acknowledging the necessity of understanding their behavioral characteristics. The administration of personality tests to LLMs has emerged as a noteworthy area in this context. However, the suitability of employing psychological scales, i…

    Submitted 28 December, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

    Comments: 10 pages. Added more comprehensive experiments and analysis

  22. arXiv:2305.16749  [pdf, other]

    cs.SD eess.AS

    Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model

    Authors: Xiang Li, Songxiang Liu, Max W. Y. Lam, Zhiyong Wu, Chao Weng, Helen Meng

    Abstract: Expressive human speech generally abounds with rich and flexible speech prosody variations. The speech prosody predictors in existing expressive speech synthesis methods mostly produce deterministic predictions, which are learned by directly minimizing the norm of prosody prediction error. Its unimodal nature leads to a mismatch with ground truth distribution and harms the model's ability in makin…

    Submitted 7 October, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

    Comments: Proceedings of Interspeech 2023 (doi: 10.21437/Interspeech.2023-715), demo site at https://thuhcsi.github.io/interspeech2023-DiffVar/

  23. arXiv:2305.15719  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Efficient Neural Music Generation

    Authors: Max W. Y. Lam, Qiao Tian, Tang Li, Zongyu Yin, Siyuan Feng, Ming Tu, Yuliang Ji, Rui Xia, Mingbo Ma, Xuchen Song, Jitong Chen, Yuping Wang, Yuxuan Wang

    Abstract: Recent progress in music generation has been remarkably advanced by the state-of-the-art MusicLM, which comprises a hierarchy of three LMs, respectively, for semantic, coarse acoustic, and fine acoustic modelings. Yet, sampling with the MusicLM requires processing through these LMs one by one to obtain the fine-grained acoustic tokens, making it computationally expensive and prohibitive for a real…

    Submitted 25 May, 2023; originally announced May 2023.

  24. WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia

    Authors: Sina J. Semnani, Violet Z. Yao, Heidi C. Zhang, Monica S. Lam

    Abstract: This paper presents the first few-shot LLM-based chatbot that almost never hallucinates and has high conversationality and low latency. WikiChat is grounded on the English Wikipedia, the largest curated free-text corpus. WikiChat generates a response from an LLM, retains only the grounded facts, and combines them with additional information it retrieves from the corpus to form factual and engagi…

    Submitted 27 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Findings of EMNLP 2023

  25. arXiv:2305.14202  [pdf, other]

    cs.CL

    Fine-tuned LLMs Know More, Hallucinate Less with Few-Shot Sequence-to-Sequence Semantic Parsing over Wikidata

    Authors: Silei Xu, Shicheng Liu, Theo Culhane, Elizaveta Pertseva, Meng-Hsi Wu, Sina J. Semnani, Monica S. Lam

    Abstract: While large language models (LLMs) can answer many questions correctly, they can also hallucinate and give wrong answers. Wikidata, with its over 12 billion facts, can be used to ground LLMs to improve their factuality. This paper presents WikiWebQuestions, a high-quality question answering benchmark for Wikidata. Ported over from WebQuestions for Freebase, it consists of real-world data with SPAR…

    Submitted 5 November, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: EMNLP 2023 Main

  26. arXiv:2303.02884  [pdf, other]

    cs.HC cs.AI cs.LG

    Model Sketching: Centering Concepts in Early-Stage Machine Learning Model Design

    Authors: Michelle S. Lam, Zixian Ma, Anne Li, Izequiel Freitas, Dakuo Wang, James A. Landay, Michael S. Bernstein

    Abstract: Machine learning practitioners often end up tunneling on low-level technical details like model architectures and performance metrics. Could early model development instead focus on high-level questions of which factors a model ought to pay attention to? Inspired by the practice of sketching in design, which distills ideas to their minimal representation, we introduce model sketching: a technical…

    Submitted 5 March, 2023; originally announced March 2023.

    Comments: To appear at CHI 2023

  27. arXiv:2302.12458  [pdf, other]

    cs.RO eess.SY

    Design and Mechanics of Cable-Driven Rolling Diaphragm Transmission for High-Transparency Robotic Motion

    Authors: Hoi Man Lam, W. Jared Walker, Lucas Jonasch, Dimitri Schreiber, Michael C. Yip

    Abstract: Applications of rolling diaphragm transmissions for medical and teleoperated robotics are of great interest, due to the low friction of rolling diaphragms combined with the power density and stiffness of hydraulic transmissions. However, the stiffness-enabling pressure preloads can form a tradeoff against bearing loading in some rolling diaphragm layouts, and transmission setup can be difficult. U…

    Submitted 24 February, 2023; originally announced February 2023.

    Comments: 7 pages, 13 figures

  28. arXiv:2302.09424  [pdf, other]

    cs.CL

    Zero and Few-Shot Localization of Task-Oriented Dialogue Agents with a Distilled Representation

    Authors: Mehrad Moradshahi, Sina J. Semnani, Monica S. Lam

    Abstract: Task-oriented Dialogue (ToD) agents are mostly limited to a few widely-spoken languages, mainly due to the high cost of acquiring training data for each language. Existing low-cost approaches that rely on cross-lingual embeddings or naive machine translation sacrifice a lot of accuracy for data efficiency, and largely fail in creating a usable dialogue agent. We propose automatic methods that use…

    Submitted 18 February, 2023; originally announced February 2023.

    Comments: Published in EACL 2023

  29. arXiv:2301.10904  [pdf, other]

    cs.CR cs.DC cs.LG

    GPU-based Private Information Retrieval for On-Device Machine Learning Inference

    Authors: Maximilian Lam, Jeff Johnson, Wenjie Xiong, Kiwan Maeng, Udit Gupta, Yang Li, Liangzhen Lai, Ilias Leontiadis, Minsoo Rhu, Hsien-Hsin S. Lee, Vijay Janapa Reddi, Gu-Yeon Wei, David Brooks, G. Edward Suh

    Abstract: On-device machine learning (ML) inference can enable the use of private user data on user devices without revealing them to remote servers. However, a pure on-device solution to private ML inference is impractical for many applications that rely on embedding tables that are too large to be stored on-device. In particular, recommendation models typically use multiple embedding tables each on the or…

    Submitted 25 September, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

  30. arXiv:2204.09934  [pdf, other]

    eess.AS cs.LG cs.SD

    FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis

    Authors: Rongjie Huang, Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu, Yi Ren, Zhou Zhao

    Abstract: Denoising diffusion probabilistic models (DDPMs) have recently achieved leading performances in many generative tasks. However, the inherited iterative sampling process costs hindered their applications to speech synthesis. This paper proposes FastDiff, a fast conditional diffusion model for high-quality speech synthesis. FastDiff employs a stack of time-aware location-variable convolutions of div…

    Submitted 21 April, 2022; originally announced April 2022.

    Comments: Accepted by IJCAI 2022

  31. arXiv:2204.00367  [pdf, other]

    eess.IV cs.CV

    Learning to Deblur using Light Field Generated and Real Defocus Images

    Authors: Lingyan Ruan, Bin Chen, Jizhou Li, Miuling Lam

    Abstract: Defocus deblurring is a challenging task due to the spatially varying nature of defocus blur. While deep learning approach shows great promise in solving image restoration problems, defocus deblurring demands accurate training data that consists of all-in-focus and defocus image pairs, which is difficult to collect. Naive two-shot capturing cannot achieve pixel-wise correspondence between the defo…

    Submitted 1 April, 2022; originally announced April 2022.

    Comments: CVPR 2022 Oral

  32. arXiv:2203.13508  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD eess.SP

    BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis

    Authors: Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu

    Abstract: Diffusion probabilistic models (DPMs) and their extensions have emerged as competitive generative models yet confront challenges of efficient sampling. We propose a new bilateral denoising diffusion model (BDDM) that parameterizes both the forward and reverse processes with a schedule network and a score network, which can train with a novel bilateral modeling objective. We show that the new surro…

    Submitted 25 March, 2022; originally announced March 2022.

    Comments: Accepted in ICLR 2022. arXiv admin note: text overlap with arXiv:2108.11514

    Journal ref: International Conference on Learning Representations 2022

  33. arXiv:2203.12751  [pdf, other]

    cs.PL cs.CL

    ThingTalk: An Extensible, Executable Representation Language for Task-Oriented Dialogues

    Authors: Monica S. Lam, Giovanni Campagna, Mehrad Moradshahi, Sina J. Semnani, Silei Xu

    Abstract: Task-oriented conversational agents rely on semantic parsers to translate natural language to formal representations. In this paper, we propose the design and rationale of the ThingTalk formal representation, and how the design improves the development of transactional task-oriented agents. ThingTalk is built on four core principles: (1) representing user requests directly as executable statemen…

    Submitted 23 March, 2022; originally announced March 2022.

    Comments: 8 pages, 3 figures

  34. arXiv:2203.02833  [pdf, other]

    cs.CR cs.AI

    Tabula: Efficiently Computing Nonlinear Activation Functions for Secure Neural Network Inference

    Authors: Maximilian Lam, Michael Mitzenmacher, Vijay Janapa Reddi, Gu-Yeon Wei, David Brooks

    Abstract: Multiparty computation approaches to secure neural network inference commonly rely on garbled circuits for securely executing nonlinear activation functions. However, garbled circuits require excessive communication between server and client, impose significant storage overheads, and incur large runtime penalties. To reduce these costs, we propose an alternative to garbled circuits: Tabula, an alg…

    Submitted 16 June, 2024; v1 submitted 5 March, 2022; originally announced March 2022.

  35. arXiv:2202.02950  [pdf, other]

    cs.HC cs.AI cs.LG

    Jury Learning: Integrating Dissenting Voices into Machine Learning Models

    Authors: Mitchell L. Gordon, Michelle S. Lam, Joon Sung Park, Kayur Patel, Jeffrey T. Hancock, Tatsunori Hashimoto, Michael S. Bernstein

    Abstract: Whose labels should a machine learning (ML) algorithm learn to emulate? For ML tasks ranging from online comment toxicity to misinformation detection to medical diagnosis, different groups in society may have irreconcilable disagreements about ground truth labels. Supervised ML today resolves these label disagreements implicitly using majority vote, which overrides minority groups' labels. We intr…

    Submitted 7 February, 2022; originally announced February 2022.

    Comments: To appear at CHI 2022

  36. arXiv:2111.09344  [pdf, other]

    cs.LG stat.ML

    The People's Speech: A Large-Scale Diverse English Speech Recognition Dataset for Commercial Usage

    Authors: Daniel Galvez, Greg Diamos, Juan Ciro, Juan Felipe Cerón, Keith Achorn, Anjali Gopi, David Kanter, Maximilian Lam, Mark Mazumder, Vijay Janapa Reddi

    Abstract: The People's Speech is a free-to-download 30,000-hour and growing supervised conversational English speech recognition dataset licensed for academic and commercial usage under CC-BY-SA (with a CC-BY subset). The data is collected via searching the Internet for appropriately licensed audio data with existing transcriptions. We describe our data collection methodology and release our data collection…

    Submitted 17 November, 2021; originally announced November 2021.

    Comments: Part of 2021 Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks

  37. arXiv:2111.02574  [pdf, other]

    cs.CL cs.LG

    Contextual Semantic Parsing for Multilingual Task-Oriented Dialogues

    Authors: Mehrad Moradshahi, Victoria Tsai, Giovanni Campagna, Monica S. Lam

    Abstract: Robust state tracking for task-oriented dialogue systems currently remains restricted to a few popular languages. This paper shows that given a large-scale dialogue data set in one language, we can automatically produce an effective semantic parser for other languages using machine translation. We propose automatic translation of dialogue datasets with alignment to ensure faithful translation of s…

    Submitted 18 February, 2023; v1 submitted 3 November, 2021; originally announced November 2021.

    Comments: Published in EACL 2023

  38. arXiv:2108.11514  [pdf, other]

    cs.LG cs.AI cs.SD eess.AS eess.SP

    Bilateral Denoising Diffusion Models

    Authors: Max W. Y. Lam, Jun Wang, Rongjie Huang, Dan Su, Dong Yu

    Abstract: Denoising diffusion probabilistic models (DDPMs) have emerged as competitive generative models yet brought challenges to efficient sampling. In this paper, we propose novel bilateral denoising diffusion models (BDDMs), which take significantly fewer steps to generate high-quality samples. From a bilateral modeling objective, BDDMs parameterize the forward and reverse processes with a score network…

    Submitted 14 September, 2021; v1 submitted 26 August, 2021; originally announced August 2021.

  39. arXiv:2107.14427  [pdf, other]

    cs.RO eess.SY

    ARCSnake: Reconfigurable Snake-Like Robot with Archimedean Screw Propulsion for Multi-Domain Mobility

    Authors: Florian Richter, Peter V. Gavrilov, Hoi Man Lam, Amir Degani, Michael C. Yip

    Abstract: Exploring and navigating in extreme environments, such as caves, oceans, and planetary bodies, are often too hazardous for humans, and as such, robots are possible surrogates. These robots are met with significant locomotion challenges that require traversing a wide range of surface roughnesses and topologies. Previous locomotion strategies, involving wheels or ambulatory motion, such as snake pla…

    Submitted 30 July, 2021; originally announced July 2021.

    Comments: 12 pages, 17 figures

  40. arXiv:2106.06089  [pdf, other]

    cs.CR cs.AI

    Gradient Disaggregation: Breaking Privacy in Federated Learning by Reconstructing the User Participant Matrix

    Authors: Maximilian Lam, Gu-Yeon Wei, David Brooks, Vijay Janapa Reddi, Michael Mitzenmacher

    Abstract: We show that aggregated model updates in federated learning may be insecure. An untrusted central server may disaggregate user updates from sums of updates across participants given repeated observations, enabling the server to recover privileged information about individual users' private training data via traditional gradient inference attacks. Our method revolves around reconstructing participa…

    Submitted 10 June, 2021; originally announced June 2021.

    Comments: ICML 2021

  41. arXiv:2106.04275  [pdf, other]

    cs.SD cs.AI eess.AS eess.SP

    Raw Waveform Encoder with Multi-Scale Globally Attentive Locally Recurrent Networks for End-to-End Speech Recognition

    Authors: Max W. Y. Lam, Jun Wang, Chao Weng, Dan Su, Dong Yu

    Abstract: End-to-end speech recognition generally uses hand-engineered acoustic features as input and excludes the feature extraction module from its joint optimization. To extract learnable and adaptive features and mitigate information loss, we propose a new encoder that adopts globally attentive locally recurrent (GALR) networks and directly takes raw waveform as input. We observe improved ASR performanc…

    Submitted 8 June, 2021; originally announced June 2021.

    Comments: Accepted in Interspeech 2021

  42. arXiv:2106.04008  [pdf, other]

    cs.LG

    Widening Access to Applied Machine Learning with TinyML

    Authors: Vijay Janapa Reddi, Brian Plancher, Susan Kennedy, Laurence Moroney, Pete Warden, Anant Agarwal, Colby Banbury, Massimo Banzi, Matthew Bennett, Benjamin Brown, Sharad Chitlangia, Radhika Ghosal, Sarah Grafman, Rupert Jaeger, Srivatsan Krishnan, Maximilian Lam, Daniel Leiker, Cara Mann, Mark Mazumder, Dominic Pajak, Dhilan Ramaprasad, J. Evan Smith, Matthew Stewart, Dustin Tingley

    Abstract: Broadening access to both computational and educational resources is critical to diffusing machine-learning (ML) innovation. However, today, most ML resources and experts are siloed in a few countries and organizations. In this paper, we describe our pedagogical approach to increasing access to applied ML through a massive open online course (MOOC) on Tiny Machine Learning (TinyML). We suggest tha…

    Submitted 9 June, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

    Comments: Understanding the underpinnings of the TinyML edX course series: https://www.edx.org/professional-certificate/harvardx-tiny-machine-learning

  43. arXiv:2103.16057  [pdf, other]

    cs.CL cs.LG

    Grounding Open-Domain Instructions to Automate Web Support Tasks

    Authors: Nancy Xu, Sam Masling, Michael Du, Giovanni Campagna, Larry Heck, James Landay, Monica S Lam

    Abstract: Grounding natural language instructions on the web to perform previously unseen tasks enables accessibility and automation. We introduce a task and dataset to train AI agents from open-domain, step-by-step instructions originally written for people. We build RUSS (Rapid Universal Support Service) to tackle this problem. RUSS consists of two models: First, a BERT-LSTM with pointers parses instructi…

    Submitted 4 April, 2021; v1 submitted 30 March, 2021; originally announced March 2021.

    Comments: To be published in NAACL 2021

  44. arXiv:2103.06172  [pdf, other]

    cs.LG cs.CY

    Fairness On The Ground: Applying Algorithmic Fairness Approaches to Production Systems

    Authors: Chloé Bakalar, Renata Barreto, Stevie Bergman, Miranda Bogen, Bobbie Chern, Sam Corbett-Davies, Melissa Hall, Isabel Kloumann, Michelle Lam, Joaquin Quiñonero Candela, Manish Raghavan, Joshua Simons, Jonathan Tannen, Edmund Tong, Kate Vredenburgh, Jiejing Zhao

    Abstract: Many technical approaches have been proposed for ensuring that decisions made by machine learning systems are fair, but few of these proposals have been stress-tested in real-world systems. This paper presents an example of one team's approach to the challenge of applying algorithmic fairness approaches to complex production systems within the context of a large technology company. We discuss how…

    Submitted 24 March, 2021; v1 submitted 10 March, 2021; originally announced March 2021.

    Comments: 12 pages, 2 figures

  45. arXiv:2103.01461  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD eess.SP

    Tune-In: Training Under Negative Environments with Interference for Attention Networks Simulating Cocktail Party Effect

    Authors: Jun Wang, Max W. Y. Lam, Dan Su, Dong Yu

    Abstract: We study the cocktail party problem and propose a novel attention network called Tune-In, abbreviated for training under negative environments with interference. It firstly learns two separate spaces of speaker-knowledge and speech-stimuli based on a shared feature space, where a new block structure is designed as the building block for all spaces, and then cooperatively solves different tasks. Be…

    Submitted 1 March, 2021; originally announced March 2021.

    Comments: Accepted in AAAI 2021

  46. arXiv:2103.00819  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD eess.SP

    Sandglasset: A Light Multi-Granularity Self-attentive Network For Time-Domain Speech Separation

    Authors: Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu

    Abstract: One of the leading single-channel speech separation (SS) models is based on a TasNet with a dual-path segmentation technique, where the size of each segment remains unchanged throughout all layers. In contrast, our key finding is that multi-granularity features are essential for enhancing contextual modeling and computational efficiency. We introduce a self-attentive network with a novel sandglass…

    Submitted 8 March, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

    Comments: Accepted in ICASSP 2021

  47. arXiv:2103.00816  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD eess.SP

    Contrastive Separative Coding for Self-supervised Representation Learning

    Authors: Jun Wang, Max W. Y. Lam, Dan Su, Dong Yu

    Abstract: To extract robust deep representations from long sequential modeling of speech data, we propose a self-supervised learning approach, namely Contrastive Separative Coding (CSC). Our key finding is to learn such representations by separating the target signal from contrastive interfering signals. First, a multi-task separative encoder is built to extract shared separable and discriminative embedding…

    Submitted 1 March, 2021; originally announced March 2021.

    Comments: Accepted in ICASSP 2021

  48. arXiv:2101.05014  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD eess.SP

    Effective Low-Cost Time-Domain Audio Separation Using Globally Attentive Locally Recurrent Networks

    Authors: Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu

    Abstract: Recent research on the time-domain audio separation networks (TasNets) has brought great success to speech separation. Nevertheless, conventional TasNets struggle to satisfy the memory and latency constraints in industrial applications. In this regard, we design a low-cost high-performance architecture, namely, globally attentive locally recurrent (GALR) network. Alike the dual-path RNN (DPRNN), w…

    Submitted 13 January, 2021; originally announced January 2021.

    Comments: Accepted in IEEE SLT 2021

  49. arXiv:2010.05106  [pdf, other]

    cs.CL cs.LG

    Localizing Open-Ontology QA Semantic Parsers in a Day Using Machine Translation

    Authors: Mehrad Moradshahi, Giovanni Campagna, Sina J. Semnani, Silei Xu, Monica S. Lam

    Abstract: We propose Semantic Parser Localizer (SPL), a toolkit that leverages Neural Machine Translation (NMT) systems to localize a semantic parser for a new language. Our methodology is to (1) generate training data automatically in the target language by augmenting machine-translated datasets with local entities scraped from public websites, (2) add a few-shot boost of human-translated sentences and tra…

    Submitted 10 October, 2020; originally announced October 2020.

    Comments: Published in EMNLP 2020

  50. arXiv:2010.04806  [pdf, other]

    cs.CL

    AutoQA: From Databases To QA Semantic Parsers With Only Synthetic Training Data

    Authors: Silei Xu, Sina J. Semnani, Giovanni Campagna, Monica S. Lam

    Abstract: We propose AutoQA, a methodology and toolkit to generate semantic parsers that answer questions on databases, with no manual effort. Given a database schema and its data, AutoQA automatically generates a large set of high-quality questions for training that covers different database operations. It uses automatic paraphrasing combined with template-based parsing to find alternative expressions of a…

    Submitted 7 June, 2021; v1 submitted 9 October, 2020; originally announced October 2020.

    Comments: To appear in EMNLP 2020