Skip to main content

Showing 1–16 of 16 results for author: Chiang, W

Searching in archive cs. Search in all archives.
.
  1. arXiv:2406.18665  [pdf, other

    cs.LG cs.AI cs.CL

    RouteLLM: Learning to Route LLMs with Preference Data

    Authors: Isaac Ong, Amjad Almahairi, Vincent Wu, Wei-Lin Chiang, Tianhao Wu, Joseph E. Gonzalez, M Waleed Kadous, Ion Stoica

    Abstract: Large language models (LLMs) exhibit impressive capabilities across a wide range of tasks, yet the choice of which model to use often involves a trade-off between performance and cost. More powerful models, though effective, come with higher expenses, while less capable models are more cost-effective. To address this dilemma, we propose several efficient router models that dynamically select betwe… ▽ More

    Submitted 1 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  2. arXiv:2406.11939  [pdf, other

    cs.LG cs.AI cs.CL

    From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline

    Authors: Tianle Li, Wei-Lin Chiang, Evan Frick, Lisa Dunlap, Tianhao Wu, Banghua Zhu, Joseph E. Gonzalez, Ion Stoica

    Abstract: The rapid evolution of language models has necessitated the development of more challenging benchmarks. Current static benchmarks often struggle to consistently distinguish between the capabilities of different models and fail to align with real-world user preferences. On the other hand, live crowd-sourced platforms like the Chatbot Arena collect a wide range of natural prompts and user feedback.… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  3. arXiv:2405.20947  [pdf, other

    cs.CL cs.AI

    OR-Bench: An Over-Refusal Benchmark for Large Language Models

    Authors: Justin Cui, Wei-Lin Chiang, Ion Stoica, Cho-Jui Hsieh

    Abstract: Large Language Models (LLMs) require careful safety alignment to prevent malicious outputs. While significant research focuses on mitigating harmful content generation, the enhanced safety often come with the side effect of over-refusal, where LLMs may reject innocuous prompts and become less helpful. Although the issue of over-refusal has been empirically observed, a systematic measurement is cha… ▽ More

    Submitted 20 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

    Comments: version 2, 10 pages main, 22 pages total

  4. arXiv:2404.14527  [pdf, other

    cs.DC cs.LG

    Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity

    Authors: Tyler Griggs, Xiaoxuan Liu, Jiaxiang Yu, Doyoung Kim, Wei-Lin Chiang, Alvin Cheung, Ion Stoica

    Abstract: Large language models (LLMs) are increasingly integrated into many online services, yet they remain cost-prohibitive to deploy due to the requirement of expensive GPU instances. Prior work has addressed the high cost of LLM serving by improving the inference engine, but less attention has been given to selecting the most cost-efficient GPU type(s) for a specific LLM service. There is a large and g… ▽ More

    Submitted 27 June, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  5. arXiv:2404.11595  [pdf, other

    cs.SE

    A Deep Dive into Large Language Models for Automated Bug Localization and Repair

    Authors: Soneya Binta Hossain, Nan Jiang, Qiang Zhou, Xiaopeng Li, Wen-Hao Chiang, Yingjun Lyu, Hoan Nguyen, Omer Tripp

    Abstract: Large language models (LLMs) have shown impressive effectiveness in various software engineering tasks, including automated program repair (APR). In this study, we take a deep dive into automated bug fixing utilizing LLMs. In contrast to many deep learning-based APR methods that assume known bug locations, rely on line-level localization tools, or address bug prediction and fixing in one step, our… ▽ More

    Submitted 10 May, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

  6. arXiv:2403.04132  [pdf, other

    cs.AI cs.CL

    Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference

    Authors: Wei-Lin Chiang, Lianmin Zheng, Ying Sheng, Anastasios Nikolas Angelopoulos, Tianle Li, Dacheng Li, Hao Zhang, Banghua Zhu, Michael Jordan, Joseph E. Gonzalez, Ion Stoica

    Abstract: Large Language Models (LLMs) have unlocked new capabilities and applications; however, evaluating the alignment with human preferences still poses significant challenges. To address this issue, we introduce Chatbot Arena, an open platform for evaluating LLMs based on human preferences. Our methodology employs a pairwise comparison approach and leverages input from a diverse user base through crowd… ▽ More

    Submitted 6 March, 2024; originally announced March 2024.

  7. arXiv:2311.14904  [pdf, other

    cs.LG cs.SE

    LLM-Assisted Code Cleaning For Training Accurate Code Generators

    Authors: Naman Jain, Tianjun Zhang, Wei-Lin Chiang, Joseph E. Gonzalez, Koushik Sen, Ion Stoica

    Abstract: Natural language to code generation is an important application area of LLMs and has received wide attention from the community. The majority of relevant studies have exclusively concentrated on increasing the quantity and functional correctness of training sets while disregarding other stylistic elements of programs. More recently, data quality has garnered a lot of interest and multiple works ha… ▽ More

    Submitted 24 November, 2023; originally announced November 2023.

  8. arXiv:2311.04850  [pdf, other

    cs.CL cs.AI

    Rethinking Benchmark and Contamination for Language Models with Rephrased Samples

    Authors: Shuo Yang, Wei-Lin Chiang, Lianmin Zheng, Joseph E. Gonzalez, Ion Stoica

    Abstract: Large language models are increasingly trained on all the data ever produced by humans. Many have raised concerns about the trustworthiness of public benchmarks due to potential contamination in pre-training or fine-tuning datasets. While most data decontamination efforts apply string matching (e.g., n-gram overlap) to remove benchmark data, we show that these methods are insufficient, and simple… ▽ More

    Submitted 11 November, 2023; v1 submitted 8 November, 2023; originally announced November 2023.

  9. arXiv:2309.11998  [pdf, other

    cs.CL cs.AI

    LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset

    Authors: Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Tianle Li, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zhuohan Li, Zi Lin, Eric P. Xing, Joseph E. Gonzalez, Ion Stoica, Hao Zhang

    Abstract: Studying how people interact with large language models (LLMs) in real-world scenarios is increasingly important due to their widespread use in various applications. In this paper, we introduce LMSYS-Chat-1M, a large-scale dataset containing one million real-world conversations with 25 state-of-the-art LLMs. This dataset is collected from 210K unique IP addresses in the wild on our Vicuna demo and… ▽ More

    Submitted 10 March, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

  10. arXiv:2306.05685  [pdf, other

    cs.CL cs.AI

    Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

    Authors: Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, Ion Stoica

    Abstract: Evaluating large language model (LLM) based chat assistants is challenging due to their broad capabilities and the inadequacy of existing benchmarks in measuring human preferences. To address this, we explore using strong LLMs as judges to evaluate these models on more open-ended questions. We examine the usage and limitations of LLM-as-a-judge, including position, verbosity, and self-enhancement… ▽ More

    Submitted 23 December, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 Datasets and Benchmarks Track

  11. Balsa: Learning a Query Optimizer Without Expert Demonstrations

    Authors: Zongheng Yang, Wei-Lin Chiang, Sifei Luan, Gautam Mittal, Michael Luo, Ion Stoica

    Abstract: Query optimizers are a performance-critical component in every database system. Due to their complexity, optimizers take experts months to write and years to refine. In this work, we demonstrate for the first time that learning to optimize queries without learning from an expert optimizer is both possible and efficient. We present Balsa, a query optimizer built by deep reinforcement learning. Bals… ▽ More

    Submitted 3 May, 2022; v1 submitted 4 January, 2022; originally announced January 2022.

    Comments: SIGMOD 2022; code released at: https://github.com/balsa-project/balsa/

  12. arXiv:2004.01580  [pdf, other

    cs.LG stat.ML

    Hawkes Process Multi-armed Bandits for Disaster Search and Rescue

    Authors: Wen-Hao Chiang, George Mohler

    Abstract: We propose a novel framework for integrating Hawkes processes with multi-armed bandit algorithms to solve spatio-temporal event forecasting and detection problems when data may be undersampled or spatially biased. In particular, we introduce an upper confidence bound algorithm using Bayesian spatial Hawkes process estimation for balancing the tradeoff between exploiting geographic regions where da… ▽ More

    Submitted 3 April, 2020; originally announced April 2020.

  13. arXiv:1911.07574  [pdf, other

    cs.CV cs.IR cs.LG

    Bias-Aware Heapified Policy for Active Learning

    Authors: Wen-Yen Chang, Wen-Huan Chiang, Shao-Hao Lu, Tingfan Wu, Min Sun

    Abstract: The data efficiency of learning-based algorithms is more and more important since high-quality and clean data is expensive as well as hard to collect. In order to achieve high model performance with the least number of samples, active learning is a technique that queries the most important subset of data from the original dataset. In active learning domain, one of the mainstream research is the he… ▽ More

    Submitted 18 November, 2019; originally announced November 2019.

  14. arXiv:1905.07953  [pdf, other

    cs.LG cs.AI stat.ML

    Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks

    Authors: Wei-Lin Chiang, Xuanqing Liu, Si Si, Yang Li, Samy Bengio, Cho-Jui Hsieh

    Abstract: Graph convolutional network (GCN) has been successfully applied to many graph-based applications; however, training a large-scale GCN remains challenging. Current SGD-based algorithms suffer from either a high computational cost that exponentially grows with number of GCN layers, or a large space requirement for kee** the entire graph and the embedding of each node in memory. In this paper, we p… ▽ More

    Submitted 8 August, 2019; v1 submitted 20 May, 2019; originally announced May 2019.

    Comments: In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD'19)

  15. arXiv:1902.08675  [pdf, ps, other

    cs.LG stat.ML

    Drug-drug interaction prediction based on co-medication patterns and graph matching

    Authors: Wen-Hao Chiang, Li Shen, Lang Li, Xia Ning

    Abstract: Background: The problem of predicting whether a drug combination of arbitrary orders is likely to induce adverse drug reactions is considered in this manuscript. Methods: Novel kernels over drug combinations of arbitrary orders are developed within support vector machines for the prediction. Graph matching methods are used in the novel kernels to measure the similarities among drug combinations, i… ▽ More

    Submitted 22 February, 2019; originally announced February 2019.

  16. arXiv:1803.03185  [pdf, other

    cs.IR cs.LG

    Drug Recommendation toward Safe Polypharmacy

    Authors: Wen-Hao Chiang, Li Shen, Lang Li, Xia Ning

    Abstract: Adverse drug reactions (ADRs) induced from high-order drug-drug interactions (DDIs) due to polypharmacy represent a significant public health problem. In this paper, we formally formulate the to-avoid and safe (with respect to ADRs) drug recommendation problems when multiple drugs have been taken simultaneously. We develop a joint model with a recommendation component and an ADR label prediction c… ▽ More

    Submitted 8 March, 2018; originally announced March 2018.