
Showing 1–50 of 125 results for author: Cai, T

  1. arXiv:2407.01080  [pdf, other]

    cs.CL cs.AI

    Face4RAG: Factual Consistency Evaluation for Retrieval Augmented Generation in Chinese

    Authors: Yunqi Xu, Tianchi Cai, Jiyan Jiang, Xierui Song

    Abstract: The prevailing issue of factual inconsistency errors in conventional Retrieval Augmented Generation (RAG) motivates the study of Factual Consistency Evaluation (FCE). Despite the various FCE methods proposed earlier, these methods are evaluated on datasets generated by specific Large Language Models (LLMs). Without a comprehensive benchmark, it remains unexplored how these FCE methods perform on o…

    Submitted 1 July, 2024; originally announced July 2024.

  2. FoRAG: Factuality-optimized Retrieval Augmented Generation for Web-enhanced Long-form Question Answering

    Authors: Tianchi Cai, Zhiwen Tan, Xierui Song, Tao Sun, Jiyan Jiang, Yunqi Xu, Yinger Zhang, Jinjie Gu

    Abstract: Retrieval Augmented Generation (RAG) has become prevalent in question-answering (QA) tasks due to its ability to utilize search engines to enhance the quality of long-form question-answering (LFQA). Despite the emergence of various open source methods and web-enhanced commercial systems such as Bing Chat, two critical problems remain unsolved, i.e., the lack of factuality and clear logic in the g…

    Submitted 19 June, 2024; originally announced June 2024.

    Report number: 30th

    Journal ref: KDD 2024

  3. arXiv:2406.06755  [pdf, other]

    math.ST cs.LG stat.ML

    Optimal Federated Learning for Nonparametric Regression with Heterogeneous Distributed Differential Privacy Constraints

    Authors: T. Tony Cai, Abhinav Chakraborty, Lasse Vuursteen

    Abstract: This paper studies federated learning for nonparametric regression in the context of distributed samples across different servers, each adhering to distinct differential privacy constraints. The setting we consider is heterogeneous, encompassing both varying sample sizes and differential privacy constraints across servers. Within this framework, both global and pointwise estimation are considered,…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 49 pages total, consisting of an article (24 pages) and a supplement (25 pages)

    MSC Class: 62G08; 62C20; 68P27; 62F30

  4. arXiv:2406.06749  [pdf, other]

    math.ST cs.LG stat.ML

    Federated Nonparametric Hypothesis Testing with Differential Privacy Constraints: Optimal Rates and Adaptive Tests

    Authors: T. Tony Cai, Abhinav Chakraborty, Lasse Vuursteen

    Abstract: Federated learning has attracted significant recent attention due to its applicability across a wide range of settings where data is collected and analyzed across disparate locations. In this paper, we study federated nonparametric goodness-of-fit testing in the white-noise-with-drift model under distributed differential privacy (DP) constraints. We first establish matching lower and upper bound…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 77 pages total; consisting of a main article (28 pages) and supplement (49 pages)

    MSC Class: 62G10; 62C20; 68P27; 62F30

  5. arXiv:2405.18971  [pdf, other]

    cs.IR

    Mitigate Position Bias with Coupled Ranking Bias on CTR Prediction

    Authors: Yao Zhao, Zhining Liu, Tianchi Cai, Haipeng Zhang, Chenyi Zhuang, Jinjie Gu

    Abstract: Position bias, i.e., users' preference of an item is affected by its placing position, is well studied in the recommender system literature. However, most existing methods ignore the widely coupled ranking bias, which is also related to the placing position of the item. Using both synthetic and industrial datasets, we first show how this widely coexisted ranking bias deteriorates the performance o…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 5 pages, 3 figures

  6. arXiv:2405.16042  [pdf, other]

    cs.CL

    Incremental Comprehension of Garden-Path Sentences by Large Language Models: Semantic Interpretation, Syntactic Re-Analysis, and Attention

    Authors: Andrew Li, Xianle Feng, Siddhant Narang, Austin Peng, Tianle Cai, Raj Sanjay Shah, Sashank Varma

    Abstract: When reading temporarily ambiguous garden-path sentences, misinterpretations sometimes linger past the point of disambiguation. This phenomenon has traditionally been studied in psycholinguistic experiments using online measures such as reading times and offline measures such as comprehension questions. Here, we investigate the processing of garden-path sentences and the fate of lingering misinter…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Accepted by CogSci-24

  7. arXiv:2405.09493  [pdf, ps, other]

    stat.ML cs.LG

    C-Learner: Constrained Learning for Causal Inference and Semiparametric Statistics

    Authors: Tiffany Tianhui Cai, Yuri Fonseca, Kaiwen Hou, Hongseok Namkoong

    Abstract: Causal estimation (e.g. of the average treatment effect) requires estimating complex nuisance parameters (e.g. outcome models). To adjust for errors in nuisance parameter estimation, we present a novel correction method that solves for the best plug-in estimator under the constraint that the first-order error of the estimator with respect to the nuisance parameter estimate is zero. Our constrained…

    Submitted 22 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

  8. arXiv:2405.06107  [pdf, other]

    cs.LG cs.SC hep-ph hep-th stat.ML

    Transforming the Bootstrap: Using Transformers to Compute Scattering Amplitudes in Planar N = 4 Super Yang-Mills Theory

    Authors: Tianji Cai, Garrett W. Merz, François Charton, Niklas Nolte, Matthias Wilhelm, Kyle Cranmer, Lance J. Dixon

    Abstract: We pursue the use of deep learning methods to improve state-of-the-art computations in theoretical high-energy physics. Planar N = 4 Super Yang-Mills theory is a close cousin to the theory that describes Higgs boson production at the Large Hadron Collider; its scattering amplitudes are large mathematical expressions containing integer coefficients. In this paper, we apply Transformers to predict t…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 26+10 pages, 9 figures, 7 tables, application of machine learning aimed at physics and machine learning audience

    Report number: SLAC-PUB-17774

  9. arXiv:2404.14469  [pdf, other]

    cs.CL cs.AI

    SnapKV: LLM Knows What You are Looking for Before Generation

    Authors: Yuhong Li, Yingbing Huang, Bowen Yang, Bharat Venkitesh, Acyr Locatelli, Hanchen Ye, Tianle Cai, Patrick Lewis, Deming Chen

    Abstract: Large Language Models (LLMs) have made remarkable progress in processing extensive contexts, with the Key-Value (KV) cache playing a vital role in enhancing their performance. However, the growth of the KV cache in response to increasing input length poses challenges to memory and time efficiency. To address this problem, this paper introduces SnapKV, an innovative and fine-tuning-free approach th…

    Submitted 16 June, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  10. arXiv:2404.07413  [pdf, other]

    cs.CL cs.AI

    JetMoE: Reaching Llama2 Performance with 0.1M Dollars

    Authors: Yikang Shen, Zhen Guo, Tianle Cai, Zengyi Qin

    Abstract: Large Language Models (LLMs) have achieved remarkable results, but their increasing resource demand has become a major obstacle to the development of powerful and accessible super-human intelligence. This report introduces JetMoE-8B, a new LLM trained with less than $0.1 million, using 1.25T tokens from carefully mixed open-source corpora and 30,000 H100 GPU hours. Despite its low cost, the JetMoE…

    Submitted 10 April, 2024; originally announced April 2024.

  11. arXiv:2404.06676  [pdf]

    cs.LG eess.SP stat.AP

    Topological Feature Search Method for Multichannel EEG: Application in ADHD classification

    Authors: Tianming Cai, Guoying Zhao, Junbin Zang, Chen Zong, Zhidong Zhang, Chenyang Xue

    Abstract: In recent years, the preliminary diagnosis of Attention Deficit Hyperactivity Disorder (ADHD) using electroencephalography (EEG) has garnered attention from researchers. EEG, known for its expediency and efficiency, plays a pivotal role in the diagnosis and treatment of ADHD. However, the non-stationarity of EEG signals and inter-subject variability pose challenges to the diagnostic and classifica…

    Submitted 9 April, 2024; originally announced April 2024.

  12. arXiv:2403.15484  [pdf, other]

    cs.CL cs.LG

    RakutenAI-7B: Extending Large Language Models for Japanese

    Authors: Rakuten Group, Aaron Levine, Connie Huang, Chenguang Wang, Eduardo Batista, Ewa Szymanska, Hongyi Ding, Hou Wei Chou, Jean-François Pessiot, Johanes Effendi, Justin Chiu, Kai Torben Ohlhus, Karan Chopra, Keiji Shinzato, Koji Murakami, Lee Xiong, Lei Chen, Maki Kubota, Maksim Tkachenko, Miroku Lee, Naoki Takahashi, Prathyusha Jwalapuram, Ryutaro Tatsushima, Saurabh Jain, Sunil Kumar Yadav, et al. (5 additional authors not shown)

    Abstract: We introduce RakutenAI-7B, a suite of Japanese-oriented large language models that achieve the best performance on the Japanese LM Harness benchmarks among the open 7B models. Along with the foundation model, we release instruction- and chat-tuned models, RakutenAI-7B-instruct and RakutenAI-7B-chat respectively, under the Apache 2.0 license.

    Submitted 21 March, 2024; originally announced March 2024.

  13. arXiv:2403.14926  [pdf, other]

    stat.ML cs.LG

    Contrastive Learning on Multimodal Analysis of Electronic Health Records

    Authors: Tianxi Cai, Feiqing Huang, Ryumei Nakada, Linjun Zhang, Doudou Zhou

    Abstract: Electronic health record (EHR) systems contain a wealth of multimodal clinical data including structured data like clinical codes and unstructured data such as clinical notes. However, many existing EHR-focused studies have traditionally either concentrated on an individual modality or merged different modalities in a rather rudimentary fashion. This approach often results in the perception of stru…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 34 pages

  14. arXiv:2403.10006  [pdf, other]

    cs.CY cs.HC cs.LG cs.SI

    Graph Enhanced Reinforcement Learning for Effective Group Formation in Collaborative Problem Solving

    Authors: Zheng Fang, Fucai Ke, Jae Young Han, Zhijie Feng, Toby Cai

    Abstract: This study addresses the challenge of forming effective groups in collaborative problem-solving environments. Recognizing the complexity of human interactions and the necessity for efficient collaboration, we propose a novel approach leveraging graph theory and reinforcement learning. Our methodology involves constructing a graph from a dataset where nodes represent participants, and edges signify…

    Submitted 15 March, 2024; originally announced March 2024.

  15. arXiv:2403.01251  [pdf, other]

    cs.CL

    Accelerating Greedy Coordinate Gradient via Probe Sampling

    Authors: Yiran Zhao, Wenyue Zheng, Tianle Cai, Xuan Long Do, Kenji Kawaguchi, Anirudh Goyal, Michael Shieh

    Abstract: Safety of Large Language Models (LLMs) has become a critical issue given their rapid progress. Greedy Coordinate Gradient (GCG) is shown to be effective in constructing adversarial prompts to break the aligned LLMs, but optimization of GCG is time-consuming. To reduce the time cost of GCG and enable more comprehensive studies of LLM safety, in this work, we study a new algorithm called…

    Submitted 27 May, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

  16. arXiv:2402.19481  [pdf, other]

    cs.CV

    DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

    Authors: Muyang Li, Tianle Cai, Jiaxin Cao, Qinsheng Zhang, Han Cai, Junjie Bai, Yangqing Jia, Ming-Yu Liu, Kai Li, Song Han

    Abstract: Diffusion models have achieved great success in synthesizing high-quality images. However, generating high-resolution images with diffusion models is still challenging due to the enormous computational costs, resulting in a prohibitive latency for interactive applications. In this paper, we propose DistriFusion to tackle this problem by leveraging parallelism across multiple GPUs. Our method split…

    Submitted 15 April, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: CVPR 2024 Highlight. Code: https://github.com/mit-han-lab/distrifuser; Website: https://hanlab.mit.edu/projects/distrifusion; Blog: https://hanlab.mit.edu/blog/distrifusion

  17. arXiv:2402.17437  [pdf, other]

    cs.CL cs.AI

    Exploiting Emotion-Semantic Correlations for Empathetic Response Generation

    Authors: Zhou Yang, Zhaochun Ren, Yufeng Wang, Xiaofei Zhu, Zhihao Chen, Tiecheng Cai, Yunbing Wu, Yisong Su, Sibo Ju, Xiangwen Liao

    Abstract: Empathetic response generation aims to generate empathetic responses by understanding the speaker's emotional feelings from the language of dialogue. Recent methods capture emotional words in the language of communicators and construct them as static vectors to perceive nuanced emotions. However, linguistic research has shown that emotional words in language are dynamic and have correlations with…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: 12 pages, 3 figures, Findings of EMNLP 2023

  18. arXiv:2402.13497  [pdf, other]

    cs.CV

    Push Quantization-Aware Training Toward Full Precision Performances via Consistency Regularization

    Authors: Junbiao Pang, Tianyang Cai, Baochang Zhang, Jiaqi Wu, Ye Tao

    Abstract: Existing Quantization-Aware Training (QAT) methods intensively depend on the complete labeled dataset or knowledge distillation to guarantee the performances toward Full Precision (FP) accuracies. However, empirical results show that QAT still has inferior results compared to its FP counterpart. One question is how to push QAT toward or even surpass FP performances. In this paper, we address this…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: 11 pages, 5 figures

  19. arXiv:2402.10193  [pdf, other]

    cs.LG cs.CL

    BitDelta: Your Fine-Tune May Only Be Worth One Bit

    Authors: James Liu, Guangxuan Xiao, Kai Li, Jason D. Lee, Song Han, Tri Dao, Tianle Cai

    Abstract: Large Language Models (LLMs) are typically trained in two phases: pre-training on large internet-scale datasets, and fine-tuning for downstream tasks. Given the higher computational demand of pre-training, it's intuitive to assume that fine-tuning adds less new information to the model, and is thus more compressible. We explore this assumption by decomposing the weights of fine-tuned models into t…

    Submitted 27 February, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

  20. arXiv:2402.03204  [pdf, other]

    cs.IT cs.AI cs.LG

    Multi-agent Reinforcement Learning for Energy Saving in Multi-Cell Massive MIMO Systems

    Authors: Tianzhang Cai, Qichen Wang, Shuai Zhang, Özlem Tuğfe Demir, Cicek Cavdar

    Abstract: We develop a multi-agent reinforcement learning (MARL) algorithm to minimize the total energy consumption of multiple massive MIMO (multiple-input multiple-output) base stations (BSs) in a multi-cell network while preserving the overall quality-of-service (QoS) by making decisions on the multi-level advanced sleep modes (ASMs) and antenna switching of these BSs. The problem is modeled as a decentr…

    Submitted 5 February, 2024; originally announced February 2024.

  21. arXiv:2401.15444  [pdf, other]

    cs.LG

    Towards Causal Classification: A Comprehensive Study on Graph Neural Networks

    Authors: Simi Job, Xiaohui Tao, Taotao Cai, Lin Li, Haoran Xie, Jianming Yong

    Abstract: The exploration of Graph Neural Networks (GNNs) for processing graph-structured data has expanded, particularly their potential for causal analysis due to their universal approximation capabilities. Anticipated to significantly enhance common graph-based tasks such as classification and prediction, the development of a causally enhanced GNN framework is yet to be thoroughly investigated. Addressin…

    Submitted 27 January, 2024; originally announced January 2024.

  22. arXiv:2401.12272  [pdf, other]

    stat.ML cs.LG

    Transfer Learning for Nonparametric Regression: Non-asymptotic Minimax Analysis and Adaptive Procedure

    Authors: T. Tony Cai, Hongming Pu

    Abstract: Transfer learning for nonparametric regression is considered. We first study the non-asymptotic minimax risk for this problem and develop a novel estimator called the confidence thresholding estimator, which is shown to achieve the minimax optimal risk up to a logarithmic factor. Our results demonstrate two unique phenomena in transfer learning: auto-smoothing and super-acceleration, which differe…

    Submitted 22 January, 2024; originally announced January 2024.

  23. arXiv:2401.10774  [pdf, other]

    cs.LG cs.CL

    Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

    Authors: Tianle Cai, Yuhong Li, Zhengyang Geng, Hongwu Peng, Jason D. Lee, Deming Chen, Tri Dao

    Abstract: Large Language Models (LLMs) employ auto-regressive decoding that requires sequential computation, with each step reliant on the previous one's output. This creates a bottleneck as each step necessitates moving the full model parameters from High-Bandwidth Memory (HBM) to the accelerator's cache. While methods such as speculative decoding have been suggested to address this issue, their implementa…

    Submitted 14 June, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: The code for this implementation is available at https://github.com/FasterDecoding/Medusa

  24. arXiv:2401.03820  [pdf, other]

    math.ST cs.IT stat.ME stat.ML

    Optimal Differentially Private PCA and Estimation for Spiked Covariance Matrices

    Authors: T. Tony Cai, Dong Xia, Mengyue Zha

    Abstract: Estimating a covariance matrix and its associated principal components is a fundamental problem in contemporary statistics. While optimal estimation procedures have been developed with well-understood properties, the increasing demand for privacy preservation introduces new complexities to this classical problem. In this paper, we study optimal differentially private Principal Component Analysis (…

    Submitted 8 January, 2024; originally announced January 2024.

  25. arXiv:2312.02554  [pdf, other]

    cs.LG cs.CL

    ULMA: Unified Language Model Alignment with Human Demonstration and Point-wise Preference

    Authors: Tianchi Cai, Xierui Song, Jiyan Jiang, Fei Teng, Jinjie Gu, Guannan Zhang

    Abstract: Aligning language models to human expectations, e.g., being helpful and harmless, has become a pressing challenge for large language models. A typical alignment procedure consists of supervised fine-tuning and preference learning. Most preference learning methods, such as RLHF and DPO, depend on pairwise preference data, which inadequately address scenarios where human feedback is point-wise, lead…

    Submitted 26 February, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

  26. arXiv:2311.14994  [pdf, other]

    cs.LG cs.AI

    Exploring Causal Learning through Graph Neural Networks: An In-depth Review

    Authors: Simi Job, Xiaohui Tao, Taotao Cai, Haoran Xie, Lin Li, Jianming Yong, Qing Li

    Abstract: In machine learning, exploring data correlations to predict outcomes is a fundamental task. Recognizing causal relationships embedded within data is pivotal for a comprehensive understanding of system dynamics, the significance of which is paramount in data-driven decision-making processes. Beyond traditional methods, there has been a surge in the use of graph neural networks (GNNs) for causal lea…

    Submitted 25 November, 2023; originally announced November 2023.

  27. arXiv:2311.08252  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    REST: Retrieval-Based Speculative Decoding

    Authors: Zhenyu He, Zexuan Zhong, Tianle Cai, Jason D. Lee, Di He

    Abstract: We introduce Retrieval-Based Speculative Decoding (REST), a novel algorithm designed to speed up language model generation. The key insight driving the development of REST is the observation that the process of text generation often includes certain common phases and patterns. Unlike previous methods that rely on a draft language model for speculative decoding, REST harnesses the power of retrieva…

    Submitted 4 April, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: NAACL 2024, camera ready

  28. arXiv:2309.10283  [pdf, other]

    cs.LG cs.AI cs.CR

    FRAMU: Attention-based Machine Unlearning using Federated Reinforcement Learning

    Authors: Thanveer Shaik, Xiaohui Tao, Lin Li, Haoran Xie, Taotao Cai, Xiaofeng Zhu, Qing Li

    Abstract: Machine Unlearning is an emerging field that addresses data privacy issues by enabling the removal of private or irrelevant data from the Machine Learning process. Challenges related to privacy and model efficiency arise from the use of outdated, private, and irrelevant data. These issues compromise both the accuracy and the computational efficiency of models in both Machine Learning and Unlearnin…

    Submitted 2 February, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  29. arXiv:2309.06534  [pdf, other]

    cs.LG stat.ME

    Distributionally Robust Transfer Learning

    Authors: Xin Xiong, Zijian Guo, Tianxi Cai

    Abstract: Many existing transfer learning methods rely on leveraging information from source data that closely resembles the target data. However, this approach often overlooks valuable knowledge that may be present in different yet potentially related auxiliary samples. When dealing with a limited amount of target data and a diverse range of source models, our paper introduces a novel approach, Distributio…

    Submitted 12 September, 2023; originally announced September 2023.

  30. arXiv:2309.02669  [pdf, other]

    cs.LG

    Marketing Budget Allocation with Offline Constrained Deep Reinforcement Learning

    Authors: Tianchi Cai, Jiyan Jiang, Wenpeng Zhang, Shiji Zhou, Xierui Song, Li Yu, Lihong Gu, Xiaodong Zeng, Jinjie Gu, Guannan Zhang

    Abstract: We study the budget allocation problem in online marketing campaigns that utilize previously collected offline data. We first discuss the long-term effect of optimizing marketing budget allocation decisions in the offline setting. To overcome the challenge, we propose a novel game-theoretic offline value-based reinforcement learning method using mixed policies. The proposed method reduces the need…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: WSDM 23, Best Paper Candidate

  31. arXiv:2309.01194  [pdf, other]

    cs.AI

    A Survey on Service Route and Time Prediction in Instant Delivery: Taxonomy, Progress, and Prospects

    Authors: Haomin Wen, Youfang Lin, Lixia Wu, Xiaowei Mao, Tianyue Cai, Yunfeng Hou, Shengnan Guo, Yuxuan Liang, Guangyin Jin, Yiji Zhao, Roger Zimmermann, Jieping Ye, Huaiyu Wan

    Abstract: Instant delivery services, such as food delivery and package delivery, have achieved explosive growth in recent years by providing customers with daily-life convenience. An emerging research area within these services is service Route&Time Prediction (RTP), which aims to estimate the future service route as well as the arrival time of a given worker. As one of the most crucial tasks in those serv…

    Submitted 3 September, 2023; originally announced September 2023.

  32. Model-free Reinforcement Learning with Stochastic Reward Stabilization for Recommender Systems

    Authors: Tianchi Cai, Shenliao Bao, Jiyan Jiang, Shiji Zhou, Wenpeng Zhang, Lihong Gu, Jinjie Gu, Guannan Zhang

    Abstract: Model-free RL-based recommender systems have recently received increasing research attention due to their capability to handle partial feedback and long-term rewards. However, most existing research has ignored a critical feature in recommender systems: one user's feedback on the same item at different times is random. The stochastic rewards property essentially differs from that in classic RL sce…

    Submitted 25 August, 2023; originally announced August 2023.

    Comments: SIGIR '23

  33. arXiv:2307.02690  [pdf, other]

    cs.CL cs.AI cs.LG

    Scaling In-Context Demonstrations with Structured Attention

    Authors: Tianle Cai, Kaixuan Huang, Jason D. Lee, Mengdi Wang

    Abstract: The recent surge of large language models (LLMs) highlights their ability to perform in-context learning, i.e., "learning" to perform a task from a few demonstrations in the context without any parameter updates. However, their capabilities of in-context learning are limited by the model architecture: 1) the use of demonstrations is constrained by a maximum sentence length due to positional embedd…

    Submitted 5 July, 2023; originally announced July 2023.

  34. Catch Me If You Can: A New Low-Rate DDoS Attack Strategy Disguised by Feint

    Authors: Tianyang Cai, Yuqi Li, Tao Jia, Leo Yu Zhang, Zheng Yang

    Abstract: While collaborative systems provide convenience to our lives, they also face many security threats. One of them is the Low-rate Distributed Denial-of-Service (LDDoS) attack, which is a worthy concern. Unlike volumetric DDoS attacks that continuously send large volumes of traffic, LDDoS attacks are more stealthy and difficult to be detected owing to their low-volume feature. Due to its stealthiness…

    Submitted 27 June, 2023; originally announced June 2023.

  35. arXiv:2305.17608  [pdf, other]

    cs.LG cs.AI cs.CL math.OC stat.ML

    Reward Collapse in Aligning Large Language Models

    Authors: Ziang Song, Tianle Cai, Jason D. Lee, Weijie J. Su

    Abstract: The extraordinary capabilities of large language models (LLMs) such as ChatGPT and GPT-4 are in part unleashed by aligning them with reward models that are trained on human preferences, which are often represented as rankings of responses to prompts. In this paper, we document the phenomenon of reward collapse, an empirical observation where the prevailing ranking-based approach results i…

    Submitted 27 May, 2023; originally announced May 2023.

  36. arXiv:2305.17126  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Large Language Models as Tool Makers

    Authors: Tianle Cai, Xuezhi Wang, Tengyu Ma, Xinyun Chen, Denny Zhou

    Abstract: Recent research has highlighted the potential of large language models (LLMs) to improve their problem-solving capabilities with the aid of suitable external tools. In our work, we further advance this concept by introducing a closed-loop framework, referred to as LLMs As Tool Makers (LATM), where LLMs create their own reusable tools for problem-solving. Our approach consists of two phases: 1) to…

    Submitted 10 March, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

    Comments: Code available at https://github.com/ctlllll/LLM-ToolMaker

  37. arXiv:2305.11407  [pdf, other]

    cs.AI

    LATTE: Label-efficient Incident Phenotyping from Longitudinal Electronic Health Records

    Authors: Jun Wen, Jue Hou, Clara-Lea Bonzel, Yihan Zhao, Victor M. Castro, Vivian S. Gainer, Dana Weisenfeld, Tianrun Cai, Yuk-Lam Ho, Vidul A. Panickan, Lauren Costa, Chuan Hong, J. Michael Gaziano, Katherine P. Liao, Junwei Lu, Kelly Cho, Tianxi Cai

    Abstract: Electronic health record (EHR) data are increasingly used to support real-world evidence (RWE) studies. Yet its ability to generate reliable RWE is limited by the lack of readily available precise information on the timing of clinical events such as the onset time of heart failure. We propose a LAbel-efficienT incidenT phEnotyping (LATTE) algorithm to accurately annotate the timing of clinical eve…

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: EHRs data

  38. arXiv:2305.02334  [pdf, other]

    hep-th cond-mat.dis-nn cs.LG hep-ph stat.ML

    Structures of Neural Network Effective Theories

    Authors: Ian Banta, Tianji Cai, Nathaniel Craig, Zhengkang Zhang

    Abstract: We develop a diagrammatic approach to effective field theories (EFTs) corresponding to deep neural networks at initialization, which dramatically simplifies computations of finite-width corrections to neuron statistics. The structures of EFT calculations make it transparent that a single condition governs criticality of all connected correlators of neuron preactivations. Understanding of such EFTs…

    Submitted 3 May, 2023; originally announced May 2023.

    Comments: 7+13 pages, 5 figures

  39. arXiv:2304.13704  [pdf]

    cs.RO

    An Investigation into Active Control for Accessible Orbital Flight

    Authors: Timothy Cai

    Abstract: Recently, a practical and publicly accessible satellite standard called the SmallSat has amplified public involvement in orbital research. This allows for flexible and efficient deployments of impactful low-earth-orbit experiments that would otherwise never be flown. However, the launch industry responsible for flying these experiments is neither flexible nor efficient. This project aims to make orbit…

    Submitted 29 March, 2023; originally announced April 2023.

    Comments: 13 pages, 7 figures, published in the Canadian Science Fair Journal

  40. arXiv:2304.06808  [pdf, other]

    cs.LG stat.ML

    Active Cost-aware Labeling of Streaming Data

    Authors: Ting Cai, Kirthevasan Kandasamy

    Abstract: We study actively labeling streaming data, where an active learner is faced with a stream of data points and must carefully choose which of these points to label via an expensive experiment. Such problems frequently arise in applications such as healthcare and astronomy. We first study a setting when the data's inputs belong to one of $K$ discrete distributions and formalize this problem via a los…

    Submitted 4 July, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

    Comments: Accepted by AISTATS 2023. 20 pages, 11 figures

  41. arXiv:2303.08774  [pdf, other]

    cs.CL cs.AI

    GPT-4 Technical Report

    Authors: OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner, Lenny Bogdonoff, Oleg Boiko, et al. (256 additional authors not shown)

    Abstract: We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based mo…

    Submitted 4 March, 2024; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: 100 pages; updated authors list; fixed author names and added citation

  42. arXiv:2303.07152  [pdf, ps, other]

    math.ST cs.CR cs.LG stat.ME stat.ML

    Score Attack: A Lower Bound Technique for Optimal Differentially Private Learning

    Authors: T. Tony Cai, Yichen Wang, Linjun Zhang

    Abstract: Achieving optimal statistical performance while ensuring the privacy of personal data is a challenging yet crucial objective in modern data analysis. However, characterizing the optimality, particularly the minimax lower bound, under privacy constraints is technically difficult. To address this issue, we propose a novel approach called the score attack, which provides a lower bound on the differ…

    Submitted 13 March, 2023; originally announced March 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2011.03900

    MSC Class: 62F30; 62J12; 62G05

  43. arXiv:2303.05686  [pdf, other]

    eess.IV cs.CV

    Generative AI for Rapid Diffusion MRI with Improved Image Quality, Reliability and Generalizability

    Authors: Amir Sadikov, Xinlei Pan, Hannah Choi, Lanya T. Cai, Pratik Mukherjee

    Abstract: Diffusion MRI is a non-invasive, in-vivo biomedical imaging method for mapping tissue microstructure. Applications include structural connectivity imaging of the human brain and detecting microstructural neural changes. However, acquiring high signal-to-noise ratio dMRI datasets with high angular and spatial resolution requires prohibitively long scan times, limiting usage in many important clinic…

    Submitted 6 October, 2023; v1 submitted 9 March, 2023; originally announced March 2023.

  44. arXiv:2303.04221  [pdf, other]

    cs.HC

    THERIF: A Pipeline for Generating Themes for Readability with Iterative Feedback

    Authors: Tianyuan Cai, Aleena Gertrudes Niklaus, Michael Kraley, Bernard Kerr, Zoya Bylinskii

    Abstract: Digital reading applications give readers the ability to customize fonts, sizes, and spacings, all of which have been shown to improve the reading experience for readers from different demographics. However, tweaking these text features can be challenging, especially given their interactions on the final look and feel of the text. Our solution is to offer readers preset combinations of font, chara…

    Submitted 7 March, 2023; originally announced March 2023.

    Comments: Extended version of CHI LBW'2023 paper

  45. arXiv:2303.02011  [pdf, other]

    stat.ML cs.LG

    Diagnosing Model Performance Under Distribution Shift

    Authors: Tiffany Tianhui Cai, Hongseok Namkoong, Steve Yadlowsky

    Abstract: Prediction models can perform poorly when deployed to target distributions different from the training distribution. To understand these operational failure modes, we develop a method, called DIstribution Shift DEcomposition (DISDE), to attribute a drop in performance to different types of distribution shifts. Our approach decomposes the performance drop into terms for 1) an increase in harder but…

    Submitted 10 July, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

  46. ReDas: A Lightweight Architecture for Supporting Fine-Grained Reshaping and Multiple Dataflows on Systolic Array

    Authors: Meng Han, Liang Wang, Limin Xiao, Tianhao Cai, Zeyu Wang, Xiangrong Xu, Chenhao Zhang

    Abstract: The systolic accelerator is one of the premier architectural choices for DNN acceleration. However, the conventional systolic architecture suffers from low PE utilization due to the mismatch between the fixed array and diverse DNN workloads. Recent studies have proposed flexible systolic array architectures to adapt to DNN models. However, these designs support only coarse-grained reshaping or sig…

    Submitted 14 May, 2024; v1 submitted 15 February, 2023; originally announced February 2023.

    Comments: 14 pages, 22 figures, journal

  47. arXiv:2301.04633  [pdf, ps, other]

    hep-ex cs.DC physics.data-an

    Accelerating Machine Learning Inference with GPUs in ProtoDUNE Data Processing

    Authors: Tejin Cai, Kenneth Herner, Tingjun Yang, Michael Wang, Maria Acosta Flechas, Philip Harris, Burt Holzman, Kevin Pedro, Nhan Tran

    Abstract: We study the performance of a cloud-based GPU-accelerated inference server to speed up event reconstruction in neutrino data batch jobs. Using detector data from the ProtoDUNE experiment and employing the standard DUNE grid job submission tools, we attempt to reprocess the data by running several thousand concurrent grid jobs, a rate we expect to be typical of current and future neutrino physics e…

    Submitted 27 October, 2023; v1 submitted 11 January, 2023; originally announced January 2023.

    Comments: 13 pages, 9 figures, matches accepted version

    Report number: FERMILAB-PUB-22-944-ND-PPD-SCD

    Journal ref: Comput Softw Big Sci 7, 11 (2023)

  48. arXiv:2211.12612  [pdf, ps, other]

    stat.ML cs.LG math.ST

    Transfer Learning for Contextual Multi-armed Bandits

    Authors: Changxiao Cai, T. Tony Cai, Hongzhe Li

    Abstract: Motivated by a range of applications, we study in this paper the problem of transfer learning for nonparametric contextual multi-armed bandits under the covariate shift model, where we have data collected on source bandits before the start of the target bandit learning. The minimax rate of convergence for the cumulative regret is established and a novel transfer learning algorithm that attains the…

    Submitted 24 January, 2024; v1 submitted 22 November, 2022; originally announced November 2022.

    Comments: Accepted to the Annals of Statistics

  49. arXiv:2210.09298  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    What Makes Convolutional Models Great on Long Sequence Modeling?

    Authors: Yuhong Li, Tianle Cai, Yi Zhang, Deming Chen, Debadeepta Dey

    Abstract: Convolutional models have been widely used in multiple domains. However, most existing models only use local convolution, making the model unable to handle long-range dependency efficiently. Attention overcomes this problem by aggregating global information but also makes the computational complexity quadratic to the sequence length. Recently, Gu et al. [2021] proposed a model called S4 inspired b…

    Submitted 17 October, 2022; originally announced October 2022.

    Comments: The code is available at https://github.com/ctlllll/SGConv

  50. arXiv:2209.13762  [pdf, other]

    stat.ML cs.LG

    Consensus Knowledge Graph Learning via Multi-view Sparse Low Rank Block Model

    Authors: Tianxi Cai, Dong Xia, Luwan Zhang, Doudou Zhou

    Abstract: Network analysis has been a powerful tool to unveil relationships and interactions among a large number of objects. Yet its effectiveness in accurately identifying important node-node interactions is challenged by the rapidly growing network size, with data being collected at an unprecedented granularity and scale. Common wisdom to overcome such high dimensionality is collapsing nodes into smaller…

    Submitted 27 September, 2022; originally announced September 2022.