Showing 1–50 of 232 results for author: Zhou, A

  1. arXiv:2407.00782  [pdf, other]

    cs.CL

    Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning

    Authors: Zimu Lu, Aojun Zhou, Ke Wang, Houxing Ren, Weikang Shi, Junting Pan, Mingjie Zhan

    Abstract: Direct Preference Optimization (DPO) has proven effective at improving the performance of large language models (LLMs) on downstream tasks such as reasoning and alignment. In this work, we propose Step-Controlled DPO (SCDPO), a method for automatically providing stepwise error supervision by creating negative samples of mathematical reasoning rationales that start making errors at a specified step…

    Submitted 30 June, 2024; originally announced July 2024.

  2. arXiv:2407.00719  [pdf]

    cs.CR cs.DC cs.LG

    A Whole-Process Certifiably Robust Aggregation Method Against Backdoor Attacks in Federated Learning

    Authors: Anqi Zhou, Yezheng Liu, Yidong Chai, Hongyi Zhu, Xinyue Ge, Yuanchun Jiang, Meng Wang

    Abstract: Federated Learning (FL) has garnered widespread adoption across various domains such as finance, healthcare, and cybersecurity. Nonetheless, FL remains under significant threat from backdoor attacks, wherein malicious actors insert triggers into trained models, enabling them to perform certain tasks while still meeting FL's primary objectives. In response, robust aggregation methods have been prop…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 14 pages

  3. arXiv:2407.00487  [pdf, other]

    cs.CL

    It's Morphing Time: Unleashing the Potential of Multiple LLMs via Multi-objective Optimization

    Authors: Bingdong Li, Zixiang Di, Yanting Yang, Hong Qian, Peng Yang, Hao Hao, Ke Tang, Aimin Zhou

    Abstract: In this paper, we introduce a novel approach for large language model merging via black-box multi-objective optimization algorithms. The goal of model merging is to combine multiple models, each excelling in different tasks, into a single model that outperforms any of the individual source models. However, model merging faces two significant challenges: First, existing methods rely heavily on huma…

    Submitted 29 June, 2024; originally announced July 2024.

  4. arXiv:2406.17864  [pdf, other]

    cs.CY cs.AI

    AI Risk Categorization Decoded (AIR 2024): From Government Regulations to Corporate Policies

    Authors: Yi Zeng, Kevin Klyman, Andy Zhou, Yu Yang, Minzhou Pan, Ruoxi Jia, Dawn Song, Percy Liang, Bo Li

    Abstract: We present a comprehensive AI risk taxonomy derived from eight government policies from the European Union, United States, and China and 16 company policies worldwide, making a significant step towards establishing a unified language for generative AI safety evaluation. We identify 314 unique risk categories organized into a four-tiered taxonomy. At the highest level, this taxonomy encompasses Sys…

    Submitted 25 June, 2024; originally announced June 2024.

  5. arXiv:2406.17097  [pdf, other]

    cs.HC

    Lower Quantity, Higher Quality: Auditing News Content and User Perceptions on Twitter/X Algorithmic versus Chronological Timelines

    Authors: Stephanie Wang, Shengchun Huang, Alvin Zhou, Danaë Metaxa

    Abstract: Social media personalization algorithms increasingly influence the flow of civic information through society, resulting in concerns about "filter bubbles", "echo chambers", and other ways they might exacerbate ideological segregation and fan the spread of polarizing content. To address these concerns, we designed and conducted a sociotechnical audit (STA) to investigate how Twitter/X's timeline al…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 24 pages, 5 figures, Computer-Supported Cooperative Work

  6. arXiv:2406.13941  [pdf, other]

    cs.IR cs.AI

    UpDLRM: Accelerating Personalized Recommendation using Real-World PIM Architecture

    Authors: Sitian Chen, Haobin Tan, Amelie Chi Zhou, Yusen Li, Pavan Balaji

    Abstract: Deep Learning Recommendation Models (DLRMs) have gained popularity in recommendation systems due to their effectiveness in handling large-scale recommendation tasks. The embedding layers of DLRMs have become the performance bottleneck due to their intensive demands on memory capacity and memory bandwidth. In this paper, we propose UpDLRM, which utilizes real-world processing-in-memory (PIM) hardware,…

    Submitted 19 June, 2024; originally announced June 2024.

  7. arXiv:2406.10675  [pdf, other]

    cs.NE

    Large Language Models as Surrogate Models in Evolutionary Algorithms: A Preliminary Study

    Authors: Hao Hao, Xiaoqun Zhang, Aimin Zhou

    Abstract: Large Language Models (LLMs) have achieved significant progress across various fields and have exhibited strong potential in evolutionary computation, such as generating new solutions and automating algorithm design. Surrogate-assisted selection is a core step in evolutionary algorithms to solve expensive optimization problems by reducing the number of real evaluations. Traditionally, this has rel…

    Submitted 15 June, 2024; originally announced June 2024.

  8. arXiv:2406.08697  [pdf, other]

    stat.ML cs.LG math.OC stat.ME

    Orthogonalized Estimation of Difference of $Q$-functions

    Authors: Angela Zhou

    Abstract: Offline reinforcement learning is important in many settings with available observational data but the inability to deploy new policies online due to safety, cost, and other concerns. Many recent advances in causal inference and machine learning target estimation of causal contrast functions such as CATE, which is sufficient for optimizing decisions and can adapt to potentially smoother structure.…

    Submitted 12 June, 2024; originally announced June 2024.

  9. arXiv:2406.08473  [pdf, other]

    cs.LG

    Strategies for Pretraining Neural Operators

    Authors: Anthony Zhou, Cooper Lorsung, AmirPouya Hemmasian, Amir Barati Farimani

    Abstract: Pretraining for partial differential equation (PDE) modeling has recently shown promise in scaling neural operators across datasets to improve generalizability and performance. Despite these advances, our understanding of how pretraining affects neural operators is still limited; studies generally propose tailored architectures and datasets that make it challenging to compare or examine different…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 25 pages, 5 figures

  10. arXiv:2405.20413  [pdf, other]

    cs.CR cs.CL cs.CV cs.LG

    Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters

    Authors: Haibo Jin, Andy Zhou, Joe D. Menke, Haohan Wang

    Abstract: Large Language Models (LLMs) are typically harmless but remain vulnerable to carefully crafted prompts known as "jailbreaks", which can bypass protective measures and induce harmful behavior. Recent advancements in LLMs have incorporated moderation guardrails that can filter outputs, which trigger processing errors for certain malicious questions. Existing red-teaming benchmarks often neglect to…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 20 pages

  11. arXiv:2405.18206  [pdf, other]

    cs.LG stat.ME stat.ML

    Multi-CATE: Multi-Accurate Conditional Average Treatment Effect Estimation Robust to Unknown Covariate Shifts

    Authors: Christoph Kern, Michael Kim, Angela Zhou

    Abstract: Estimating heterogeneous treatment effects is important to tailor treatments to those individuals who would most likely benefit. However, conditional average treatment effect predictors may often be trained on one population but possibly deployed on different, possibly unknown populations. We use methodology for learning multi-accurate predictors to post-process CATE T-learners (differenced regres…

    Submitted 28 May, 2024; originally announced May 2024.

  12. arXiv:2405.17057  [pdf, other]

    cs.CL cs.AI

    ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off Code Generation

    Authors: Houxing Ren, Mingjie Zhan, Zhongyuan Wu, Aojun Zhou, Junting Pan, Hongsheng Li

    Abstract: Code generation plays a crucial role in various tasks, such as code auto-completion and mathematical reasoning. Previous work has proposed numerous methods to enhance code generation performance, including integrating feedback from the compiler. Inspired by this, we present ReflectionCoder, a novel approach that effectively leverages reflection sequences constructed by integrating compiler feedbac…

    Submitted 27 May, 2024; originally announced May 2024.

  13. arXiv:2405.16494  [pdf, other]

    cs.NE

    A First Look at Kolmogorov-Arnold Networks in Surrogate-assisted Evolutionary Algorithms

    Authors: Hao Hao, Xiaoqun Zhang, Bingdong Li, Aimin Zhou

    Abstract: Surrogate-assisted Evolutionary Algorithm (SAEA) is an essential method for solving expensive problems. Utilizing surrogate models to substitute the optimization function can significantly reduce reliance on the function evaluations during the search process, thereby lowering the optimization costs. The construction of surrogate models is a critical component in SAEAs, with numerous mach…

    Submitted 26 May, 2024; originally announced May 2024.

  14. arXiv:2405.16057  [pdf, other]

    cs.CL cs.LG

    SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models

    Authors: Xudong Lu, Aojun Zhou, Yuhui Xu, Renrui Zhang, Peng Gao, Hongsheng Li

    Abstract: Large Language Models (LLMs) have become pivotal in advancing the field of artificial intelligence, yet their immense sizes pose significant challenges for both fine-tuning and deployment. Current post-training pruning methods, while reducing the sizes of LLMs, often fail to maintain their original performance. To address these challenges, this paper introduces SPP, a Sparsity-Preserved Parameter-…

    Submitted 25 May, 2024; originally announced May 2024.

  15. arXiv:2405.14854  [pdf, other]

    cs.CV cs.LG

    TerDiT: Ternary Diffusion Models with Transformers

    Authors: Xudong Lu, Aojun Zhou, Ziyi Lin, Qi Liu, Yuhui Xu, Renrui Zhang, Yafei Wen, Shuai Ren, Peng Gao, Junchi Yan, Hongsheng Li

    Abstract: Recent developments in large-scale pre-trained text-to-image diffusion models have significantly improved the generation of high-fidelity images, particularly with the emergence of diffusion models based on transformer architecture (DiTs). Among these diffusion models, diffusion transformers have demonstrated superior image generation capabilities, boasting lower FID scores and higher scalability.…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 18 pages, 13 figures

  16. arXiv:2405.12100  [pdf, other]

    cs.CL

    DOP: Diagnostic-Oriented Prompting for Large Language Models in Mathematical Correction

    Authors: Hao Chen, Biaojie Zeng, Xin Lin, Liang He, Aimin Zhou

    Abstract: Math word problem correction (MWPC) is a novel task dedicated to rectifying reasoning errors in the process of solving mathematical problems. In this paper, leveraging the advancements in large language models (LLMs), we address two key objectives: (1) Distinguishing between mathematical reasoning and error correction; (2) Exploring strategies to enhance the error correction capabilities of LLMs i…

    Submitted 20 May, 2024; originally announced May 2024.

  17. arXiv:2405.08674  [pdf, other]

    cs.LG cs.AI

    Expensive Multi-Objective Bayesian Optimization Based on Diffusion Models

    Authors: Bingdong Li, Zixiang Di, Yongfan Lu, Hong Qian, Feng Wang, Peng Yang, Ke Tang, Aimin Zhou

    Abstract: Multi-objective Bayesian optimization (MOBO) has shown promising performance on various expensive multi-objective optimization problems (EMOPs). However, effectively modeling complex distributions of the Pareto optimal solutions is difficult with limited function evaluations. Existing Pareto set learning algorithms may exhibit considerable instability in such expensive scenarios, leading to signif…

    Submitted 14 May, 2024; originally announced May 2024.

  18. arXiv:2405.08604  [pdf, other]

    cs.LG cs.AI

    Towards Geometry-Aware Pareto Set Learning for Neural Multi-Objective Combinatorial Optimization

    Authors: Yongfan Lu, Zixiang Di, Bingdong Li, Shengcai Liu, Hong Qian, Peng Yang, Ke Tang, Aimin Zhou

    Abstract: Multi-objective combinatorial optimization (MOCO) problems are prevalent in various real-world applications. Most existing neural MOCO methods rely on problem decomposition to transform an MOCO problem into a series of single-objective combinatorial optimization (SOCO) problems. However, these methods often approximate partial regions of the Pareto front and spend excessive time on diversity enhanc…

    Submitted 23 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

  19. arXiv:2405.07395  [pdf, other]

    cs.LG cs.AI cs.CE

    CaFA: Global Weather Forecasting with Factorized Attention on Sphere

    Authors: Zijie Li, Anthony Zhou, Saurabh Patil, Amir Barati Farimani

    Abstract: Accurate weather forecasting is crucial in various sectors, impacting decision-making processes and societal events. Data-driven approaches based on machine learning models have recently emerged as a promising alternative to numerical weather prediction models given their potential to capture physics of different scales from historical data and the significantly lower computational cost during the…

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: Preprint

  20. arXiv:2405.05083  [pdf, other]

    cs.CC cs.GT

    Committee Elections with Candidate Attribute Constraints

    Authors: Aizhong Zhou, Fengbo Wang, Jiong Guo

    Abstract: In many real-world applications of committee elections, the candidates are associated with certain attributes and the chosen committee is required to satisfy some constraints posed on the candidate attributes. For instance, in dress collocation, it is generally acknowledged that when wearing a tie, you'd better wear a shirt, and when wearing a suit, you'd better wear leather shoes. Here, dresses are…

    Submitted 8 May, 2024; originally announced May 2024.

  21. arXiv:2405.05062  [pdf, ps, other]

    cs.CC cs.GT

    Controlling Borda Elections by Adding or Deleting either Votes or Candidates: Complete and Top-Truncated Votes

    Authors: Aizhong Zhou, Fengbo Wang, Jiong Guo

    Abstract: An election is defined as a pair of a set of candidates $C=\{c_1,\cdots,c_m\}$ and a multiset of votes $V=\{v_1,\cdots,v_n\}$, where each vote is a linear order of the candidates. The Borda election rule is characterized by a vector $\langle m-1,m-2,\cdots,0\rangle$, which means that the candidate ranked at the $i$-th position of a vote $v$ receives a score $m-i$ from $v$, and the candidate receiving the most s…

    Submitted 8 May, 2024; originally announced May 2024.

  22. arXiv:2405.02338  [pdf, other]

    cs.HC

    Mixed or Misperceived Reality?

    Authors: Aven Le Zhou, Lei Xi, Kang Zhang

    Abstract: "Surrealism Me" delves into Vilém Flusser's critique of media as mediators that often distort human perception of reality through an interactive virtual-embodying MR experience. It examines the obfuscating nature of media and reveals the constructed nature of media-projected realities, prompting a reevaluation of media's role and influence on our perception.

    Submitted 30 April, 2024; originally announced May 2024.

  23. arXiv:2405.00018  [pdf, other]

    cs.DC physics.ao-ph

    Proof-of-concept: Using ChatGPT to Translate and Modernize an Earth System Model from Fortran to Python/JAX

    Authors: Anthony Zhou, Linnia Hawkins, Pierre Gentine

    Abstract: Earth system models (ESMs) are vital for understanding past, present, and future climate, but they suffer from legacy technical infrastructure. ESMs are primarily implemented in Fortran, a language that poses a high barrier of entry for early career scientists and lacks a GPU runtime, which has become essential for continued advancement as GPU power increases and CPU scaling slows. Fortran also la…

    Submitted 13 February, 2024; originally announced May 2024.

  24. arXiv:2404.18490  [pdf, other]

    cs.LG stat.ML

    Reduced-Rank Multi-objective Policy Learning and Optimization

    Authors: Ezinne Nwankwo, Michael I. Jordan, Angela Zhou

    Abstract: Evaluating the causal impacts of possible interventions is crucial for informing decision-making, especially towards improving access to opportunity. However, if causal effects are heterogeneous and predictable from covariates, personalized treatment decisions can improve individual outcomes and contribute to both efficiency and equity. In practice, however, causal researchers do not have a single…

    Submitted 29 April, 2024; originally announced April 2024.

  25. Inductive Cognitive Diagnosis for Fast Student Learning in Web-Based Online Intelligent Education Systems

    Authors: Shuo Liu, Junhao Shen, Hong Qian, Aimin Zhou

    Abstract: Cognitive diagnosis aims to gauge students' mastery levels based on their response logs. Serving as a pivotal module in web-based online intelligent education systems (WOIESs), it plays an upstream and fundamental role in downstream tasks like learning item recommendation and computerized adaptive testing. WOIESs are open learning environments where numerous new students constantly register and com…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: WWW 2024

  26. arXiv:2404.10561  [pdf, other]

    cs.LG q-bio.QM stat.ML

    HiGraphDTI: Hierarchical Graph Representation Learning for Drug-Target Interaction Prediction

    Authors: Bin Liu, Siqi Wu, Jing Wang, Xin Deng, Ao Zhou

    Abstract: The discovery of drug-target interactions (DTIs) plays a crucial role in pharmaceutical development. The deep learning model achieves more accurate results in DTI prediction due to its ability to extract robust and expressive features from drug and target chemical structures. However, existing deep learning methods typically generate drug features via aggregating molecular atom representations, ig…

    Submitted 16 April, 2024; originally announced April 2024.

  27. arXiv:2404.09544  [pdf, other]

    cs.LG cs.AI

    GNNavigator: Towards Adaptive Training of Graph Neural Networks via Automatic Guideline Exploration

    Authors: Tong Qiao, Jianlei Yang, Yingjie Qi, Ao Zhou, Chen Bai, Bei Yu, Weisheng Zhao, Chunming Hu

    Abstract: Graph Neural Networks (GNNs) have recently achieved significant success in many applications. However, balancing GNN training runtime cost, memory consumption, and attainable accuracy for various applications is non-trivial. Previous training methodologies suffer from inferior adaptability and lack a unified training optimization solution. To address the problem, this work proposes GNNavigator, an adaptive G…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Accepted by DAC'24

  28. arXiv:2404.05605  [pdf, other]

    cs.LG cs.AI

    Graph Neural Networks Automated Design and Deployment on Device-Edge Co-Inference Systems

    Authors: Ao Zhou, Jianlei Yang, Tong Qiao, Yingjie Qi, Zhi Yang, Weisheng Zhao, Chunming Hu

    Abstract: The key to the device-edge co-inference paradigm is to partition models into computation-friendly and computation-intensive parts across the device and the edge, respectively. However, for Graph Neural Networks (GNNs), we find that simply partitioning without altering their structures can hardly achieve the full potential of the co-inference paradigm due to various computational-communication overhead…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: Accepted by DAC'24

  29. arXiv:2404.02478  [pdf, other]

    cs.LG cs.AI

    FedSelect: Personalized Federated Learning with Customized Selection of Parameters for Fine-Tuning

    Authors: Rishub Tamirisa, Chulin Xie, Wenxuan Bao, Andy Zhou, Ron Arel, Aviv Shamsian

    Abstract: Standard federated learning approaches suffer when client data distributions have sufficient heterogeneity. Recent methods addressed the client data heterogeneity issue via personalized federated learning (PFL) - a class of FL algorithms aiming to personalize learned global knowledge to better suit the clients' local data distributions. Existing PFL methods usually decouple global updates in deep…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: Published in CVPR 2024

  30. arXiv:2403.20150  [pdf, other]

    cs.LG cs.AI cs.CY

    TFB: Towards Comprehensive and Fair Benchmarking of Time Series Forecasting Methods

    Authors: Xiangfei Qiu, Jilin Hu, Lekui Zhou, Xingjian Wu, Junyang Du, Buang Zhang, Chenjuan Guo, Aoying Zhou, Christian S. Jensen, Zhenli Sheng, Bin Yang

    Abstract: Time series are generated in diverse domains such as economic, traffic, health, and energy, where forecasting of future values has numerous important applications. Not surprisingly, many forecasting methods are being proposed. To ensure progress, it is essential to be able to study and compare such methods empirically in a comprehensive and reliable manner. To achieve this, we propose TFB, an auto…

    Submitted 18 June, 2024; v1 submitted 29 March, 2024; originally announced March 2024.

    Comments: Directly accepted by PVLDB 2024

  31. Emotion Neural Transducer for Fine-Grained Speech Emotion Recognition

    Authors: Siyuan Shen, Yu Gao, Feng Liu, Hanyang Wang, Aimin Zhou

    Abstract: The mainstream paradigm of speech emotion recognition (SER) is identifying the single emotion label of the entire utterance. This line of work neglects the emotion dynamics at fine temporal granularity and mostly fails to leverage linguistic information of speech signal explicitly. In this paper, we propose Emotion Neural Transducer for fine-grained speech emotion recognition with automatic speech…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Accepted by 49th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024)

  32. arXiv:2403.18192  [pdf, other]

    cs.LG

    Multi-Label Adaptive Batch Selection by Highlighting Hard and Imbalanced Samples

    Authors: Ao Zhou, Bin Liu, Jing Wang, Grigorios Tsoumakas

    Abstract: Deep neural network models have demonstrated their effectiveness in classifying multi-label data from various domains. Typically, they employ a training mode that combines mini-batches with optimizers, where each sample is randomly selected with equal probability when constructing mini-batches. However, the intrinsic class imbalance in multi-label data may bias the model towards majority labels, s…

    Submitted 26 March, 2024; originally announced March 2024.

  33. arXiv:2403.17728  [pdf, other]

    cs.LG

    Masked Autoencoders are PDE Learners

    Authors: Anthony Zhou, Amir Barati Farimani

    Abstract: Neural solvers for partial differential equations (PDEs) have great potential to generate fast and accurate physics solutions, yet their practicality is currently limited by their generalizability. PDEs evolve over broad scales and exhibit diverse behaviors; predicting these phenomena will require learning representations across a wide variety of inputs which may encompass different coefficients,…

    Submitted 29 May, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: 27 pages, 10 figures

  34. arXiv:2403.14624  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

    Authors: Renrui Zhang, Dongzhi Jiang, Yichi Zhang, Haokun Lin, Ziyu Guo, Pengshuo Qiu, Aojun Zhou, Pan Lu, Kai-Wei Chang, Peng Gao, Hongsheng Li

    Abstract: The remarkable progress of Multi-modal Large Language Models (MLLMs) has garnered unparalleled attention, due to their superior performance in visual contexts. However, their capabilities in visual math problem-solving remain insufficiently evaluated and understood. We investigate current benchmarks to incorporate excessive visual content within textual questions, which potentially assist MLLMs in…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 46 Pages, Work in Progress, Benchmark Project Page: https://mathverse-cuhk.github.io

  35. arXiv:2403.14413  [pdf, other]

    cs.NE cs.LG

    Model Uncertainty in Evolutionary Optimization and Bayesian Optimization: A Comparative Analysis

    Authors: Hao Hao, Xiaoqun Zhang, Aimin Zhou

    Abstract: Black-box optimization problems, which are common in many real-world applications, require optimization through input-output interactions without access to internal workings. This often leads to significant computational resources being consumed for simulations. Bayesian Optimization (BO) and Surrogate-Assisted Evolutionary Algorithm (SAEA) are two widely used gradient-free optimization techniques…

    Submitted 22 March, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

  36. arXiv:2403.09717  [pdf, other]

    cs.HC cs.AI cs.CL cs.CY

    Enhancing Depression-Diagnosis-Oriented Chat with Psychological State Tracking

    Authors: Yiyang Gu, Yougen Zhou, Qin Chen, Ningning Zhou, Jie Zhou, Aimin Zhou, Liang He

    Abstract: Depression-diagnosis-oriented chat aims to guide patients in self-expression to collect key symptoms for depression detection. Recent work focuses on combining task-oriented dialogue and chitchat to simulate the interview-based depression diagnosis. However, these methods cannot adequately capture the changing information, feelings, or symptoms of the patient during dialogues. Moreover, no explicit fra…

    Submitted 12 March, 2024; originally announced March 2024.

  37. arXiv:2403.00881  [pdf, other]

    cs.LG cs.DC cs.NI

    FedRDMA: Communication-Efficient Cross-Silo Federated LLM via Chunked RDMA Transmission

    Authors: Zeling Zhang, Dongqi Cai, Yiran Zhang, Mengwei Xu, Shangguang Wang, Ao Zhou

    Abstract: Communication overhead is a significant bottleneck in federated learning (FL), which has been exacerbated by the increasing size of AI models. In this paper, we propose FedRDMA, a communication-efficient cross-silo FL system that integrates RDMA into the FL communication protocol. To overcome the limitations of RDMA in wide-area networks (WANs), FedRDMA divides the updated model into chunks and…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: under review

  38. arXiv:2402.16352  [pdf, other]

    cs.CL cs.AI

    MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs

    Authors: Zimu Lu, Aojun Zhou, Houxing Ren, Ke Wang, Weikang Shi, Junting Pan, Mingjie Zhan, Hongsheng Li

    Abstract: Large language models (LLMs) have exhibited great potential in mathematical reasoning. However, there remains a performance gap in this area between existing open-source models and closed-source models such as GPT-4. In this paper, we introduce MathGenie, a novel method for generating diverse and reliable math problems from a small-scale problem-solution dataset (denoted as seed data). We augment…

    Submitted 26 February, 2024; originally announced February 2024.

  39. arXiv:2402.14800  [pdf, other]

    cs.CL cs.AI cs.LG

    Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models

    Authors: Xudong Lu, Qi Liu, Yuhui Xu, Aojun Zhou, Siyuan Huang, Bo Zhang, Junchi Yan, Hongsheng Li

    Abstract: A pivotal advancement in the progress of large language models (LLMs) is the emergence of the Mixture-of-Experts (MoE) LLMs. Compared to traditional LLMs, MoE LLMs can achieve higher performance with fewer parameters, but it is still hard to deploy them due to their immense parameter sizes. Different from previous weight pruning methods that rely on specifically designed hardware, this paper mainl…

    Submitted 30 May, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: Mixture-of-Experts Large Language Models, ACL2024

  40. arXiv:2402.10751  [pdf, other]

    cs.CY

    Another Body in the World: Flusserian Freedom in Mixed Reality

    Authors: Aven Le Zhou, Lei Xi, Kang Zhang

    Abstract: In the Flusserian view of media history, humans often misperceive the world projected by media to be the world itself, leading to a loss of freedom. This paper examines Flusserian Freedom in the context of Mixed Reality (MR) and explores how humans can recognize the obscuration of the world within the media (i.e., MR) and understand their relationship. The authors investigate the concept of playing ag…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: Under review

  41. arXiv:2402.06389  [pdf, other]

    cs.AI cs.HC cs.MM

    Human Aesthetic Preference-Based Large Text-to-Image Model Personalization: Kandinsky Generation as an Example

    Authors: Aven-Le Zhou, Yu-Ao Wang, Wei Wu, Kang Zhang

    Abstract: With the advancement of neural generative capabilities, the art community has actively embraced GenAI (generative artificial intelligence) for creating painterly content. Large text-to-image models can quickly generate aesthetically pleasing outcomes. However, the process can be non-deterministic and often involves tedious trial-and-error, as users struggle with formulating effective prompts to ac…

    Submitted 9 February, 2024; originally announced February 2024.

    Comments: 9 pages, 10 figures

  42. arXiv:2402.05232  [pdf, other]

    cs.LG cs.AI

    Universal Neural Functionals

    Authors: Allan Zhou, Chelsea Finn, James Harrison

    Abstract: A challenging problem in many modern machine learning tasks is to process weight-space features, i.e., to transform or extract information from the weights and gradients of a neural network. Recent works have developed promising weight-space models that are equivariant to the permutation symmetries of simple feedforward networks. However, they are not applicable to general architectures, since the…

    Submitted 7 February, 2024; originally announced February 2024.

  43. arXiv:2402.03299  [pdf, other]

    cs.LG cs.CL cs.CV

    GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models

    Authors: Haibo Jin, Ruoxi Chen, Andy Zhou, Yang Zhang, Haohan Wang

    Abstract: The discovery of "jailbreaks" to bypass safety filters of Large Language Models (LLMs) and harmful responses have encouraged the community to implement safety measures. One major safety measure is to proactively test the LLMs with jailbreaks prior to the release. Therefore, such testing will require a method that can generate jailbreaks massively and efficiently. In this paper, we follow a novel y…

    Submitted 30 May, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: 28 pages

  44. arXiv:2402.02377  [pdf, other]

    cs.CV cs.LG

    NOAH: Learning Pairwise Object Category Attentions for Image Classification

    Authors: Chao Li, Aojun Zhou, Anbang Yao

    Abstract: A modern deep neural network (DNN) for image classification tasks typically consists of two parts: a backbone for feature extraction, and a head for feature encoding and class prediction. We observe that the head structures of mainstream DNNs adopt a similar feature encoding pipeline, exploiting global feature dependencies while disregarding local ones. In this paper, we revisit the feature encod…

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: This research work was completed in 2023. Code and pre-trained models are available at https://github.com/OSVAI/NOAH

  45. arXiv:2402.01034  [pdf

    eess.IV cs.CV

    VISION-MAE: A Foundation Model for Medical Image Segmentation and Classification

    Authors: Zelong Liu, Andrew Tieu, Nikhil Patel, Alexander Zhou, George Soultanidis, Zahi A. Fayad, Timothy Deyer, Xueyan Mei

    Abstract: Artificial Intelligence (AI) has the potential to revolutionize diagnosis and segmentation in medical imaging. However, development and clinical implementation face multiple challenges including limited data availability, lack of generalizability, and the necessity to incorporate multi-modal data effectively. A foundation model, which is a large-scale pre-trained AI model, offers a versatile base…

    Submitted 1 February, 2024; originally announced February 2024.

  46. arXiv:2402.01031  [pdf

    eess.IV cs.CV

    MRAnnotator: A Multi-Anatomy Deep Learning Model for MRI Segmentation

    Authors: Alexander Zhou, Zelong Liu, Andrew Tieu, Nikhil Patel, Sean Sun, Anthony Yang, Peter Choi, Valentin Fauveau, George Soultanidis, Mingqian Huang, Amish Doshi, Zahi A. Fayad, Timothy Deyer, Xueyan Mei

    Abstract: Purpose: To develop a deep learning model for multi-anatomy and many-class segmentation of diverse anatomic structures on MRI imaging. Materials and Methods: In this retrospective study, two datasets were curated and annotated for model development and evaluation. An internal dataset of 1022 MRI sequences from various clinical sites within a health system and an external dataset of 264 MRI sequenc…

    Submitted 1 February, 2024; originally announced February 2024.

  47. arXiv:2401.17644  [pdf, other

    cs.DC cs.PF

    BurstGPT: A Real-world Workload Dataset to Optimize LLM Serving Systems

    Authors: Yuxin Wang, Yuhan Chen, Zeyu Li, Xueze Kang, Zhenheng Tang, Xin He, Rui Guo, Xin Wang, Qiang Wang, Amelie Chi Zhou, Xiaowen Chu

    Abstract: Serving systems for Large Language Models (LLMs) are often optimized to improve quality of service (QoS) and throughput. However, due to the lack of open-sourced LLM serving workloads, these systems are frequently evaluated under unrealistic workload assumptions. Consequently, performance may degrade when these systems are deployed in real-world scenarios. This work presents BurstGPT, an LLM servi…

    Submitted 17 June, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

  48. arXiv:2401.17263  [pdf, other

    cs.LG cs.AI cs.CL cs.CV

    Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks

    Authors: Andy Zhou, Bo Li, Haohan Wang

    Abstract: Despite advances in AI alignment, large language models (LLMs) remain vulnerable to adversarial attacks or jailbreaking, in which adversaries can modify prompts to induce unwanted behavior. While some defenses have been proposed, they have not been adapted to newly proposed attacks and more challenging threat models. To address this, we propose an optimization-based objective for defending LLMs ag…

    Submitted 5 June, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

    Comments: Code available at https://github.com/lapisrocks/rpo

  49. arXiv:2401.13870  [pdf, other

    cs.IR

    Integrating Large Language Models into Recommendation via Mutual Augmentation and Adaptive Aggregation

    Authors: Sichun Luo, Yuxuan Yao, Bowei He, Yinya Huang, Aojun Zhou, Xinyi Zhang, Yuanzhang Xiao, Mingjie Zhan, Linqi Song

    Abstract: Conventional recommendation methods have achieved notable advancements by harnessing collaborative or sequential information from user behavior. Recently, large language models (LLMs) have gained prominence for their capabilities in understanding and reasoning over textual semantics, and have found utility in various domains, including recommendation. Conventional recommendation methods and LLMs e…

    Submitted 24 January, 2024; originally announced January 2024.

  50. arXiv:2401.12934  [pdf, other

    stat.ML cs.LG math.OC

    Reward-Relevance-Filtered Linear Offline Reinforcement Learning

    Authors: Angela Zhou

    Abstract: This paper studies offline reinforcement learning with linear function approximation in a setting with decision-theoretic, but not estimation sparsity. The structural restrictions of the data-generating process presume that the transitions factor into a sparse component that affects the reward and could affect additional exogenous dynamics that do not affect the reward. Although the minimally suff…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: Conference version accepted at AISTATS 2024