Showing 1–50 of 256 results for author: Zhu, B

Searching in archive cs.
  1. arXiv:2406.11939  [pdf, other]

    cs.LG cs.AI cs.CL

    From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline

    Authors: Tianle Li, Wei-Lin Chiang, Evan Frick, Lisa Dunlap, Tianhao Wu, Banghua Zhu, Joseph E. Gonzalez, Ion Stoica

    Abstract: The rapid evolution of language models has necessitated the development of more challenging benchmarks. Current static benchmarks often struggle to consistently distinguish between the capabilities of different models and fail to align with real-world user preferences. On the other hand, live crowd-sourced platforms like the Chatbot Arena collect a wide range of natural prompts and user feedback.… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  2. arXiv:2406.09838  [pdf, other]

    cs.CV cs.AI

    Vision-Language Models Meet Meteorology: Developing Models for Extreme Weather Events Detection with Heatmaps

    Authors: Jian Chen, Peilin Zhou, Yining Hua, Dading Chong, Meng Cao, Yaowei Li, Zixuan Yuan, Bing Zhu, Junwei Liang

    Abstract: Real-time detection and prediction of extreme weather protect human lives and infrastructure. Traditional methods rely on numerical threshold setting and manual interpretation of weather heatmaps with Geographic Information Systems (GIS), which can be slow and error-prone. Our research redefines Extreme Weather Events Detection (EWED) by framing it as a Visual Question Answering (VQA) problem, the… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  3. arXiv:2406.06563  [pdf, other]

    cs.CL cs.AI

    Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models

    Authors: Tianwen Wei, Bo Zhu, Liang Zhao, Cheng Cheng, Biye Li, Weiwei Lü, Peng Cheng, Jianhao Zhang, Xiaoyu Zhang, Liang Zeng, Xiaokun Wang, Yutuan Ma, Rui Hu, Shuicheng Yan, Han Fang, Yahui Zhou

    Abstract: In this technical report, we introduce the training methodologies implemented in the development of Skywork-MoE, a high-performance mixture-of-experts (MoE) large language model (LLM) with 146 billion parameters and 16 experts. It is initialized from the pre-existing dense checkpoints of our Skywork-13B model. We explore the comparative effectiveness of upcycling versus training from scratch initi… ▽ More

    Submitted 2 June, 2024; originally announced June 2024.

  4. arXiv:2406.05898  [pdf, other]

    cs.IR cs.AI cs.LG

    Async Learned User Embeddings for Ads Delivery Optimization

    Authors: Mingwei Tang, Meng Liu, Hong Li, Junjie Yang, Chenglin Wei, Boyang Li, Dai Li, Rengan Xu, Yifan Xu, Zehua Zhang, Xiangyu Wang, Linfeng Liu, Yuelei Xie, Chengye Liu, Labib Fawaz, Li Li, Hongnan Wang, Bill Zhu, Sri Reddy

    Abstract: In recommendation systems, high-quality user embeddings can capture subtle preferences, enable precise similarity calculations, and adapt to changing preferences over time to maintain relevance. The effectiveness of recommendation systems depends on the quality of user embedding. We propose to asynchronously learn high fidelity user embeddings for billions of users each day from sequence based mul… ▽ More

    Submitted 23 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted by workshop on Multimodal Representation and Retrieval at SIGIR 2024, Washington DC

  5. arXiv:2406.00605  [pdf, other]

    cs.CL cs.AI

    LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models

    Authors: Liang Zhao, Tianwen Wei, Liang Zeng, Cheng Cheng, Liu Yang, Peng Cheng, Lijie Wang, Chenxia Li, Xuejie Wu, Bo Zhu, Yimeng Gan, Rui Hu, Shuicheng Yan, Han Fang, Yahui Zhou

    Abstract: We introduce LongSkywork, a long-context Large Language Model (LLM) capable of processing up to 200,000 tokens. We provide a training recipe for efficiently extending context length of LLMs. We identify that the critical element in enhancing long-context processing capability is to incorporate a long-context SFT stage following the standard SFT stage. A mere 200 iterations can convert the standard… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

  6. arXiv:2405.14903  [pdf, other]

    physics.flu-dyn cs.AI cs.GR

    Neural Fluidic System Design and Control with Differentiable Simulation

    Authors: Yifei Li, Yuchen Sun, Pingchuan Ma, Eftychios Sifakis, Tao Du, Bo Zhu, Wojciech Matusik

    Abstract: We present a novel framework to explore neural control and design of complex fluidic systems with dynamic solid boundaries. Our system features a fast differentiable Navier-Stokes solver with solid-fluid interface handling, a low-dimensional differentiable parametric geometry representation, a control-shape co-design algorithm, and gym-like simulation environments to facilitate various fluidic con… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

  7. arXiv:2405.13056  [pdf, other]

    cs.CL cs.SI

    Large language models for sentiment analysis of newspaper articles during COVID-19: The Guardian

    Authors: Rohitash Chandra, Baicheng Zhu, Qingying Fang, Eka Shinjikashvili

    Abstract: During the COVID-19 pandemic, the news media coverage encompassed a wide range of topics that includes viral transmission, allocation of medical resources, and government response measures. There have been studies on sentiment analysis of social media platforms during COVID-19 to understand the public response given the rise of cases and government strategies implemented to control the spread of t… ▽ More

    Submitted 20 May, 2024; originally announced May 2024.

  8. arXiv:2405.09980  [pdf, other]

    cs.CL cs.AI

    FinTextQA: A Dataset for Long-form Financial Question Answering

    Authors: Jian Chen, Peilin Zhou, Yining Hua, Yingxin Loh, Kehui Chen, Ziyuan Li, Bing Zhu, Junwei Liang

    Abstract: Accurate evaluation of financial question answering (QA) systems necessitates a comprehensive dataset encompassing diverse question types and contexts. However, current financial QA datasets lack scope diversity and question complexity. This work introduces FinTextQA, a novel dataset for long-form question answering (LFQA) in finance. FinTextQA comprises 1,262 high-quality, source-attributed QA pa… ▽ More

    Submitted 16 May, 2024; originally announced May 2024.

  9. Lagrangian Covector Fluid with Free Surface

    Authors: Zhiqi Li, Barnabás Börcsök, Duowen Chen, Yutong Sun, Bo Zhu, Greg Turk

    Abstract: This paper introduces a novel Lagrangian fluid solver based on covector flow maps. We aim to address the challenges of establishing a robust flow-map solver for incompressible fluids under complex boundary conditions. Our key idea is to use particle trajectories to establish precise flow maps and tailor path integrals of physical quantities along these trajectories to reformulate the Poisson probl… ▽ More

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: 10 pages, 17 figures, SIGGRAPH Conference Papers '24

  10. Eulerian-Lagrangian Fluid Simulation on Particle Flow Maps

    Authors: Junwei Zhou, Duowen Chen, Molin Deng, Yitong Deng, Yuchen Sun, Sinan Wang, Shiying Xiong, Bo Zhu

    Abstract: We propose a novel Particle Flow Map (PFM) method to enable accurate long-range advection for incompressible fluid simulation. The foundation of our method is the observation that a particle trajectory generated in a forward simulation naturally embodies a perfect flow map. Centered on this concept, we have developed an Eulerian-Lagrangian framework comprising four essential components: Lagrangian… ▽ More

    Submitted 15 May, 2024; originally announced May 2024.

  11. arXiv:2405.05164  [pdf, other]

    cs.CV

    ProbRadarM3F: mmWave Radar based Human Skeletal Pose Estimation with Probability Map Guided Multi-Format Feature Fusion

    Authors: Bing Zhu, Zixin He, Weiyi Xiong, Guanhua Ding, Jianan Liu, Tao Huang, Wei Chen, Wei Xiang

    Abstract: Millimeter wave (mmWave) radar is a non-intrusive privacy and relatively convenient and inexpensive device, which has been demonstrated to be applicable in place of RGB cameras in human indoor pose estimation tasks. However, mmWave radar relies on the collection of reflected signals from the target, and the radar signals containing information is difficult to be fully applied. This has been a long… ▽ More

    Submitted 28 June, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

  12. arXiv:2404.19164  [pdf, ps, other]

    cs.CG

    Optimal Bridge, Twin Bridges and Beyond: Inserting Edges into a Road Network to Minimize the Constrained Diameters

    Authors: Zhidan Feng, Henning Fernau, Binhai Zhu

    Abstract: Given a road network modelled as a planar straight-line graph $G=(V,E)$ with $|V|=n$, let $(u,v)\in V\times V$, the shortest path (distance) between $u,v$ is denoted as $δ_G(u,v)$. Let $δ(G)=\max_{(u,v)}δ_G(u,v)$, for $(u,v)\in V\times V$, which is called the diameter of $G$. Given a disconnected road network modelled as two disjoint trees $T_1$ and $T_2$, this paper first aims at inserting one an… ▽ More

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 18 pages, 5 figures

    MSC Class: 68 ACM Class: F.2.2

  13. arXiv:2404.13680  [pdf, other]

    cs.CV cs.AI

    Zero-shot High-fidelity and Pose-controllable Character Animation

    Authors: Bingwen Zhu, Fanyi Wang, Tianyi Lu, Peng Liu, Jingwen Su, Jinxiu Liu, Yanhao Zhang, Zuxuan Wu, Guo-Jun Qi, Yu-Gang Jiang

    Abstract: Image-to-video (I2V) generation aims to create a video sequence from a single image, which requires high temporal coherence and visual fidelity. However, existing approaches suffer from inconsistency of character appearances and poor preservation of fine details. Moreover, they require a large amount of video data for training, which can be computationally demanding. To address these limitations,… ▽ More

    Submitted 5 June, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

    Comments: 10 pages, 5 figures

  14. arXiv:2404.13671  [pdf, other]

    cs.CV cs.LG

    FiLo: Zero-Shot Anomaly Detection by Fine-Grained Description and High-Quality Localization

    Authors: Zhaopeng Gu, Bingke Zhu, Guibo Zhu, Yingying Chen, Hao Li, Ming Tang, Jinqiao Wang

    Abstract: Zero-shot anomaly detection (ZSAD) methods entail detecting anomalies directly without access to any known normal or abnormal samples within the target item categories. Existing approaches typically rely on the robust generalization capabilities of multimodal pretrained models, computing similarities between manually crafted textual features representing "normal" or "abnormal" semantics and image… ▽ More

    Submitted 21 April, 2024; originally announced April 2024.

  15. arXiv:2404.12777  [pdf, other]

    cs.CV

    EfficientGS: Streamlining Gaussian Splatting for Large-Scale High-Resolution Scene Representation

    Authors: Wenkai Liu, Tao Guan, Bin Zhu, Lili Ju, Zikai Song, Dan Li, Yuesong Wang, Wei Yang

    Abstract: In the domain of 3D scene representation, 3D Gaussian Splatting (3DGS) has emerged as a pivotal technology. However, its application to large-scale, high-resolution scenes (exceeding 4k$\times$4k pixels) is hindered by the excessive computational requirements for managing a large number of Gaussians. Addressing this, we introduce 'EfficientGS', an advanced approach that optimizes 3DGS for high-res… ▽ More

    Submitted 19 April, 2024; originally announced April 2024.

  16. arXiv:2404.10357  [pdf, other]

    cs.CV

    Optimization of Prompt Learning via Multi-Knowledge Representation for Vision-Language Models

    Authors: Enming Zhang, Bingke Zhu, Yingying Chen, Qinghai Miao, Ming Tang, Jinqiao Wang

    Abstract: Vision-Language Models (VLMs), such as CLIP, play a foundational role in various cross-modal applications. To fully leverage VLMs' potential in adapting to downstream tasks, context optimization methods like Prompt Tuning are essential. However, one key limitation is the lack of diversity in prompt templates, whether they are hand-crafted or learned through additional modules. This limitation rest… ▽ More

    Submitted 16 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

  17. arXiv:2404.01240  [pdf, other]

    cs.SE cs.CL cs.CV cs.HC

    AURORA: Navigating UI Tarpits via Automated Neural Screen Understanding

    Authors: Safwat Ali Khan, Wenyu Wang, Yiran Ren, Bin Zhu, Jiangfan Shi, Alyssa McGowan, Wing Lam, Kevin Moran

    Abstract: Nearly a decade of research in software engineering has focused on automating mobile app testing to help engineers in overcoming the unique challenges associated with the software platform. Much of this work has come in the form of Automated Input Generation tools (AIG tools) that dynamically explore app screens. However, such tools have repeatedly been demonstrated to achieve lower-than-expected… ▽ More

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: Published at 17th IEEE International Conference on Software Testing, Verification and Validation (ICST) 2024, 12 pages

  18. arXiv:2403.07403  [pdf, other]

    cs.CV cs.AI

    From Canteen Food to Daily Meals: Generalizing Food Recognition to More Practical Scenarios

    Authors: Guoshan Liu, Yang Jiao, Jingjing Chen, Bin Zhu, Yu-Gang Jiang

    Abstract: The precise recognition of food categories plays a pivotal role for intelligent health management, attracting significant research attention in recent years. Prominent benchmarks, such as Food-101 and VIREO Food-172, provide abundant food image resources that catalyze the prosperity of research in this field. Nevertheless, these datasets are well-curated from canteen scenarios and thus deviate fro… ▽ More

    Submitted 12 March, 2024; originally announced March 2024.

  19. arXiv:2403.07257  [pdf, other]

    cs.AR cs.ET

    The Dawn of AI-Native EDA: Opportunities and Challenges of Large Circuit Models

    Authors: Lei Chen, Yiqi Chen, Zhufei Chu, Wenji Fang, Tsung-Yi Ho, Ru Huang, Yu Huang, Sadaf Khan, Min Li, Xingquan Li, Yu Li, Yun Liang, Jinwei Liu, Yi Liu, Yibo Lin, Guojie Luo, Zhengyuan Shi, Guangyu Sun, Dimitrios Tsaras, Runsheng Wang, Ziyi Wang, Xinming Wei, Zhiyao Xie, Qiang Xu, Chenhao Xue, et al. (14 additional authors not shown)

    Abstract: Within the Electronic Design Automation (EDA) domain, AI-driven solutions have emerged as formidable tools, yet they typically augment rather than redefine existing methodologies. These solutions often repurpose deep learning models from other domains, such as vision, text, and graph analytics, applying them to circuit design without tailoring to the unique complexities of electronic circuits. Suc… ▽ More

    Submitted 1 May, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: The authors are ordered alphabetically. Contact: qxu@cse[dot]cuhk[dot]edu[dot]hk, gluo@pku[dot]edu[dot]cn, yuan.mingxuan@huawei[dot]com

  20. arXiv:2403.07227  [pdf, ps, other]

    cs.DS

    Noisy Computing of the Threshold Function

    Authors: Ziao Wang, Nadim Ghaddar, Banghua Zhu, Lele Wang

    Abstract: Let $\mathsf{TH}_k$ denote the $k$-out-of-$n$ threshold function: given $n$ input Boolean variables, the output is $1$ if and only if at least $k$ of the inputs are $1$. We consider the problem of computing the $\mathsf{TH}_k$ function using noisy readings of the Boolean variables, where each reading is incorrect with some fixed and known probability $p \in (0,1/2)$. As our main result, we show th… ▽ More

    Submitted 11 March, 2024; originally announced March 2024.

  21. arXiv:2403.06423  [pdf, other]

    eess.SP cs.RO

    LiDAR Point Cloud-based Multiple Vehicle Tracking with Probabilistic Measurement-Region Association

    Authors: Guanhua Ding, Jianan Liu, Yuxuan Xia, Tao Huang, Bing Zhu, Jinping Sun

    Abstract: Multiple extended target tracking (ETT) has gained increasing attention due to the development of high-precision LiDAR and radar sensors in automotive applications. For LiDAR point cloud-based vehicle tracking, this paper presents a probabilistic measurement-region association (PMRA) ETT model, which can describe the complex measurement distribution by partitioning the target extent into different… ▽ More

    Submitted 18 May, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: 8 pages, 5 figures, accepted by the 27th International Conference on Information Fusion (FUSION 2024)

  22. arXiv:2403.04134  [pdf, other]

    cs.RO

    An Adaptable, Safe, and Portable Robot-Assisted Feeding System

    Authors: Ethan Kroll Gordon, Rajat Kumar Jenamani, Amal Nanavati, Ziang Liu, Haya Bolotski, Raida Karim, Daniel Stabile, Atharva Kashyap, Bernie Hao Zhu, Xilai Dai, Tyler Schrenk, Jonathan Ko, Taylor Kessler Faulkner, Tapomayukh Bhattacharjee, Siddhartha Srinivasa

    Abstract: We demonstrate a robot-assisted feeding system that enables people with mobility impairments to feed themselves. Our system design embodies Safety, Portability, and User Control, with comprehensive full-stack safety checks, the ability to be mounted on and powered by any powered wheelchair, and a custom web-app allowing care-recipients to leverage their own assistive devices for robot control. For… ▽ More

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: HRI 2024 Demo; Corrected inaccurate author ordering in ACM DL which occurred due to formatting issues

  23. arXiv:2403.04132  [pdf, other]

    cs.AI cs.CL

    Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference

    Authors: Wei-Lin Chiang, Lianmin Zheng, Ying Sheng, Anastasios Nikolas Angelopoulos, Tianle Li, Dacheng Li, Hao Zhang, Banghua Zhu, Michael Jordan, Joseph E. Gonzalez, Ion Stoica

    Abstract: Large Language Models (LLMs) have unlocked new capabilities and applications; however, evaluating the alignment with human preferences still poses significant challenges. To address this issue, we introduce Chatbot Arena, an open platform for evaluating LLMs based on human preferences. Our methodology employs a pairwise comparison approach and leverages input from a diverse user base through crowd… ▽ More

    Submitted 6 March, 2024; originally announced March 2024.

  24. arXiv:2402.18133  [pdf, other]

    cs.LG cs.CV

    Classes Are Not Equal: An Empirical Study on Image Recognition Fairness

    Authors: Jiequan Cui, Beier Zhu, Xin Wen, Xiaojuan Qi, Bei Yu, Hanwang Zhang

    Abstract: In this paper, we present an empirical study on image recognition fairness, i.e., extreme class accuracy disparity on balanced data like ImageNet. We experimentally demonstrate that classes are not equal and the fairness issue is prevalent for image classification models across various datasets, network architectures, and model capacities. Moreover, several intriguing properties of fairness are id… ▽ More

    Submitted 12 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: CVPR 2024

  25. arXiv:2402.17785  [pdf, other]

    cs.SD cs.AI eess.AS

    ByteComposer: a Human-like Melody Composition Method based on Language Model Agent

    Authors: Xia Liang, Xingjian Du, Jiaju Lin, Pei Zou, Yuan Wan, Bilei Zhu

    Abstract: Large Language Models (LLM) have shown encouraging progress in multimodal understanding and generation tasks. However, how to design a human-aligned and interpretable melody composition system is still under-explored. To solve this problem, we propose ByteComposer, an agent framework emulating a human's creative pipeline in four separate steps : "Conception Analysis - Draft Composition - Self-Eval… ▽ More

    Submitted 6 March, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

  26. arXiv:2402.14891  [pdf, other]

    cs.CL cs.AI

    LLMBind: A Unified Modality-Task Integration Framework

    Authors: Bin Zhu, Munan Ning, Peng Jin, Bin Lin, Jinfa Huang, Qi Song, Junwu Zhang, Zhenyu Tang, Mingjun Pan, Xing Zhou, Li Yuan

    Abstract: In the multi-modal domain, the dependence of various models on specific input formats leads to user confusion and hinders progress. To address this challenge, we introduce \textbf{LLMBind}, a novel framework designed to unify a diverse array of multi-modal tasks. By harnessing a Mixture-of-Experts (MoE) Large Language Model (LLM), LLMBind processes multi-modal inputs and generates task-specific to… ▽ More

    Submitted 18 April, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

  27. arXiv:2402.12692  [pdf, other]

    cs.CL

    FormulaReasoning: A Dataset for Formula-Based Numerical Reasoning

    Authors: Xiao Li, Bolin Zhu, Sichen Liu, Yin Zhu, Yiwei Liu, Gong Cheng

    Abstract: The application of formulas is a fundamental ability of humans when addressing numerical reasoning problems. However, existing numerical reasoning datasets seldom explicitly indicate the formulas employed during the reasoning steps. To bridge this gap, we construct a dataset for formula-based numerical reasoning called FormulaReasoning, which consists of 5,420 reasoning-based questions. We employ… ▽ More

    Submitted 12 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

  28. arXiv:2402.12617  [pdf, other]

    cs.CR cs.AI cs.CL cs.CY cs.LG

    Generative AI Security: Challenges and Countermeasures

    Authors: Banghua Zhu, Norman Mu, Jiantao Jiao, David Wagner

    Abstract: Generative AI's expanding footprint across numerous industries has led to both excitement and increased scrutiny. This paper delves into the unique security challenges posed by Generative AI, and outlines potential research directions for managing these risks.

    Submitted 19 February, 2024; originally announced February 2024.

  29. arXiv:2402.07485  [pdf, other]

    cs.SD eess.AS

    MINT: Boosting Audio-Language Model via Multi-Target Pre-Training and Instruction Tuning

    Authors: Hang Zhao, Yifei Xin, Zhesong Yu, Bilei Zhu, Lu Lu, Zejun Ma

    Abstract: In the realm of audio-language pre-training (ALP), the challenge of achieving cross-modal alignment is significant. Moreover, the integration of audio inputs with diverse distributions and task variations poses challenges in developing generic audio-language models. In this study, we present MINT, a novel ALP framework boosting audio-language models through multi-target pre-training and instructio… ▽ More

    Submitted 11 June, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

  30. Reliability quality measures for recommender systems

    Authors: Jesús Bobadilla, Abraham Gutierrez, Fernando Ortega, Bo Zhu

    Abstract: Users want to know the reliability of the recommendations; they do not accept high predictions if there is no reliability evidence. Recommender systems should provide reliability values associated with the predictions. Research into reliability measures requires the existence of simple, plausible and universal reliability quality measures. Research into recommender system quality measures has focu… ▽ More

    Submitted 6 February, 2024; originally announced February 2024.

    Journal ref: Information Sciences 442-443, 145-157 (2018)

  31. arXiv:2402.02335  [pdf]

    cs.CV cs.IR

    Video Editing for Video Retrieval

    Authors: Bin Zhu, Kevin Flanagan, Adriano Fragomeni, Michael Wray, Dima Damen

    Abstract: Though pre-training vision-language models have demonstrated significant benefits in boosting video-text retrieval performance from large-scale web videos, fine-tuning still plays a critical role with manually annotated clips with start and end times, which requires considerable human effort. To address this issue, we explore an alternative cheaper source of annotations, single timestamps, for vid… ▽ More

    Submitted 3 February, 2024; originally announced February 2024.

  32. arXiv:2402.01173  [pdf, other]

    cs.CL cs.LG

    Efficient Prompt Caching via Embedding Similarity

    Authors: Hanlin Zhu, Banghua Zhu, Jiantao Jiao

    Abstract: Large language models (LLMs) have achieved huge success in numerous natural language process (NLP) tasks. However, it faces the challenge of significant resource consumption during inference. In this paper, we aim to improve the inference efficiency of LLMs by prompt caching, i.e., if the current prompt can be answered by the same response of a previous prompt, one can directly utilize that previo… ▽ More

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: 21 pages, 3 figures

  33. CF4J: Collaborative Filtering for Java

    Authors: Fernando Ortega, Bo Zhu, Jesus Bobadilla, Antonio Hernando

    Abstract: Recommender Systems (RS) provide a relevant tool to mitigate the information overload problem. A large number of researchers have published hundreds of papers to improve different RS features. It is advisable to use RS frameworks that simplify RS researchers: a) to design and implement recommendations methods and, b) to speed up the execution time of the experiments. In this paper, we present CF4J… ▽ More

    Submitted 1 February, 2024; originally announced February 2024.

    Journal ref: Knowledge-Based Systems, 152, 94-99 (2018)

  34. arXiv:2401.16335  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF

    Authors: Banghua Zhu, Michael I. Jordan, Jiantao Jiao

    Abstract: Reinforcement Learning from Human Feedback (RLHF) is a pivotal technique that aligns language models closely with human-centric values. The initial phase of RLHF involves learning human values using a reward model from ranking data. It is observed that the performance of the reward model degrades after one epoch of training, and optimizing too much against the learned reward model eventually hinde… ▽ More

    Submitted 29 January, 2024; originally announced January 2024.

  35. arXiv:2401.15947  [pdf, other]

    cs.CV

    MoE-LLaVA: Mixture of Experts for Large Vision-Language Models

    Authors: Bin Lin, Zhenyu Tang, Yang Ye, Jiaxi Cui, Bin Zhu, Peng Jin, Jinfa Huang, Junwu Zhang, Munan Ning, Li Yuan

    Abstract: Recent advances demonstrate that scaling Large Vision-Language Models (LVLMs) effectively improves downstream task performances. However, existing scaling methods enable all model parameters to be active for each token in the calculation, which brings massive training and inferring costs. In this work, we propose a simple yet effective training strategy MoE-Tuning for LVLMs. This strategy innovati… ▽ More

    Submitted 16 February, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: update table 5

  36. arXiv:2401.11704  [pdf, other]

    cs.CV

    EK-Net:Real-time Scene Text Detection with Expand Kernel Distance

    Authors: Boyuan Zhu, Fagui Liu, Xi Chen, Quan Tang

    Abstract: Recently, scene text detection has received significant attention due to its wide application. However, accurate detection in complex scenes of multiple scales, orientations, and curvature remains a challenge. Numerous detection methods adopt the Vatti clipping (VC) algorithm for multiple-instance training to address the issue of arbitrary-shaped text. Yet we identify several bias results from the… ▽ More

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: 2024 IEEE International Conference on Acoustics, Speech and Signal Processing

  37. UAV Trajectory Planning for AoI-Minimal Data Collection in UAV-Aided IoT Networks by Transformer

    Authors: Botao Zhu, Ebrahim Bedeer, Ha H. Nguyen, Robert Barton, Zhen Gao

    Abstract: Maintaining freshness of data collection in Internet-of-Things (IoT) networks has attracted increasing attention. By taking into account age-of-information (AoI), we investigate the trajectory planning problem of an unmanned aerial vehicle (UAV) that is used to aid a cluster-based IoT network. An optimization problem is formulated to minimize the total AoI of the collected data by the UAV from the… ▽ More

    Submitted 8 November, 2023; originally announced January 2024.

    Journal ref: IEEE TWC, 2023

  38. arXiv:2401.01656  [pdf, other]

    cs.GT cs.AI

    Deep Automated Mechanism Design for Integrating Ad Auction and Allocation in Feed

    Authors: Xuejian Li, Ze Wang, Bingqi Zhu, Fei He, Yongkang Wang, Xingxing Wang

    Abstract: E-commerce platforms usually present an ordered list, mixed with several organic items and an advertisement, in response to each user's page view request. This list, the outcome of ad auction and allocation processes, directly impacts the platform's ad revenue and gross merchandise volume (GMV). Specifically, the ad auction determines which ad is displayed and the corresponding payment, while the… ▽ More

    Submitted 11 April, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

    Comments: 9 pages, 2 figures, Posting

  39. arXiv:2401.00588  [pdf, other]

    cs.AI cs.LG cs.PF

    Fairness in Serving Large Language Models

    Authors: Ying Sheng, Shiyi Cao, Dacheng Li, Banghua Zhu, Zhuohan Li, Danyang Zhuo, Joseph E. Gonzalez, Ion Stoica

    Abstract: High-demand LLM inference services (e.g., ChatGPT and BARD) support a wide range of requests from short chat conversations to long document reading. To ensure that all client requests are processed fairly, most major LLM inference services have request rate limits, to ensure that no client can dominate the request queue. However, this rudimentary notion of fairness also results in under-utilizatio… ▽ More

    Submitted 5 June, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

  40. arXiv:2312.14991  [pdf, other]

    cs.CV

    FoodLMM: A Versatile Food Assistant using Large Multi-modal Model

    Authors: Yuehao Yin, Huiyan Qi, Bin Zhu, Jingjing Chen, Yu-Gang Jiang, Chong-Wah Ngo

    Abstract: Large Multi-modal Models (LMMs) have made impressive progress in many vision-language tasks. Nevertheless, the performance of general LMMs in specific domains is still far from satisfactory. This paper proposes FoodLMM, a versatile food assistant based on LMMs with various capabilities, including food recognition, ingredient recognition, recipe generation, nutrition estimation, food segmentation a… ▽ More

    Submitted 12 April, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

  41. arXiv:2312.14635  [pdf, other]

    cs.GR cs.AI cs.CV cs.LG physics.flu-dyn

    Fluid Simulation on Neural Flow Maps

    Authors: Yitong Deng, Hong-Xing Yu, Diyang Zhang, Jiajun Wu, Bo Zhu

    Abstract: We introduce Neural Flow Maps, a novel simulation method bridging the emerging paradigm of implicit neural representations with fluid simulation based on the theory of flow maps, to achieve state-of-the-art simulation of inviscid fluid phenomena. We devise a novel hybrid neural field representation, Spatially Sparse Neural Fields (SSNF), which fuses small neural networks with a pyramid of overlapp… ▽ More

    Submitted 22 December, 2023; originally announced December 2023.

    Journal ref: ACM Trans. Graph. 42, 6, Article 248 (December 2023), 21 pages

  42. arXiv:2312.14197  [pdf, other]

    cs.CL cs.AI

    Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models

    Authors: Jingwei Yi, Yueqi Xie, Bin Zhu, Emre Kiciman, Guangzhong Sun, Xing Xie, Fangzhao Wu

    Abstract: The integration of large language models (LLMs) with external content has enabled more up-to-date and wide-ranging applications of LLMs, such as Microsoft Copilot. However, this integration has also exposed LLMs to the risk of indirect prompt injection attacks, where an attacker can embed malicious instructions within external content, compromising LLM output and causing responses to deviate from… ▽ More

    Submitted 8 March, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

  43. arXiv:2312.08369  [pdf, other]

    stat.ML cs.AI cs.LG

    The Effective Horizon Explains Deep RL Performance in Stochastic Environments

    Authors: Cassidy Laidlaw, Banghua Zhu, Stuart Russell, Anca Dragan

    Abstract: Reinforcement learning (RL) theory has largely focused on proving minimax sample complexity bounds. These require strategic exploration algorithms that use relatively limited function classes for representing the policy or value function. Our goal is to explain why deep RL algorithms often perform well in practice, despite using random exploration and much more expressive function classes like neu…

    Submitted 12 April, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

    Journal ref: ICLR 2024 (Spotlight)

  44. arXiv:2312.07930  [pdf, other]

    cs.LG cs.CL cs.CR cs.IT stat.ML

    Towards Optimal Statistical Watermarking

    Authors: Baihe Huang, Hanlin Zhu, Banghua Zhu, Kannan Ramchandran, Michael I. Jordan, Jason D. Lee, Jiantao Jiao

    Abstract: We study statistical watermarking by formulating it as a hypothesis testing problem, a general framework which subsumes all previous statistical watermarking methods. Key to our formulation is a coupling of the output tokens and the rejection region, realized by pseudo-random generators in practice, that allows non-trivial trade-offs between the Type I error and Type II error. We characterize the…

    Submitted 6 February, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

  45. arXiv:2312.06561  [pdf, other]

    cs.CV cs.GR

    Inferring Hybrid Neural Fluid Fields from Videos

    Authors: Hong-Xing Yu, Yang Zheng, Yuan Gao, Yitong Deng, Bo Zhu, Jiajun Wu

    Abstract: We study recovering fluid density and velocity from sparse multiview videos. Existing neural dynamic reconstruction methods predominantly rely on optical flows; therefore, they cannot accurately estimate the density and uncover the underlying velocity due to the inherent visual ambiguities of fluid velocity, as fluids are often shapeless and lack stable visual features. The challenge is further pr…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: NeurIPS 2023. Project website: https://kovenyu.com/HyFluid/ The first two authors contribute equally

  46. arXiv:2312.04763  [pdf, other]

    cs.IR

    CAR: Consolidation, Augmentation and Regulation for Recipe Retrieval

    Authors: Fangzhou Song, Bin Zhu, Yanbin Hao, Shuo Wang, Xiangnan He

    Abstract: Learning recipe and food image representation in common embedding space is non-trivial but crucial for cross-modal recipe retrieval. In this paper, we propose CAR framework with three novel techniques, i.e., Consolidation, Augmentation and Regulation, for cross-modal recipe retrieval. We introduce adapter layers to consolidate pre-trained CLIP model with much less computation cost than fully cumbe…

    Submitted 7 December, 2023; originally announced December 2023.

  47. arXiv:2311.18164  [pdf, other]

    q-fin.GN cs.CE

    The Paradox Of Just-in-Time Liquidity in Decentralized Exchanges: More Providers Can Sometimes Mean Less Liquidity

    Authors: Agostino Capponi, Ruizhe Jia, Brian Zhu

    Abstract: We study Just-in-time (JIT) liquidity provision in blockchain-based decentralized exchanges. A JIT liquidity provider (LP) monitors pending swap orders in public mempools of blockchains to sandwich orders of their choice with liquidity, depositing right before and withdrawing right after the order. Our game-theoretic model with asymmetrically informed agents reveals that a JIT LP's presence does n…

    Submitted 15 February, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

  48. arXiv:2311.16103  [pdf, other]

    cs.CV cs.AI

    Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models

    Authors: Munan Ning, Bin Zhu, Yujia Xie, Bin Lin, Jiaxi Cui, Lu Yuan, Dongdong Chen, Li Yuan

    Abstract: Video-based large language models (Video-LLMs) have been recently introduced, targeting both fundamental improvements in perception and comprehension, and a diverse range of user inquiries. In pursuit of the ultimate goal of achieving artificial general intelligence, a truly intelligent Video-LLM model should not only see and understand the surroundings, but also possess human-level commonsense, a…

    Submitted 28 November, 2023; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: Benchmark is available at https://github.com/PKU-YuanGroup/Video-Bench

  49. arXiv:2311.10122  [pdf, other]

    cs.CV

    Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

    Authors: Bin Lin, Yang Ye, Bin Zhu, Jiaxi Cui, Munan Ning, Peng Jin, Li Yuan

    Abstract: The Large Vision-Language Model (LVLM) has enhanced the performance of various downstream tasks in visual-language understanding. Most existing approaches encode images and videos into separate feature spaces, which are then fed as inputs to large language models. However, due to the lack of unified tokenization for images and videos, namely misalignment before projection, it becomes challenging f…

    Submitted 21 November, 2023; v1 submitted 16 November, 2023; originally announced November 2023.

  50. arXiv:2311.03285  [pdf, other]

    cs.LG cs.AI cs.DC

    S-LoRA: Serving Thousands of Concurrent LoRA Adapters

    Authors: Ying Sheng, Shiyi Cao, Dacheng Li, Coleman Hooper, Nicholas Lee, Shuo Yang, Christopher Chou, Banghua Zhu, Lianmin Zheng, Kurt Keutzer, Joseph E. Gonzalez, Ion Stoica

    Abstract: The "pretrain-then-finetune" paradigm is commonly adopted in the deployment of large language models. Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method, is often employed to adapt a base model to a multitude of tasks, resulting in a substantial collection of LoRA adapters derived from one base model. We observe that this paradigm presents significant opportunities for batched in…

    Submitted 5 June, 2024; v1 submitted 6 November, 2023; originally announced November 2023.