Showing 1–50 of 262 results for author: Tang, M

Searching in archive cs.

  1. arXiv:2407.01163  [pdf, other]

    cs.LG cs.CV

    Benchmarking Predictive Coding Networks -- Made Simple

    Authors: Luca Pinchetti, Chang Qi, Oleh Lokshyn, Gaspard Olivers, Cornelius Emde, Mufeng Tang, Amine M'Charrak, Simon Frieder, Bayar Menzat, Rafal Bogacz, Thomas Lukasiewicz, Tommaso Salvatori

    Abstract: In this work, we tackle the problems of efficiency and scalability for predictive coding networks in machine learning. To do so, we first propose a library called PCX, whose focus lies on performance and simplicity, and provides a user-friendly, deep-learning oriented interface. Second, we use PCX to implement a large set of benchmarks for the community to use for their experiments. As most works…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 33 pages, 25 figures

    ACM Class: I.2.6

  2. arXiv:2407.01103  [pdf, other]

    cs.RO

    FedRC: A Rapid-Converged Hierarchical Federated Learning Framework in Street Scene Semantic Understanding

    Authors: Wei-Bin Kou, Qingfeng Lin, Ming Tang, Shuai Wang, Guangxu Zhu, Yik-Chung Wu

    Abstract: Street Scene Semantic Understanding (denoted as TriSU) is a crucial but complex task for world-wide distributed autonomous driving (AD) vehicles (e.g., Tesla). Its inference model faces poor generalization issue due to inter-city domain-shift. Hierarchical Federated Learning (HFL) offers a potential solution for improving TriSU model generalization, but suffers from slow convergence rate because o…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: This work has been accepted by 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)

  3. arXiv:2406.18009  [pdf, other]

    eess.AS cs.SD

    E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS

    Authors: Sefik Emre Eskimez, Xiaofei Wang, Manthan Thakker, Canrun Li, Chung-Hsien Tsai, Zhen Xiao, Hemin Yang, Zirun Zhu, Min Tang, Xu Tan, Yanqing Liu, Sheng Zhao, Naoyuki Kanda

    Abstract: This paper introduces Embarrassingly Easy Text-to-Speech (E2 TTS), a fully non-autoregressive zero-shot text-to-speech system that offers human-level naturalness and state-of-the-art speaker similarity and intelligibility. In the E2 TTS framework, the text input is converted into a character sequence with filler tokens. The flow-matching-based mel spectrogram generator is then trained based on the…

    Submitted 25 June, 2024; originally announced June 2024.

  4. arXiv:2406.16317  [pdf]

    cs.SD eess.AS

    SNR-Progressive Model with Harmonic Compensation for Low-SNR Speech Enhancement

    Authors: Zhongshu Hou, Qinwen Hu, Zhanzhong Cao, Ming Tang, Jing Lu

    Abstract: Despite significant progress made in the last decade, deep neural network (DNN) based speech enhancement (SE) still faces the challenge of notable degradation in the quality of recovered speech under low signal-to-noise ratio (SNR) conditions. In this letter, we propose an SNR-progressive speech enhancement model with harmonic compensation for low-SNR SE. Reliable pitch estimation is obtained from…

    Submitted 24 June, 2024; originally announced June 2024.

  5. arXiv:2406.15885  [pdf, other]

    cs.SD cs.AI eess.AS

    The Music Maestro or The Musically Challenged, A Massive Music Evaluation Benchmark for Large Language Models

    Authors: Jiajia Li, Lu Yang, Mingni Tang, Cong Chen, Zuchao Li, Ping Wang, Hai Zhao

    Abstract: Benchmark plays a pivotal role in assessing the advancements of large language models (LLMs). While numerous benchmarks have been proposed to evaluate LLMs' capabilities, there is a notable absence of a dedicated benchmark for assessing their musical abilities. To address this gap, we present ZIQI-Eval, a comprehensive and large-scale music benchmark specifically designed to evaluate the music-rel…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL-Findings 2024

  6. arXiv:2406.14056  [pdf, other]

    cs.CV

    VGA: Vision GUI Assistant -- Minimizing Hallucinations through Image-Centric Fine-Tuning

    Authors: Ziyang Meng, Yu Dai, Zezheng Gong, Shaoxiong Guo, Minglong Tang, Tongquan Wei

    Abstract: Recent advances in Large Vision-Language Models (LVLMs) have significantly improved performance in image comprehension tasks, such as formatted charts and rich-content images. Yet, Graphical User Interfaces (GUIs) pose a greater challenge due to their structured format and detailed textual information. Existing LVLMs often overly depend on internal knowledge and neglect image content, resulting in ha…

    Submitted 21 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: 18 pages

    MSC Class: 68-04

    ACM Class: I.2.7; I.2.10

  7. arXiv:2406.05898  [pdf, other]

    cs.IR cs.AI cs.LG

    Async Learned User Embeddings for Ads Delivery Optimization

    Authors: Mingwei Tang, Meng Liu, Hong Li, Junjie Yang, Chenglin Wei, Boyang Li, Dai Li, Rengan Xu, Yifan Xu, Zehua Zhang, Xiangyu Wang, Linfeng Liu, Yuelei Xie, Chengye Liu, Labib Fawaz, Li Li, Hongnan Wang, Bill Zhu, Sri Reddy

    Abstract: In recommendation systems, high-quality user embeddings can capture subtle preferences, enable precise similarity calculations, and adapt to changing preferences over time to maintain relevance. The effectiveness of recommendation systems depends on the quality of user embedding. We propose to asynchronously learn high fidelity user embeddings for billions of users each day from sequence based mul…

    Submitted 23 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted by workshop on Multimodal Representation and Retrieval at SIGIR 2024, Washington DC

  8. arXiv:2406.05699  [pdf, ps, other]

    eess.AS cs.AI eess.SP

    An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS

    Authors: Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Hemin Yang, Zirun Zhu, Min Tang, Yufei Xia, Jinzhu Li, Sheng Zhao, Jinyu Li, Naoyuki Kanda

    Abstract: Recently, zero-shot text-to-speech (TTS) systems, capable of synthesizing any speaker's voice from a short audio prompt, have made rapid advancements. However, the quality of the generated speech significantly deteriorates when the audio prompt contains noise, and limited research has been conducted to address this issue. In this paper, we explored various strategies to enhance the quality of audi…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted to INTERSPEECH2024

  9. arXiv:2406.03744  [pdf, other]

    cs.CV cs.LG

    ReDistill: Residual Encoded Distillation for Peak Memory Reduction

    Authors: Fang Chen, Gourav Datta, Mujahid Al Rafi, Hyeran Jeon, Meng Tang

    Abstract: The expansion of neural network sizes and the enhancement of image resolution through modern camera sensors result in heightened memory and power demands for neural networks. Reducing peak memory, which is the maximum memory consumed during the execution of a neural network, is critical to deploy neural networks on edge devices with limited memory budget. A naive approach to reducing peak memory i…

    Submitted 6 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  10. arXiv:2405.06697  [pdf, other]

    cs.CL cs.AI

    Automated Conversion of Static to Dynamic Scheduler via Natural Language

    Authors: Paul Mingzheng Tang, Kenji Kah Hoe Leong, Nowshad Shaik, Hoong Chuin Lau

    Abstract: In this paper, we explore the potential application of Large Language Models (LLMs) that will automatically model constraints and generate code for dynamic scheduling problems given an existing static model. Static scheduling problems are modelled and coded by optimization experts. These models may be easily obsoleted as the underlying constraints may need to be fine-tuned in order to reflect chan…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: 7 pages (excluding appendix), 10 figures, 3 tables

  11. arXiv:2405.04434  [pdf, other]

    cs.CL cs.AI

    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding, et al. (132 additional authors not shown)

    Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference…

    Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  12. arXiv:2405.04146  [pdf, other]

    cs.RO cs.DC

    pFedLVM: A Large Vision Model (LVM)-Driven and Latent Feature-Based Personalized Federated Learning Framework in Autonomous Driving

    Authors: Wei-Bin Kou, Qingfeng Lin, Ming Tang, Sheng Xu, Rongguang Ye, Yang Leng, Shuai Wang, Guofa Li, Zhenyu Chen, Guangxu Zhu, Yik-Chung Wu

    Abstract: Deep learning-based Autonomous Driving (AD) models often exhibit poor generalization due to data heterogeneity in an ever domain-shifting environment. While Federated Learning (FL) could improve the generalization of an AD model (known as FedAD system), conventional models often struggle with under-fitting as the amount of accumulated training data progressively increases. To address this issue, i…

    Submitted 17 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: This paper was submitted to CVPR 2024 in Nov. 2023

  13. arXiv:2404.19518  [pdf, other]

    cs.MA cs.AI cs.RO

    MGCBS: An Optimal and Efficient Algorithm for Solving Multi-Goal Multi-Agent Path Finding Problem

    Authors: Mingkai Tang, Yuanhang Li, Hongji Liu, Yingbing Chen, Ming Liu, Lujia Wang

    Abstract: With the expansion of the scale of robotics applications, the multi-goal multi-agent pathfinding (MG-MAPF) problem began to gain widespread attention. This problem requires each agent to visit pre-assigned multiple goal points at least once without conflict. Some previous methods have been proposed to solve the MG-MAPF problem based on Decoupling the goal Vertex visiting order search and the Singl…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: to be published in IJCAI2024

  14. arXiv:2404.19220  [pdf, other]

    stat.ML cs.LG

    Regression for matrix-valued data via Kronecker products factorization

    Authors: Yin-Jen Chen, Minh Tang

    Abstract: We study the matrix-variate regression problem $Y_i = \sum_{k} \beta_{1k} X_i \beta_{2k}^{\top} + E_i$ for $i=1,2,\dots,n$ in the high dimensional regime wherein the response $Y_i$ are matrices whose dimensions $p_{1}\times p_{2}$ outgrow both the sample size $n$ and the dimensions $q_{1}\times q_{2}$ of the predictor variables $X_i$, i.e., $q_{1},q_{2} \ll n \ll p_{1},p_{2}$. We propose an estimation algor…

    Submitted 29 April, 2024; originally announced April 2024.

  15. arXiv:2404.15159  [pdf, other]

    cs.CL cs.AI

    MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts

    Authors: Dengchun Li, Yingzi Ma, Naizheng Wang, Zhengmao Ye, Zhiyuan Cheng, Yinghao Tang, Yan Zhang, Lei Duan, Jie Zuo, Cal Yang, Mingjie Tang

    Abstract: Fine-tuning Large Language Models (LLMs) is a common practice to adapt pre-trained models for specific applications. While methods like LoRA have effectively addressed GPU memory constraints during fine-tuning, their performance often falls short, especially in multi-task scenarios. In contrast, Mixture-of-Expert (MoE) models, such as Mixtral 8x7B, demonstrate remarkable performance in multi-task…

    Submitted 23 May, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

    Comments: 18 pages, 5 figures

  16. arXiv:2404.13671  [pdf, other]

    cs.CV cs.LG

    FiLo: Zero-Shot Anomaly Detection by Fine-Grained Description and High-Quality Localization

    Authors: Zhaopeng Gu, Bingke Zhu, Guibo Zhu, Yingying Chen, Hao Li, Ming Tang, Jinqiao Wang

    Abstract: Zero-shot anomaly detection (ZSAD) methods entail detecting anomalies directly without access to any known normal or abnormal samples within the target item categories. Existing approaches typically rely on the robust generalization capabilities of multimodal pretrained models, computing similarities between manually crafted textual features representing "normal" or "abnormal" semantics and image…

    Submitted 21 April, 2024; originally announced April 2024.

  17. arXiv:2404.10952  [pdf, other]

    cs.CL cs.AI cs.PL

    Can Language Models Solve Olympiad Programming?

    Authors: Quan Shi, Michael Tang, Karthik Narasimhan, Shunyu Yao

    Abstract: Computing olympiads contain some of the most challenging problems for humans, requiring complex algorithmic reasoning, puzzle solving, in addition to generating efficient code. However, it has been understudied as a domain to evaluate language models (LMs). In this paper, we introduce the USACO benchmark with 307 problems from the USA Computing Olympiad, along with high-quality unit tests, referen…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Code and data: https://princeton-nlp.github.io/USACOBench/

  18. arXiv:2404.10357  [pdf, other]

    cs.CV

    Optimization of Prompt Learning via Multi-Knowledge Representation for Vision-Language Models

    Authors: Enming Zhang, Bingke Zhu, Yingying Chen, Qinghai Miao, Ming Tang, Jinqiao Wang

    Abstract: Vision-Language Models (VLMs), such as CLIP, play a foundational role in various cross-modal applications. To fully leverage VLMs' potential in adapting to downstream tasks, context optimization methods like Prompt Tuning are essential. However, one key limitation is the lack of diversity in prompt templates, whether they are hand-crafted or learned through additional modules. This limitation rest…

    Submitted 16 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

  19. arXiv:2404.08973  [pdf, other]

    cs.LG cs.CY cs.DC

    PraFFL: A Preference-Aware Scheme in Fair Federated Learning

    Authors: Rongguang Ye, Ming Tang

    Abstract: Fairness in federated learning has emerged as a critical concern, aiming to develop an unbiased model for any special group (e.g., male or female) of sensitive features. However, there is a trade-off between model performance and fairness, i.e., improving fairness will decrease model performance. Existing approaches have characterized such a trade-off by introducing hyperparameters to quantify cli…

    Submitted 13 April, 2024; originally announced April 2024.

    Comments: 10 pages, 10 figures, and 1 table. This paper has been submitted to MobiHoc'24

  20. arXiv:2404.05039  [pdf, other]

    cs.RO

    StaccaToe: A Single-Leg Robot that Mimics the Human Leg and Toe

    Authors: Nisal Perera, Shangqun Yu, Daniel Marew, Mack Tang, Ken Suzuki, Aidan McCormack, Shifan Zhu, Yong-Jae Kim, Donghyun Kim

    Abstract: We introduce StaccaToe, a human-scale, electric motor-powered single-leg robot designed to rival the agility of human locomotion through two distinctive attributes: an actuated toe and a co-actuation configuration inspired by the human leg. Leveraging the foundational design of HyperLeg's lower leg mechanism, we develop a stand-alone robot by incorporating new link designs, custom-designed power e…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: Submitted to 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)

  21. arXiv:2404.01712  [pdf, other]

    cs.LG cs.AI

    Efficient and Generalizable Certified Unlearning: A Hessian-free Recollection Approach

    Authors: Xinbao Qiao, Meng Zhang, Ming Tang, Ermin Wei

    Abstract: Machine unlearning strives to uphold the data owners' right to be forgotten by enabling models to selectively forget specific data. Recent advances suggest precomputing and storing statistics extracted from second-order information and implementing unlearning through Newton-style updates. However, the theoretical analysis of these works often depends on restrictive assumptions of convexity and smo…

    Submitted 3 June, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 31 pages, 10 figures

  22. arXiv:2404.00924  [pdf, other]

    cs.CV

    BadPart: Unified Black-box Adversarial Patch Attacks against Pixel-wise Regression Tasks

    Authors: Zhiyuan Cheng, Zhaoyi Liu, Tengda Guo, Shiwei Feng, Dongfang Liu, Mingjie Tang, Xiangyu Zhang

    Abstract: Pixel-wise regression tasks (e.g., monocular depth estimation (MDE) and optical flow estimation (OFE)) have been widely involved in our daily life in applications like autonomous driving, augmented reality and video composition. Although certain applications are security-critical or bear societal significance, the adversarial robustness of such models is not sufficiently studied, especially in th…

    Submitted 24 May, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: Paper accepted at ICML 2024

  23. arXiv:2403.19996  [pdf, other]

    cs.LG eess.SP

    DeepHeteroIoT: Deep Local and Global Learning over Heterogeneous IoT Sensor Data

    Authors: Muhammad Sakib Khan Inan, Kewen Liao, Haifeng Shen, Prem Prakash Jayaraman, Dimitrios Georgakopoulos, Ming Jian Tang

    Abstract: Internet of Things (IoT) sensor data or readings evince variations in timestamp range, sampling frequency, geographical location, unit of measurement, etc. Such presented sequence data heterogeneity makes it difficult for traditional time series classification algorithms to perform well. Therefore, addressing the heterogeneity challenge demands learning not only the sub-patterns (local features) b…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Accepted for Publication and Presented in EAI MobiQuitous 2023 - 20th EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services

  24. arXiv:2403.14171  [pdf, other]

    cs.CL

    MMIDR: Teaching Large Language Model to Interpret Multimodal Misinformation via Knowledge Distillation

    Authors: Longzheng Wang, Xiaohan Xu, Lei Zhang, Jiarui Lu, Yongxiu Xu, Hongbo Xu, Minghao Tang, Chuang Zhang

    Abstract: Automatic detection of multimodal misinformation has gained widespread attention recently. However, the potential of powerful Large Language Models (LLMs) for multimodal misinformation detection remains underexplored. Besides, how to teach LLMs to interpret multimodal misinformation in a cost-effective and accessible way is still an open question. To address that, we propose MMIDR, a framework des…

    Submitted 8 April, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: 10 pages, 3 figures

  25. arXiv:2403.10056  [pdf, other]

    cs.CL cs.AI

    Don't Half-listen: Capturing Key-part Information in Continual Instruction Tuning

    Authors: Yongquan He, Xuancheng Huang, Minghao Tang, Lingxun Meng, Xiang Li, Wei Lin, Wenyuan Zhang, Yifu Gao

    Abstract: Instruction tuning for large language models (LLMs) can drive them to produce results consistent with human goals in specific downstream tasks. However, the process of continual instruction tuning (CIT) for LLMs may bring about the catastrophic forgetting (CF) problem, where previously learned abilities are degraded. Recent methods try to alleviate the CF problem by modifying models or replaying d…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: 18 pages, 4 figures

  26. arXiv:2403.09333  [pdf, other]

    cs.CV cs.AI

    Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring

    Authors: Yufei Zhan, Yousong Zhu, Hongyin Zhao, Fan Yang, Ming Tang, Jinqiao Wang

    Abstract: Large Vision Language Models have achieved fine-grained object perception, but the limitation of image resolution remains a significant obstacle to surpass the performance of task-specific experts in complex and dense scenarios. Such limitation further restricts the model's potential to achieve nuanced visual and language referring in domains such as GUI Agents, Counting, etc. To address this…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: Tech report working in progress. Codes, models and datasets will be released at https://github.com/jefferyZhan/Griffon

  27. arXiv:2403.07608  [pdf, other]

    cs.DB cs.AI cs.LG

    Couler: Unified Machine Learning Workflow Optimization in Cloud

    Authors: Xiaoda Wang, Yuan Tang, Tengda Guo, Bo Sang, Jingji Wu, Jian Sha, Ke Zhang, Jiang Qian, Mingjie Tang

    Abstract: Machine Learning (ML) has become ubiquitous, fueling data-driven applications across various organizations. Contrary to the traditional perception of ML in research, ML workflows can be complex, resource-intensive, and time-consuming. Expanding an ML workflow to encompass a wider range of data infrastructure and data types may lead to larger workloads and increased deployment costs. Currently, num…

    Submitted 12 March, 2024; originally announced March 2024.

  28. arXiv:2403.02528  [pdf, other]

    cs.CL cs.AI

    DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation

    Authors: Xueqing Wu, Rui Zheng, Jingzhen Sha, Te-Lin Wu, Hanyu Zhou, Mohan Tang, Kai-Wei Chang, Nanyun Peng, Haoran Huang

    Abstract: Data analysis is a crucial analytical process to generate in-depth studies and conclusive insights to comprehensively answer a given user query for tabular data. In this work, we aim to propose new resources and benchmarks to inspire future research on this crucial yet challenging and under-explored task. However, collecting data analysis annotations curated by experts can be prohibitively expensi…

    Submitted 4 March, 2024; originally announced March 2024.

  29. arXiv:2402.18811  [pdf, other]

    cs.CV

    BFRFormer: Transformer-based generator for Real-World Blind Face Restoration

    Authors: Guojing Ge, Qi Song, Guibo Zhu, Yuting Zhang, Jinglu Chen, Miao Xin, Ming Tang, Jinqiao Wang

    Abstract: Blind face restoration is a challenging task due to the unknown and complex degradation. Although face prior-based methods and reference-based methods have recently demonstrated high-quality results, the restored images tend to contain over-smoothed results and lose identity-preserved details when the degradation is severe. It is observed that this is attributed to short-range dependencies, the in…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: Accepted by ICASSP 2024

  30. arXiv:2402.18116  [pdf, other]

    cs.GR cs.CV

    Block and Detail: Scaffolding Sketch-to-Image Generation

    Authors: Vishnu Sarukkai, Lu Yuan, Mia Tang, Maneesh Agrawala, Kayvon Fatahalian

    Abstract: We introduce a novel sketch-to-image tool that aligns with the iterative refinement process of artists. Our tool lets users sketch blocking strokes to coarsely represent the placement and form of objects and detail strokes to refine their shape and silhouettes. We develop a two-pass algorithm for generating high-fidelity images from such sketches at any point in the iterative process. In the first…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: 12 pages, 13 figures

  31. arXiv:2402.15166  [pdf, other]

    cs.DC cs.LG

    Convergence Analysis of Split Federated Learning on Heterogeneous Data

    Authors: Pengchao Han, Chao Huang, Geng Tian, Ming Tang, Xin Liu

    Abstract: Split federated learning (SFL) is a recent distributed approach for collaborative model training among multiple clients. In SFL, a global model is typically split into two parts, where clients train one part in a parallel federated manner, and a main server trains the other. Despite the recent research on SFL algorithm development, the convergence analysis of SFL is missing in the literature, and…

    Submitted 23 February, 2024; originally announced February 2024.

  32. arXiv:2402.10821  [pdf, other]

    cs.CV

    Training Class-Imbalanced Diffusion Model Via Overlap Optimization

    Authors: Divin Yan, Lu Qi, Vincent Tao Hu, Ming-Hsuan Yang, Meng Tang

    Abstract: Diffusion models have made significant advances recently in high-quality image synthesis and related tasks. However, diffusion models trained on real-world datasets, which often follow long-tailed distributions, yield inferior fidelity for tail classes. Deep generative models, including diffusion models, are biased towards classes with abundant training images. To address the observed appearance o…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: Technique Report

  33. arXiv:2402.07383  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like

    Authors: Naoyuki Kanda, Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Hemin Yang, Zirun Zhu, Min Tang, Canrun Li, Chung-Hsien Tsai, Zhen Xiao, Yufei Xia, Jinzhu Li, Yanqing Liu, Sheng Zhao, Michael Zeng

    Abstract: Laughter is one of the most expressive and natural aspects of human speech, conveying emotions, social cues, and humor. However, most text-to-speech (TTS) systems lack the ability to produce realistic and appropriate laughter sounds, limiting their applications and user experience. While there have been prior works to generate natural laughter, they fell short in terms of controlling the timing an…

    Submitted 4 March, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

    Comments: See https://aka.ms/elate/ for demo samples, v2: subjective evaluation has been added

  34. arXiv:2401.15855  [pdf, other]

    cs.CV

    Cross-Scale MAE: A Tale of Multi-Scale Exploitation in Remote Sensing

    Authors: Maofeng Tang, Andrei Cozma, Konstantinos Georgiou, Hairong Qi

    Abstract: Remote sensing images present unique challenges to image analysis due to the extensive geographic coverage, hardware limitations, and misaligned multi-scale images. This paper revisits the classical multi-scale representation learning problem but under the general framework of self-supervised learning for remote sensing image understanding. We present Cross-Scale MAE, a self-supervised model built…

    Submitted 28 January, 2024; originally announced January 2024.

  35. arXiv:2401.08939  [pdf, other]

    cs.RO

    Enhancing Campus Mobility: Achievements and Challenges of Autonomous Shuttle "Snow Lion"

    Authors: Yingbing Chen, Jie Cheng, Sheng Wang, Hongji Liu, Xiaodong Mei, Xiaoyang Yan, Mingkai Tang, Ge Sun, Ya Wen, Junwei Cai, Xupeng Xie, Lu Gan, Mandan Chao, Ren Xin, Ming Liu, Jianhao Jiao, Kangcheng Liu, Lujia Wang

    Abstract: The rapid evolution of autonomous vehicles (AVs) has significantly influenced global transportation systems. In this context, we present "Snow Lion", an autonomous shuttle meticulously designed to revolutionize on-campus transportation, offering a safer and more efficient mobility solution for students, faculty, and visitors. The primary objective of this research is to enhance campus mobility b…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: 9 pages, 9 figures

  36. arXiv:2401.08887  [pdf, ps, other]

    cs.SD cs.AI cs.CL eess.AS

    NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription

    Authors: Alon Vinnikov, Amir Ivry, Aviv Hurvitz, Igor Abramovski, Sharon Koubi, Ilya Gurvich, Shai Pe'er, Xiong Xiao, Benjamin Martinez Elizalde, Naoyuki Kanda, Xiaofei Wang, Shalev Shaer, Stav Yagev, Yossi Asher, Sunit Sivasankaran, Yifan Gong, Min Tang, Huaming Wang, Eyal Krupka

    Abstract: We introduce the first Natural Office Talkers in Settings of Far-field Audio Recordings ("NOTSOFAR-1") Challenge alongside datasets and baseline system. The challenge focuses on distant speaker diarization and automatic speech recognition (DASR) in far-field meeting scenarios, with single-channel and known-geometry multi-channel tracks, and serves as a launch platform for two new datasets: First…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: preprint

  37. arXiv:2401.02954  [pdf, other]

    cs.CL cs.AI cs.LG

    DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

    Authors: DeepSeek-AI, Xiao Bi, Deli Chen, Guanting Chen, Shanhuang Chen, Damai Dai, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Zhe Fu, Huazuo Gao, Kaige Gao, Wenjun Gao, Ruiqi Ge, Kang Guan, Daya Guo, Jianzhong Guo, Guangbo Hao, Zhewen Hao, Ying He, Wenjie Hu, Panpan Huang, Erhang Li, et al. (63 additional authors not shown)

    Abstract: The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling law described in previous literature presents varying conclusions, which casts a dark cloud over scaling LLMs. We delve into the study of scaling laws and present our distinctive findings that facilitate scaling of large scale models in two commonly used open-source configurations, 7B…

    Submitted 5 January, 2024; originally announced January 2024.

  38. arXiv:2312.12666  [pdf, other]

    cs.LG cs.CY cs.SI

    Incremental Semi-supervised Federated Learning for Health Inference via Mobile Sensing

    Authors: Guimin Dong, Lihua Cai, Mingyue Tang, Laura E. Barnes, Mehdi Boukhechba

    Abstract: Mobile sensing appears as a promising solution for health inference problem (e.g., influenza-like symptom recognition) by leveraging diverse smart sensors to capture fine-grained information about human behaviors and ambient contexts. Centralized training of machine learning models can place mobile users' sensitive information under privacy risks due to data breach and misexploitation. Federated L…

    Submitted 19 December, 2023; originally announced December 2023.

  39. arXiv:2312.11983  [pdf, other]

    cs.CL cs.AI

    Fluctuation-based Adaptive Structured Pruning for Large Language Models

    Authors: Yongqi An, Xu Zhao, Tao Yu, Ming Tang, Jinqiao Wang

    Abstract: Network Pruning is a promising way to address the huge computing resource demands of the deployment and inference of Large Language Models (LLMs). Retraining-free is important for LLMs' pruning methods. However, almost all of the existing retraining-free pruning approaches for LLMs focus on unstructured pruning, which requires specific hardware support for acceleration. In this paper, we propose a…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: Accepted to AAAI 2024

  40. arXiv:2312.10825  [pdf, other]

    cs.CV cs.LG

    Latent Space Editing in Transformer-Based Flow Matching

    Authors: Vincent Tao Hu, David W Zhang, Pascal Mettes, Meng Tang, Deli Zhao, Cees G. M. Snoek

    Abstract: This paper strives for image editing via generative models. Flow Matching is an emerging generative modeling technique that offers the advantage of simple and efficient training. Simultaneously, a new transformer-based U-ViT has recently been proposed to replace the commonly used UNet for better scalability and performance in generative modeling. Hence, Flow Matching with a transformer backbone of…

    Submitted 17 December, 2023; originally announced December 2023.

    Comments: AAAI 2024 with Appendix

  41. arXiv:2312.10418  [pdf, other]

    cs.LG cs.NI eess.SP

    Fractional Deep Reinforcement Learning for Age-Minimal Mobile Edge Computing

    Authors: Lyudong Jin, Ming Tang, Meng Zhang, Hao Wang

    Abstract: Mobile edge computing (MEC) is a promising paradigm for real-time applications with intensive computational needs (e.g., autonomous driving), as it can reduce the processing delay. In this work, we focus on the timeliness of computational-intensive updates, measured by Age-of-Information (AoI), and study how to jointly optimize the task updating and offloading policies for AoI with fractional form.…

    Submitted 19 December, 2023; v1 submitted 16 December, 2023; originally announced December 2023.

  42. arXiv:2312.02515  [pdf, other]

    cs.LG cs.AI

    ASPEN: High-Throughput LoRA Fine-Tuning of Large Language Models with a Single GPU

    Authors: Zhengmao Ye, Dengchun Li, Jingqi Tian, Tingfeng Lan, Jie Zuo, Lei Duan, Hui Lu, Yexi Jiang, Jian Sha, Ke Zhang, Mingjie Tang

    Abstract: Transformer-based large language models (LLMs) have demonstrated outstanding performance across diverse domains, particularly when fine-tuned for specific domains. Recent studies suggest that the resources required for fine-tuning LLMs can be economized through parameter-efficient methods such as Low-Rank Adaptation (LoRA). While LoRA effectively reduces computational burdens and resource demands…

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: 14 pages, 14 figures

  43. arXiv:2311.17121  [pdf, other]

    cs.CV cs.LG

    ScribbleGen: Generative Data Augmentation Improves Scribble-supervised Semantic Segmentation

    Authors: Jacob Schnell, Jieke Wang, Lu Qi, Vincent Tao Hu, Meng Tang

    Abstract: Recent advances in generative models, such as diffusion models, have made generating high-quality synthetic images widely accessible. Prior works have shown that training on synthetic images improves many perception tasks, such as image classification, object detection, and semantic segmentation. We are the first to explore generative data augmentations for scribble-supervised semantic segmentatio…

    Submitted 16 April, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

  44. arXiv:2311.17066  [pdf]

    q-bio.QM cs.AI

    Cluster trajectory of SOFA score in predicting mortality in sepsis

    Authors: Yuhe Ke, Matilda Swee Sun Tang, Celestine Jia Ling Loh, Hairil Rizal Abdullah, Nicholas Brian Shannon

    Abstract: Objective: Sepsis is a life-threatening condition. Sequential Organ Failure Assessment (SOFA) score is commonly used to assess organ dysfunction and predict ICU mortality, but it is taken as a static measurement and fails to capture dynamic changes. This study aims to investigate the relationship between dynamic changes in SOFA scores over the first 72 hours of ICU admission and patient outcomes.…

    Submitted 23 November, 2023; originally announced November 2023.

    Comments: 26 pages, 4 figures, 2 tables

  45. arXiv:2311.16479  [pdf, other]

    cs.CV

    Mitigating Hallucination in Visual Language Models with Visual Supervision

    Authors: Zhiyang Chen, Yousong Zhu, Yufei Zhan, Zhaowen Li, Chaoyang Zhao, Jinqiao Wang, Ming Tang

    Abstract: Large vision-language models (LVLMs) suffer from hallucination a lot, generating responses that apparently contradict to the image content occasionally. The key problem lies in its weak ability to comprehend detailed content in a multi-modal context, which can be mainly attributed to two factors in training data and loss function. The vision instruction dataset primarily focuses on global descript…

    Submitted 27 November, 2023; originally announced November 2023.

  46. arXiv:2311.16206  [pdf, other]

    cs.LG cs.AI cs.CV

    Continual Instruction Tuning for Large Multimodal Models

    Authors: Jinghan He, Haiyun Guo, Ming Tang, Jinqiao Wang

    Abstract: Instruction tuning is now a widely adopted approach to aligning large multimodal models (LMMs) to follow human intent. It unifies the data format of vision-language tasks, enabling multi-task joint training. However, vision-language tasks are constantly being created in practice. Instead of always re-training LMMs when new tasks arrive, continual learning offers flexibility for models to continual…

    Submitted 27 November, 2023; originally announced November 2023.

  47. arXiv:2311.14552  [pdf, other]

    cs.CV cs.AI

    Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models

    Authors: Yufei Zhan, Yousong Zhu, Zhiyang Chen, Fan Yang, Ming Tang, Jinqiao Wang

    Abstract: Replicating the innate human ability to detect all objects based on free-form texts at any granularity remains a formidable challenge for Vision-Language models. Current Large Vision Language Models (LVLMs) are predominantly constrained to grounding a single, pre-existing object, relying solely on data from Referring Expression Comprehension tasks. The limitation leads to a compromise in model des…

    Submitted 27 November, 2023; v1 submitted 24 November, 2023; originally announced November 2023.

    Comments: Technical report. The codes and dataset will be released soon at https://github.com/jefferyZhan/Griffon

  48. arXiv:2311.03157  [pdf, other]

    cs.DB

    GPTuner: A Manual-Reading Database Tuning System via GPT-Guided Bayesian Optimization

    Authors: Jiale Lao, Yibo Wang, Yufei Li, Jianping Wang, Yunjia Zhang, Zhiyuan Cheng, Wanghu Chen, Mingjie Tang, Jianguo Wang

    Abstract: Modern database management systems (DBMS) expose hundreds of configurable knobs to control system behaviours. Determining the appropriate values for these knobs to improve DBMS performance is a long-standing problem in the database community. As there is an increasing number of knobs to tune and each knob could be in continuous or categorical values, manual tuning becomes impractical. Recently, au…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: 15 pages, 14 figures, submit to VLDB2024

  49. arXiv:2310.18349  [pdf, other]

    cs.CL

    A Boundary Offset Prediction Network for Named Entity Recognition

    Authors: Minghao Tang, Yongquan He, Yongxiu Xu, Hongbo Xu, Wenyuan Zhang, Yang Lin

    Abstract: Named entity recognition (NER) is a fundamental task in natural language processing that aims to identify and classify named entities in text. However, span-based methods for NER typically assign entity types to text spans, resulting in an imbalanced sample space and neglecting the connections between non-entity and entity spans. To address these issues, we propose a novel approach for NER, named…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: Accepted by Findings of EMNLP 2023, 13 pages

  50. arXiv:2310.14596  [pdf, other]

    cs.CL cs.AI

    Learning to Correct Noisy Labels for Fine-Grained Entity Typing via Co-Prediction Prompt Tuning

    Authors: Minghao Tang, Yongquan He, Yongxiu Xu, Hongbo Xu, Wenyuan Zhang, Yang Lin

    Abstract: Fine-grained entity typing (FET) is an essential task in natural language processing that aims to assign semantic types to entities in text. However, FET poses a major challenge known as the noise labeling problem, whereby current methods rely on estimating noise distribution to identify noisy labels but are confused by diverse noise distribution deviation. To address this limitation, we introduce…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: Accepted by Findings of EMNLP 2023, 11 pages