Showing 1–50 of 80 results for author: Miao, X

Searching in archive cs.

  1. arXiv:2406.17145  [pdf, other]

    cs.DC cs.AI cs.LG

    GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism

    Authors: Byungsoo Jeon, Mengdi Wu, Shiyi Cao, Sunghyun Kim, Sunghyun Park, Neeraj Aggarwal, Colin Unger, Daiyaan Arfeen, Peiyuan Liao, Xupeng Miao, Mohammad Alizadeh, Gregory R. Ganger, Tianqi Chen, Zhihao Jia

    Abstract: Deep neural networks (DNNs) continue to grow rapidly in size, making them infeasible to train on a single device. Pipeline parallelism is commonly used in existing DNN systems to support large-scale DNN training by partitioning a DNN into multiple stages, which concurrently perform DNN training for different micro-batches in a pipeline fashion. However, existing pipeline-parallel approaches only c…

    Submitted 24 June, 2024; originally announced June 2024.

  2. Optimal Kernel Orchestration for Tensor Programs with Korch

    Authors: Muyan Hu, Ashwin Venkatram, Shreyashri Biswas, Balamurugan Marimuthu, Bohan Hou, Gabriele Oliaro, Haojie Wang, Liyan Zheng, Xupeng Miao, Jidong Zhai

    Abstract: Kernel orchestration is the task of mapping the computation defined in different operators of a deep neural network (DNN) to the execution of GPU kernels on modern hardware platforms. Prior approaches optimize kernel orchestration by greedily applying operator fusion, which fuses the computation of multiple operators into a single kernel, and miss a variety of optimization opportunities in kernel…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Fix some typos in the ASPLOS version

    Journal ref: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems 3 (2024) 755-769

  3. arXiv:2406.07845  [pdf, other]

    eess.AS cs.SD

    Target Speaker Extraction with Curriculum Learning

    Authors: Yun Liu, Xuechen Liu, Xiaoxiao Miao, Junichi Yamagishi

    Abstract: This paper presents a novel approach to target speaker extraction (TSE) using Curriculum Learning (CL) techniques, addressing the challenge of distinguishing a target speaker's voice from a mixture containing interfering speakers. For efficient training, we propose designing a curriculum that selects subsets of increasing complexity, such as increasing similarity between target and interfering spe…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted for presentation at Interspeech 2024

  4. arXiv:2406.01566  [pdf, other]

    cs.DC cs.CL cs.LG

    Helix: Distributed Serving of Large Language Models via Max-Flow on Heterogeneous GPUs

    Authors: Yixuan Mei, Yonghao Zhuang, Xupeng Miao, Juncheng Yang, Zhihao Jia, Rashmi Vinayak

    Abstract: This paper introduces Helix, a distributed system for high-throughput, low-latency large language model (LLM) serving on heterogeneous GPU clusters. A key idea behind Helix is to formulate inference computation of LLMs over heterogeneous GPUs and network connections as a max-flow problem for a directed, weighted graph, whose nodes represent GPU instances and edges capture both GPU and network hete…

    Submitted 3 June, 2024; originally announced June 2024.

  5. arXiv:2405.15914  [pdf, other]

    cs.CV

    ExactDreamer: High-Fidelity Text-to-3D Content Creation via Exact Score Matching

    Authors: Yumin Zhang, Xingyu Miao, Haoran Duan, Bo Wei, Tejal Shah, Yang Long, Rajiv Ranjan

    Abstract: Text-to-3D content creation is a rapidly evolving research area. Given the scarcity of 3D data, current approaches often adapt pre-trained 2D diffusion models for 3D synthesis. Among these approaches, Score Distillation Sampling (SDS) has been widely adopted. However, the issue of over-smoothing poses a significant limitation on the high-fidelity generation of 3D models. To address this challenge,…

    Submitted 24 May, 2024; originally announced May 2024.

  6. arXiv:2405.12357  [pdf]

    eess.IV cs.CV

    Paired Conditional Generative Adversarial Network for Highly Accelerated Liver 4D MRI

    Authors: Di Xu, Xin Miao, Hengjie Liu, Jessica E. Scholey, Wensha Yang, Mary Feng, Michael Ohliger, Hui Lin, Yi Lao, Yang Yang, Ke Sheng

    Abstract: Purpose: 4D MRI with high spatiotemporal resolution is desired for image-guided liver radiotherapy. Acquiring densely sampled k-space data is time-consuming. Accelerated acquisition with sparse samples is desirable but often causes degraded image quality or long reconstruction time. We propose the Reconstruct Paired Conditional Generative Adversarial Network (Re-Con-GAN) to shorten the 4D MRI rec…

    Submitted 20 May, 2024; originally announced May 2024.

  7. arXiv:2405.11252  [pdf, other]

    cs.CV

    Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching

    Authors: Xingyu Miao, Haoran Duan, Varun Ojha, Jun Song, Tejal Shah, Yang Long, Rajiv Ranjan

    Abstract: In this work, we propose a novel Trajectory Score Matching (TSM) method that aims to solve the pseudo ground truth inconsistency problem caused by the accumulated error in Interval Score Matching (ISM) when using the Denoising Diffusion Implicit Models (DDIM) inversion process. Unlike ISM which adopts the inversion process of DDIM to calculate on a single path, our TSM method leverages the inversi…

    Submitted 18 May, 2024; originally announced May 2024.

  8. arXiv:2404.02677  [pdf, other]

    eess.AS cs.CL cs.CR

    The VoicePrivacy 2024 Challenge Evaluation Plan

    Authors: Natalia Tomashenko, Xiaoxiao Miao, Pierre Champion, Sarina Meyer, Xin Wang, Emmanuel Vincent, Michele Panariello, Nicholas Evans, Junichi Yamagishi, Massimiliano Todisco

    Abstract: The task of the challenge is to develop a voice anonymization system for speech data which conceals the speaker's voice identity while protecting linguistic content and emotional states. The organizers provide development and evaluation datasets and evaluation scripts, as well as baseline anonymization systems and a list of training resources formed on the basis of the participants' requests. Part…

    Submitted 12 June, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

    Comments: 19 pages, https://www.voiceprivacychallenge.org/. arXiv admin note: substantial text overlap with arXiv:2203.12468

  9. arXiv:2403.14097  [pdf, other]

    cs.DC

    Parcae: Proactive, Liveput-Optimized DNN Training on Preemptible Instances

    Authors: Jiangfei Duan, Ziang Song, Xupeng Miao, Xiaoli Xi, Dahua Lin, Harry Xu, Minjia Zhang, Zhihao Jia

    Abstract: Deep neural networks (DNNs) are becoming progressively large and costly to train. This paper aims to reduce DNN training costs by leveraging preemptible instances on modern clouds, which can be allocated at a much lower price when idle but may be preempted by the cloud provider at any time. Prior work that supports DNN training on preemptive instances employs a reactive approach to handling instan…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: NSDI '24

  10. arXiv:2403.09363  [pdf, other]

    cs.CV

    Sentinel-Guided Zero-Shot Learning: A Collaborative Paradigm without Real Data Exposure

    Authors: Fan Wan, Xingyu Miao, Haoran Duan, Jingjing Deng, Rui Gao, Yang Long

    Abstract: With increasing concerns over data privacy and model copyrights, especially in the context of collaborations between AI service providers and data owners, an innovative SG-ZSL paradigm is proposed in this work. SG-ZSL is designed to foster efficient collaboration without the need to exchange models or sensitive data. It consists of a teacher model, a student model and a generator that links both m…

    Submitted 14 March, 2024; originally announced March 2024.

  11. arXiv:2402.18789  [pdf, other]

    cs.DC cs.CL cs.LG

    FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning

    Authors: Xupeng Miao, Gabriele Oliaro, Xinhao Cheng, Mengdi Wu, Colin Unger, Zhihao Jia

    Abstract: Parameter-efficient finetuning (PEFT) is a widely used technique to adapt large language models for different tasks. Service providers typically create separate systems for users to perform PEFT model finetuning and inference tasks. This is because existing systems cannot handle workloads that include a mix of inference and PEFT finetuning requests. As a result, shared GPU resources are underutili…

    Submitted 28 February, 2024; originally announced February 2024.

  12. arXiv:2402.01950  [pdf, other]

    cs.CV

    ConRF: Zero-shot Stylization of 3D Scenes with Conditioned Radiation Fields

    Authors: Xingyu Miao, Yang Bai, Haoran Duan, Fan Wan, Yawen Huang, Yang Long, Yefeng Zheng

    Abstract: Most of the existing works on arbitrary 3D NeRF style transfer required retraining on each single style condition. This work aims to achieve zero-shot controlled stylization in 3D scenes utilizing text or visual input as conditioning factors. We introduce ConRF, a novel method of zero-shot stylization. Specifically, due to the ambiguity of CLIP features, we employ a conversion process that maps th…

    Submitted 6 March, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

  13. arXiv:2401.10487  [pdf, other]

    cs.IR cs.CL

    Generative Dense Retrieval: Memory Can Be a Burden

    Authors: Peiwen Yuan, Xinglin Wang, Shaoxiong Feng, Boyuan Pan, Yiwei Li, Heda Wang, Xupeng Miao, Kan Li

    Abstract: Generative Retrieval (GR), autoregressively decoding relevant document identifiers given a query, has been shown to perform well under the setting of small-scale corpora. By memorizing the document corpus with model parameters, GR implicitly achieves deep interaction between query and document. However, such a memorizing mechanism faces three drawbacks: (1) Poor memory accuracy for fine-grained fe…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: EACL 2024 main

    Journal ref: EACL 2024 main

  14. arXiv:2401.07159  [pdf, other]

    cs.LG cs.AI

    Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models

    Authors: Zhengxin Zhang, Dan Zhao, Xupeng Miao, Gabriele Oliaro, Qing Li, Yong Jiang, Zhihao Jia

    Abstract: Finetuning large language models (LLMs) has been empirically effective on a variety of downstream tasks. Existing approaches to finetuning an LLM either focus on parameter-efficient finetuning, which only updates a small number of trainable parameters, or attempt to reduce the memory footprint during the training phase of the finetuning. Typically, the memory footprint during finetuning stems from…

    Submitted 13 January, 2024; originally announced January 2024.

    ACM Class: I.2.7

  15. arXiv:2401.04861  [pdf, other]

    cs.CV

    CTNeRF: Cross-Time Transformer for Dynamic Neural Radiance Field from Monocular Video

    Authors: Xingyu Miao, Yang Bai, Haoran Duan, Yawen Huang, Fan Wan, Yang Long, Yefeng Zheng

    Abstract: The goal of our work is to generate high-quality novel views from monocular videos of complex and dynamic scenes. Prior methods, such as DynamicNeRF, have shown impressive performance by leveraging time-varying dynamic radiation fields. However, these methods have limitations when it comes to accurately modeling the motion of complex objects, which can lead to inaccurate and blurry renderings of d…

    Submitted 26 June, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

    Comments: Accepted by Pattern Recognition

  16. arXiv:2401.03459  [pdf, other]

    cs.CV

    BCLNet: Bilateral Consensus Learning for Two-View Correspondence Pruning

    Authors: Xiangyang Miao, Guobao Xiao, Shiping Wang, Jun Yu

    Abstract: Correspondence pruning aims to establish reliable correspondences between two related images and recover relative camera motion. Existing approaches often employ a progressive strategy to handle the local and global contexts, with a prominent emphasis on transitioning from local to global, resulting in the neglect of interactions between different contexts. To tackle this issue, we propose a paral…

    Submitted 7 January, 2024; originally announced January 2024.

  17. arXiv:2312.15234  [pdf, other]

    cs.LG cs.AI cs.DC cs.PF

    Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems

    Authors: Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Hongyi Jin, Tianqi Chen, Zhihao Jia

    Abstract: In the rapidly evolving landscape of artificial intelligence (AI), generative large language models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However, the computational intensity and memory consumption of deploying these models present substantial challenges in terms of serving efficiency, particularly in scenarios demanding low latency and high throughput. This…

    Submitted 23 December, 2023; originally announced December 2023.

  18. arXiv:2312.06055  [pdf, other]

    cs.SD eess.AS

    Speaker-Text Retrieval via Contrastive Learning

    Authors: Xuechen Liu, Xin Wang, Erica Cooper, Xiaoxiao Miao, Junichi Yamagishi

    Abstract: In this study, we introduce a novel cross-modal retrieval task involving speaker descriptions and their corresponding audio samples. Utilizing pre-trained speaker and text encoders, we present a simple learning framework based on contrastive learning. Additionally, we explore the impact of incorporating speaker labels into the training process. Our findings establish the effectiveness of linking s…

    Submitted 10 December, 2023; originally announced December 2023.

    Comments: Submitted to IEEE Signal Processing Letters

  19. arXiv:2312.05279  [pdf]

    eess.IV cs.CV

    Quantitative perfusion maps using a novelty spatiotemporal convolutional neural network

    Authors: Anbo Cao, Pin-Yu Le, Zhonghui Qie, Haseeb Hassan, Yingwei Guo, Asim Zaman, Jiaxi Lu, Xueqiang Zeng, Huihui Yang, Xiaoqiang Miao, Taiyu Han, Guangtao Huang, Yan Kang, Yu Luo, Jia Guo

    Abstract: Dynamic susceptibility contrast magnetic resonance imaging (DSC-MRI) is widely used to evaluate acute ischemic stroke to distinguish salvageable tissue and infarct core. For this purpose, traditional methods employ deconvolution techniques, like singular value decomposition, which are known to be vulnerable to noise, potentially distorting the derived perfusion parameters. However, deep learning t…

    Submitted 8 December, 2023; originally announced December 2023.

  20. arXiv:2311.15578  [pdf, other]

    cs.LG cs.DB cs.IR

    Experimental Analysis of Large-scale Learnable Vector Storage Compression

    Authors: Hailin Zhang, Penghao Zhao, Xupeng Miao, Yingxia Shao, Zirui Liu, Tong Yang, Bin Cui

    Abstract: Learnable embedding vector is one of the most important applications in machine learning, and is widely used in various database-related domains. However, the high dimensionality of sparse data in recommendation tasks and the huge volume of corpus in retrieval-related tasks lead to a large memory consumption of the embedding table, which poses a great challenge to the training and deployment of mo…

    Submitted 13 February, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

  21. arXiv:2311.15566  [pdf, other]

    cs.DC cs.CL cs.LG

    SpotServe: Serving Generative Large Language Models on Preemptible Instances

    Authors: Xupeng Miao, Chunan Shi, Jiangfei Duan, Xiaoli Xi, Dahua Lin, Bin Cui, Zhihao Jia

    Abstract: The high computational and memory requirements of generative large language models (LLMs) make it challenging to serve them cheaply. This paper aims to reduce the monetary cost for serving LLMs by leveraging preemptible GPU instances on modern clouds, which offer accesses to spare GPUs at a much cheaper price than regular instances but may be preempted by the cloud at any time. Serving LLMs on pre…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: ASPLOS 2024

  22. arXiv:2311.14220  [pdf, other]

    stat.ME cs.LG stat.ML

    Assumption-lean and Data-adaptive Post-Prediction Inference

    Authors: Jiacheng Miao, Xinran Miao, Yixuan Wu, Jiwei Zhao, Qiongshi Lu

    Abstract: A primary challenge facing modern scientific research is the limited availability of gold-standard data which can be both costly and labor-intensive to obtain. With the rapid development of machine learning (ML), scientists have relied on ML algorithms to predict these gold-standard outcomes with easily obtained covariates. However, these predicted outcomes are often used directly in subsequent st…

    Submitted 6 February, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

  23. arXiv:2311.06825  [pdf, ps, other]

    cs.IT eess.SP

    Secure Rate-Splitting Multiple Access Transmissions in LMS Systems

    Authors: Minjue He, Hui Zhao, Xiaqing Miao, Shuai Wang, Gaofeng Pan

    Abstract: This letter investigates the secure delivery performance of the rate-splitting multiple access scheme in land mobile satellite (LMS) systems, considering that the private messages intended by a terminal can be eavesdropped by any others from the broadcast signals. Specifically, the considered system has an N-antenna satellite and numerous single-antenna land users. Maximum ratio transmission (MRT)…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: 5 pages, 3 figures, 1 table

  24. arXiv:2310.00711  [pdf, other]

    cs.DB

    Automatic Data Repair: Are We Ready to Deploy?

    Authors: Wei Ni, Xiaoye Miao, Xiangyu Zhao, Yangyang Wu, Jianwei Yin

    Abstract: Data quality is paramount in today's data-driven world, especially in the era of generative AI. Dirty data with errors and inconsistencies usually leads to flawed insights, unreliable decision-making, and biased or low-quality outputs from generative models. The study of repairing erroneous data has gained significant importance. Existing data repair algorithms differ in information utilization, p…

    Submitted 1 October, 2023; originally announced October 2023.

    Comments: 14 pages, 51 figures

  25. arXiv:2309.13335  [pdf, other]

    cs.IR

    Model-enhanced Vector Index

    Authors: Hailin Zhang, Yujing Wang, Qi Chen, Ruiheng Chang, Ting Zhang, Ziming Miao, Yingyan Hou, Yang Ding, Xupeng Miao, Haonan Wang, Bochen Pang, Yuefeng Zhan, Hao Sun, Weiwei Deng, Qi Zhang, Fan Yang, Xing Xie, Mao Yang, Bin Cui

    Abstract: Embedding-based retrieval methods construct vector indices to search for document representations that are most similar to the query representations. They are widely used in document retrieval due to low latency and decent recall performance. Recent research indicates that deep retrieval solutions offer better model quality, but are hindered by unacceptable serving latency and the inability to sup…

    Submitted 9 November, 2023; v1 submitted 23 September, 2023; originally announced September 2023.

  26. VoicePAT: An Efficient Open-source Evaluation Toolkit for Voice Privacy Research

    Authors: Sarina Meyer, Xiaoxiao Miao, Ngoc Thang Vu

    Abstract: Speaker anonymization is the task of modifying a speech recording such that the original speaker cannot be identified anymore. Since the first Voice Privacy Challenge in 2020, along with the release of a framework, the popularity of this research topic is continually increasing. However, the comparison and combination of different anonymization approaches remains challenging due to the complexity…

    Submitted 21 December, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: Accepted by OJSP-ICASSP 2024 https://ieeexplore.ieee.org/document/10365329

  27. arXiv:2309.06141  [pdf, other]

    cs.SD eess.AS

    SynVox2: Towards a privacy-friendly VoxCeleb2 dataset

    Authors: Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi, Nicholas Evans, Massimiliano Todisco, Jean-François Bonastre, Mickael Rouvier

    Abstract: The success of deep learning in speaker recognition relies heavily on the use of large datasets. However, the data-hungry nature of deep learning methods has already been questioned on account of the ethical, privacy, and legal concerns that arise when using large-scale datasets of natural speech collected from real human speakers. For example, the widely-used VoxCeleb2 dataset for speaker recogniti…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: conference

  28. DS-Depth: Dynamic and Static Depth Estimation via a Fusion Cost Volume

    Authors: Xingyu Miao, Yang Bai, Haoran Duan, Yawen Huang, Fan Wan, Xinxing Xu, Yang Long, Yefeng Zheng

    Abstract: Self-supervised monocular depth estimation methods typically rely on the reprojection error to capture geometric relationships between successive frames in static environments. However, this assumption does not hold in dynamic objects in scenarios, leading to errors during the view synthesis stage, such as feature mismatch and occlusion, which can significantly reduce the accuracy of the generated…

    Submitted 14 August, 2023; originally announced August 2023.

  29. arXiv:2307.02031  [pdf, other]

    cs.LG cs.DB cs.DC

    Improving Automatic Parallel Training via Balanced Memory Workload Optimization

    Authors: Yujie Wang, Youhe Jiang, Xupeng Miao, Fangcheng Fu, Shenhan Zhu, Xiaonan Nie, Yaofeng Tu, Bin Cui

    Abstract: Transformer models have emerged as the leading approach for achieving state-of-the-art performance across various application domains, serving as the foundation for advanced large-scale deep learning (DL) models. However, efficiently training these models across multiple GPUs remains a complex challenge due to the abundance of parallelism options. Existing DL systems either require manual efforts…

    Submitted 24 February, 2024; v1 submitted 5 July, 2023; originally announced July 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2211.13878

  30. arXiv:2307.00965  [pdf, other]

    cs.LG cs.AI

    OpenClinicalAI: An Open and Dynamic Model for Alzheimer's Disease Diagnosis

    Authors: Yunyou Huang, Xiaoshuang Liang, Xiangjiang Lu, Xiuxia Miao, Jiyue Xie, Wenjing Liu, Fan Zhang, Guoxin Kang, Li Ma, Suqin Tang, Zhifei Zhang, Jianfeng Zhan

    Abstract: Although Alzheimer's disease (AD) cannot be reversed or cured, timely diagnosis can significantly reduce the burden of treatment and care. Current research on AD diagnosis models usually regards the diagnosis task as a typical classification task with two primary assumptions: 1) All target categories are known a priori; 2) The diagnostic strategy for each patient is consistent, that is, the number…

    Submitted 3 July, 2023; originally announced July 2023.

    Comments: Real-world clinical setting, Alzheimer's disease, diagnose, AI, deep learning. arXiv admin note: text overlap with arXiv:2109.04004

  31. arXiv:2307.00936  [pdf, other]

    cs.LG cs.AI

    OpenAPMax: Abnormal Patterns-based Model for Real-World Alzheimer's Disease Diagnosis

    Authors: Yunyou Huang, Xianglong Guan, Xiangjiang Lu, Xiaoshuang Liang, Xiuxia Miao, Jiyue Xie, Wenjing Liu, Li Ma, Suqin Tang, Zhifei Zhang, Jianfeng Zhan

    Abstract: Alzheimer's disease (AD) cannot be reversed, but early diagnosis will significantly benefit patients' medical treatment and care. In recent works, AD diagnosis has the primary assumption that all categories are known a priori -- a closed-set classification problem, which contrasts with the open-set recognition problem. This assumption hinders the application of the model in natural clinical setting…

    Submitted 3 July, 2023; originally announced July 2023.

    Comments: Alzheimer's Disease, Abnormal Patterns, Open-set Recognition, OpenAPMax

  32. arXiv:2306.04945  [pdf, ps, other]

    cs.DB

    Modern Data Pricing Models: Taxonomy and Comprehensive Survey

    Authors: Xiaoye Miao, Huanhuan Peng, Xinyu Huang, Lu Chen, Yunjun Gao, Jianwei Yin

    Abstract: Data play an increasingly important role in smart data analytics, which facilitate many data-driven applications. The goal of various data markets aims to alleviate the issue of isolated data islands, so as to benefit data circulation. The problem of data pricing is indispensable yet challenging in data trade. In this paper, we conduct a comprehensive survey on the modern data pricing solutions. W…

    Submitted 8 June, 2023; originally announced June 2023.

  33. arXiv:2305.18823  [pdf, other]

    cs.SD eess.AS

    Speaker anonymization using orthogonal Householder neural network

    Authors: Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi, Natalia Tomashenko

    Abstract: Speaker anonymization aims to conceal a speaker's identity while preserving content information in speech. Current mainstream neural-network speaker anonymization systems disentangle speech into prosody-related, content, and speaker representations. The speaker representation is then anonymized by a selection-based speaker anonymizer that uses a mean vector over a set of randomly selected speaker…

    Submitted 12 September, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing

  34. arXiv:2305.17423  [pdf, other]

    cs.CV

    Accelerating Text-to-Image Editing via Cache-Enabled Sparse Diffusion Inference

    Authors: Zihao Yu, Haoyang Li, Fangcheng Fu, Xupeng Miao, Bin Cui

    Abstract: Due to the recent success of diffusion models, text-to-image generation is becoming increasingly popular and achieves a wide range of applications. Among them, text-to-image editing, or continuous text-to-image generation, attracts lots of attention and can potentially improve the quality of generated images. It's common to see that users may want to slightly edit the generated image by making min…

    Submitted 4 January, 2024; v1 submitted 27 May, 2023; originally announced May 2023.

    Comments: AAAI 2024

  35. arXiv:2305.14791  [pdf, other]

    cs.CL

    Prompting Large Language Models for Counterfactual Generation: An Empirical Study

    Authors: Yongqi Li, Mayi Xu, Xin Miao, Shen Zhou, Tieyun Qian

    Abstract: Large language models (LLMs) have made remarkable progress in a wide range of natural language understanding and generation tasks. However, their ability to generate counterfactuals has not been examined systematically. To bridge this gap, we present a comprehensive evaluation framework on various types of NLU tasks, which covers all key factors in determining LLMs' capability of generating counte…

    Submitted 23 February, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted to LREC-COLING 2024, camera ready version

  36. arXiv:2305.09940   

    cs.DC

    OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning

    Authors: Youhe Jiang, Fangcheng Fu, Xupeng Miao, Xiaonan Nie, Bin Cui

    Abstract: Large-scale deep learning models contribute to significant performance improvements on varieties of downstream tasks. Current data and model parallelism approaches utilize model replication and partition techniques to support the distributed training of ultra-large models. However, directly deploying these systems often leads to sub-optimal training efficiency due to the complex model architecture…

    Submitted 17 May, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

    Comments: An older version is in existence, and the article has been updated there. The URL for the updated version is arXiv:2209.13258

  37. arXiv:2305.09781  [pdf, other]

    cs.CL cs.DC cs.LG

    SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification

    Authors: Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Zeyu Wang, Zhengxin Zhang, Rae Ying Yee Wong, Alan Zhu, Lijie Yang, Xiaoxiang Shi, Chunan Shi, Zhuoming Chen, Daiyaan Arfeen, Reyna Abhyankar, Zhihao Jia

    Abstract: This paper introduces SpecInfer, a system that accelerates generative large language model (LLM) serving with tree-based speculative inference and verification. The key idea behind SpecInfer is leveraging small speculative models to predict the LLM's outputs; the predictions are organized as a token tree, whose nodes each represent a candidate token sequence. The correctness of all candidate token…

    Submitted 31 March, 2024; v1 submitted 16 May, 2023; originally announced May 2023.

    Comments: ASPLOS'24

  38. arXiv:2304.03946  [pdf, other]

    cs.DC cs.LG

    FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement

    Authors: Xiaonan Nie, Xupeng Miao, Zilong Wang, Zichao Yang, Jilong Xue, Lingxiao Ma, Gang Cao, Bin Cui

    Abstract: With the increasing data volume, there is a trend of using large-scale pre-trained models to store the knowledge into an enormous number of model parameters. The training of these models is composed of lots of dense algebras, requiring a huge amount of hardware resources. Recently, sparsely-gated Mixture-of-Experts (MoEs) are becoming more popular and have demonstrated impressive pretraining scala…

    Submitted 8 April, 2023; originally announced April 2023.

    Comments: Accepted by SIGMOD 2023

    Journal ref: Proc. ACM Manag. Data, Vol. 1, No. 1, Article 110. Publication date: May 2023

  39. arXiv:2304.00903  [pdf]

    cs.CY

    Adoption of Artificial Intelligence in Schools: Unveiling Factors Influencing Teachers Engagement

    Authors: Mutlu Cukurova, Xin Miao, Richard Brooker

    Abstract: Albeit existing evidence about the impact of AI-based adaptive learning platforms, their scaled adoption in schools is slow at best. In addition, AI tools adopted in schools may not always be the considered and studied products of the research community. Therefore, there have been increasing concerns about identifying factors influencing adoption, and studying the extent to which these factors can…

    Submitted 5 April, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

    Comments: 12 pages, 2 tables, 1 figure, International conference of artificial intelligence in education

    ACM Class: K.3.1

  40. arXiv:2303.09167  [pdf, other]

    cs.CV cs.AI

    Emotional Reaction Intensity Estimation Based on Multimodal Data

    Authors: Shangfei Wang, Jiaqiang Wu, Feiyi Zheng, Xin Li, Xuewei Li, Suwen Wang, Yi Wu, Yanan Chang, Xiangyu Miao

    Abstract: This paper introduces our method for the Emotional Reaction Intensity (ERI) Estimation Challenge, in CVPR 2023: 5th Workshop and Competition on Affective Behavior Analysis in-the-wild (ABAW). Based on the multimodal data provided by the organizers, we extract acoustic and visual features with different pretrained models. The multimodal features are mixed together by Transformer Encoders with cros…

    Submitted 16 March, 2023; originally announced March 2023.

  41. arXiv:2303.09145  [pdf, other]

    cs.CV

    Facial Affective Behavior Analysis Method for 5th ABAW Competition

    Authors: Shangfei Wang, Yanan Chang, Yi Wu, Xiangyu Miao, Jiaqiang Wu, Zhouan Zhu, Jiahe Wang, Yufei Xiao

    Abstract: Facial affective behavior analysis is important for human-computer interaction. 5th ABAW competition includes three challenges from Aff-Wild2 database. Three common facial affective analysis tasks are involved, i.e. valence-arousal estimation, expression classification, action unit recognition. For the three challenges, we construct three different models to solve the corresponding problems to imp…

    Submitted 16 March, 2023; originally announced March 2023.

  42. arXiv:2303.02868  [pdf, other

    cs.LG cs.DC

    Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent

    Authors: Xiaonan Nie, Yi Liu, Fangcheng Fu, Jinbao Xue, Dian Jiao, Xupeng Miao, Yangyu Tao, Bin Cui

    Abstract: Recent years have witnessed the unprecedented achievements of large-scale pre-trained models, especially the Transformer models. Many products and services in Tencent Inc., such as WeChat, QQ, and Tencent Advertisement, have been opted in to gain the power of pre-trained models. In this work, we present Angel-PTM, a productive deep learning system designed for pre-training and fine-tuning Transfor… ▽ More

    Submitted 5 March, 2023; originally announced March 2023.

  43. arXiv:2211.16065  [pdf, other

    eess.AS cs.SD

    Hiding speaker's sex in speech using zero-evidence speaker representation in an analysis/synthesis pipeline

    Authors: Paul-Gauthier Noé, Xiaoxiao Miao, Xin Wang, Junichi Yamagishi, Jean-François Bonastre, Driss Matrouf

    Abstract: The use of modern vocoders in an analysis/synthesis pipeline allows us to investigate high-quality voice conversion that can be used for privacy purposes. Here, we propose to transform the speaker embedding and the pitch in order to hide the sex of the speaker. ECAPA-TDNN-based speaker representation fed into a HiFiGAN vocoder is protected using a neural-discriminant analysis approach, which is co… ▽ More

    Submitted 24 March, 2023; v1 submitted 29 November, 2022; originally announced November 2022.

    Comments: Accepted to ICASSP 2023

  44. arXiv:2211.13878  [pdf, other

    cs.LG cs.DB cs.DC

    Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism

    Authors: Xupeng Miao, Yujie Wang, Youhe Jiang, Chunan Shi, Xiaonan Nie, Hailin Zhang, Bin Cui

    Abstract: Transformer models have achieved state-of-the-art performance on various domains of applications and have gradually become the foundations of the advanced large deep learning (DL) models. However, how to train these models over multiple GPUs efficiently is still challenging due to a large number of parallelism choices. Existing DL systems either rely on manual efforts to make distributed training plan… ▽ More

    Submitted 24 November, 2022; originally announced November 2022.

    Journal ref: VLDB 2023

  45. arXiv:2211.00216  [pdf, other

    cs.LG cs.AI cs.DB cs.DC

    Distributed Graph Neural Network Training: A Survey

    Authors: Yingxia Shao, Hongzheng Li, Xizhi Gu, Hongbo Yin, Yawen Li, Xupeng Miao, Wentao Zhang, Bin Cui, Lei Chen

    Abstract: Graph neural networks (GNNs) are a type of deep learning models that are trained on graphs and have been successfully applied in various domains. Despite the effectiveness of GNNs, it is still challenging for GNNs to efficiently scale to large graphs. As a remedy, distributed computing becomes a promising solution of training large-scale GNNs, since it is able to provide abundant computing resourc… ▽ More

    Submitted 25 August, 2023; v1 submitted 31 October, 2022; originally announced November 2022.

  46. arXiv:2209.14169  [pdf, other

    cs.CV cs.AI cs.MM

    CALIP: Zero-Shot Enhancement of CLIP with Parameter-free Attention

    Authors: Ziyu Guo, Renrui Zhang, Longtian Qiu, Xianzheng Ma, Xupeng Miao, Xuming He, Bin Cui

    Abstract: Contrastive Language-Image Pre-training (CLIP) has been shown to learn visual representations with great transferability, which achieves promising accuracy for zero-shot classification. To further improve its downstream performance, existing works propose additional learnable modules upon CLIP and fine-tune them by few-shot training sets. However, the resulting extra training cost and data require… ▽ More

    Submitted 18 December, 2022; v1 submitted 28 September, 2022; originally announced September 2022.

    Comments: Accepted by AAAI 2023, 12 pages, 6 figures

  47. arXiv:2209.13258  [pdf, other

    cs.DC

    OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning

    Authors: Youhe Jiang, Fangcheng Fu, Xupeng Miao, Xiaonan Nie, Bin Cui

    Abstract: Large-scale deep learning models contribute to significant performance improvements on varieties of downstream tasks. Current data and model parallelism approaches utilize model replication and partition techniques to support the distributed training of ultra-large models. However, directly deploying these systems often leads to sub-optimal training efficiency due to the complex model architecture… ▽ More

    Submitted 18 May, 2023; v1 submitted 27 September, 2022; originally announced September 2022.

    Comments: Accepted by IJCAI 2023

  48. arXiv:2209.00485  [pdf, other

    eess.AS cs.SD

    Joint Speaker Encoder and Neural Back-end Model for Fully End-to-End Automatic Speaker Verification with Multiple Enrollment Utterances

    Authors: Chang Zeng, Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi

    Abstract: Conventional automatic speaker verification systems can usually be decomposed into a front-end model such as time delay neural network (TDNN) for extracting speaker embeddings and a back-end model such as statistics-based probabilistic linear discriminant analysis (PLDA) or neural network-based neural PLDA (NPLDA) for similarity scoring. However, the sequential optimization of the front-end and ba… ▽ More

    Submitted 1 September, 2022; originally announced September 2022.

    Comments: Submitted to TASLP

  49. Towards Communication-efficient Vertical Federated Learning Training via Cache-enabled Local Updates

    Authors: Fangcheng Fu, Xupeng Miao, Jiawei Jiang, Huanran Xue, Bin Cui

    Abstract: Vertical federated learning (VFL) is an emerging paradigm that allows different parties (e.g., organizations or enterprises) to collaboratively build machine learning models with privacy protection. In the training phase, VFL only exchanges the intermediate statistics, i.e., forward activations and backward derivatives, across parties to compute model gradients. Nevertheless, due to its geo-distri… ▽ More

    Submitted 29 July, 2022; originally announced July 2022.

    Comments: VLDB 2022

  50. arXiv:2207.09716  [pdf, other

    cs.CV

    Multi-Task Learning for Emotion Descriptors Estimation at the fourth ABAW Challenge

    Authors: Yanan Chang, Yi Wu, Xiangyu Miao, Jiahe Wang, Shangfei Wang

    Abstract: Facial valence/arousal, expression and action unit are related tasks in facial affective analysis. However, the tasks only have limited performance in the wild due to the various collected conditions. The 4th competition on affective behavior analysis in the wild (ABAW) provided images with valence/arousal, expression and action unit labels. In this paper, we introduce multi-task learning framewor… ▽ More

    Submitted 20 July, 2022; originally announced July 2022.