Showing 1–50 of 84 results for author: Chai, Y

Searching in archive cs.
  1. arXiv:2407.00719  [pdf]

    cs.CR cs.DC cs.LG

    A Whole-Process Certifiably Robust Aggregation Method Against Backdoor Attacks in Federated Learning

    Authors: Anqi Zhou, Yezheng Liu, Yidong Chai, Hongyi Zhu, Xinyue Ge, Yuanchun Jiang, Meng Wang

    Abstract: Federated Learning (FL) has garnered widespread adoption across various domains such as finance, healthcare, and cybersecurity. Nonetheless, FL remains under significant threat from backdoor attacks, wherein malicious actors insert triggers into trained models, enabling them to perform certain tasks while still meeting FL's primary objectives. In response, robust aggregation methods have been prop…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 14 pages

  2. arXiv:2406.14090  [pdf, other]

    cs.AI

    Personalized Music Recommendation with a Heterogeneity-aware Deep Bayesian Network

    Authors: Erkang Jing, Yezheng Liu, Yidong Chai, Shuo Yu, Longshun Liu, Yuanchun Jiang, Yang Wang

    Abstract: Music recommender systems are crucial in music streaming platforms, providing users with music they would enjoy. Recent studies have shown that user emotions can affect users' music mood preferences. However, existing emotion-aware music recommender systems (EMRSs) explicitly or implicitly assume that users' actual emotional states expressed by an identical emotion word are homogeneous. They also…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 34 pages, 19 figures

  3. arXiv:2406.11687  [pdf, other]

    cs.CL

    Tokenization Falling Short: The Curse of Tokenization

    Authors: Yekun Chai, Yewei Fang, Qiwei Peng, Xuhong Li

    Abstract: Language models typically tokenize raw text into sequences of subword identifiers from a predefined vocabulary, a process inherently sensitive to typographical errors, length variations, and largely oblivious to the internal structure of tokens, issues we term the curse of tokenization. In this study, we delve into these drawbacks and demonstrate that large language models (LLMs) remain susceptible…

    Submitted 17 June, 2024; originally announced June 2024.

  4. arXiv:2404.11502  [pdf, other]

    cs.CL cs.AI

    Towards Coarse-to-Fine Evaluation of Inference Efficiency for Large Language Models

    Authors: Yushuo Chen, Tianyi Tang, Erge Xiang, Linjiang Li, Wayne Xin Zhao, Jing Wang, Yunpeng Chai, Ji-Rong Wen

    Abstract: In real world, large language models (LLMs) can serve as the assistant to help users accomplish their jobs, and also support the development of advanced applications. For the wide application of LLMs, the inference efficiency is an essential concern, which has been widely studied in existing work, and numerous optimization algorithms and code libraries have been proposed to improve it. Nonetheless…

    Submitted 17 April, 2024; originally announced April 2024.

  5. arXiv:2404.10710  [pdf, other]

    cs.CL cs.CV

    Dual Modalities of Text: Visual and Textual Generative Pre-training

    Authors: Yekun Chai, Qingyi Liu, Jingwu Xiao, Shuohuan Wang, Yu Sun, Hua Wu

    Abstract: Harnessing visual texts represents a burgeoning frontier in the evolution of language modeling. In this paper, we introduce a novel pre-training framework for a suite of pixel-based autoregressive language models, pre-training on a corpus of over 400 million documents rendered as RGB images. Our approach is characterized by a dual-modality training regimen, engaging both visual data through next p…

    Submitted 17 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

  6. arXiv:2404.07840  [pdf, other]

    cs.CL cs.LG

    On Training Data Influence of GPT Models

    Authors: Qingyi Liu, Yekun Chai, Shuohuan Wang, Yu Sun, Qiwei Peng, Keze Wang, Hua Wu

    Abstract: Amidst the rapid advancements in generative language models, the investigation of how training data shapes the performance of GPT models is still emerging. This paper presents GPTfluence, a novel approach that leverages a featurized simulation to assess the impact of training examples on the training dynamics of GPT models. Our approach not only traces the influence of individual training instance…

    Submitted 16 April, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

  7. arXiv:2404.03659  [pdf, other]

    cs.LG cs.CR

    Federated Unlearning for Human Activity Recognition

    Authors: Kongyang Chen, Dongping Zhang, Yaping Chai, Weibin Zhang, Shaowei Wang, Jiaxing Shen

    Abstract: The rapid evolution of Internet of Things (IoT) technology has spurred the widespread adoption of Human Activity Recognition (HAR) in various daily life domains. Federated Learning (FL) is frequently utilized to build a global HAR model by aggregating user contributions without transmitting raw individual data. Despite substantial progress in user privacy protection with FL, challenges persist. Re…

    Submitted 17 January, 2024; originally announced April 2024.

  8. arXiv:2404.00399  [pdf, other]

    cs.CL cs.AI cs.LG

    Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order

    Authors: Taishi Nakamura, Mayank Mishra, Simone Tedeschi, Yekun Chai, Jason T Stillerman, Felix Friedrich, Prateek Yadav, Tanmay Laud, Vu Minh Chien, Terry Yue Zhuo, Diganta Misra, Ben Bogin, Xuan-Son Vu, Marzena Karpinska, Arnav Varma Dantuluri, Wojciech Kusa, Tommaso Furlanello, Rio Yokota, Niklas Muennighoff, Suhas Pai, Tosin Adewumi, Veronika Laippala, Xiaozhe Yao, Adalberto Junior, Alpay Ariyak, et al. (20 additional authors not shown)

    Abstract: Pretrained language models underpin several AI applications, but their high computational cost for training limits accessibility. Initiatives such as BLOOM and StarCoder aim to democratize access to pretrained models for collaborative community development. However, such existing models face challenges: limited multilingual capabilities, continual pretraining causing catastrophic forgetting, where…

    Submitted 23 April, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

    Comments: Preprint

  9. arXiv:2402.19173  [pdf, other]

    cs.SE cs.AI

    StarCoder 2 and The Stack v2: The Next Generation

    Authors: Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, Tianyang Liu, Max Tian, Denis Kocetkov, Arthur Zucker, Younes Belkada, Zijian Wang, Qian Liu, Dmitry Abulkhanov, Indraneil Paul, Zhuang Li, Wen-Ding Li, Megan Risdal, Jia Li, Jian Zhu, Terry Yue Zhuo, et al. (41 additional authors not shown)

    Abstract: The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder2. In partnership with Software Heritage (SWH), we build The Stack v2 on top of the digital commons of their source code archive. Alongside the SWH repositories spanning 619 programming languages, we carefully select other high-quality data…

    Submitted 29 February, 2024; originally announced February 2024.

  10. arXiv:2402.16694  [pdf, other]

    cs.CL cs.PL cs.SE

    HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization

    Authors: Qiwei Peng, Yekun Chai, Xuhong Li

    Abstract: Large language models (LLMs) have made significant progress in generating codes from textual prompts. However, existing benchmarks have mainly concentrated on translating English prompts to multilingual codes or have been constrained to very limited natural languages (NLs). These benchmarks have overlooked the vast landscape of massively multilingual NL to multilingual code, leaving a critical gap…

    Submitted 24 March, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: LREC-COLING 2024

  11. arXiv:2402.15583  [pdf, other]

    cs.CV cs.LG

    Cohere3D: Exploiting Temporal Coherence for Unsupervised Representation Learning of Vision-based Autonomous Driving

    Authors: Yichen Xie, Hongge Chen, Gregory P. Meyer, Yong Jae Lee, Eric M. Wolff, Masayoshi Tomizuka, Wei Zhan, Yuning Chai, Xin Huang

    Abstract: Due to the lack of depth cues in images, multi-frame inputs are important for the success of vision-based perception, prediction, and planning in autonomous driving. Observations from different angles enable the recovery of 3D object states from 2D image inputs if we can identify the same instance in different input frames. However, the dynamic nature of autonomous driving scenes leads to signific…

    Submitted 23 February, 2024; originally announced February 2024.

  12. arXiv:2402.10045  [pdf]

    cs.CV cs.LG

    Short-Form Videos and Mental Health: A Knowledge-Guided Neural Topic Model

    Authors: Jiaheng Xie, Ruicheng Liang, Yidong Chai, Yang Liu, Daniel Zeng

    Abstract: While short-form videos head to reshape the entire social media landscape, experts are exceedingly worried about their depressive impacts on viewers, as evidenced by medical studies. To prevent widespread consequences, platforms are eager to predict these videos' impact on viewers' mental health. Subsequently, they can take intervention measures, such as revising recommendation algorithms and disp…

    Submitted 21 March, 2024; v1 submitted 10 January, 2024; originally announced February 2024.

  13. arXiv:2401.12988  [pdf]

    cs.CL cs.AI

    Few-Shot Learning for Chronic Disease Management: Leveraging Large Language Models and Multi-Prompt Engineering with Medical Knowledge Injection

    Authors: Haoxin Liu, Wenli Zhang, Jiaheng Xie, Buomsoo Kim, Zhu Zhang, Yidong Chai

    Abstract: This study harnesses state-of-the-art AI technology for chronic disease management, specifically in detecting various mental disorders through user-generated textual content. Existing studies typically rely on fully supervised machine learning, which presents challenges such as the labor-intensive manual process of annotating extensive training data for each disease and the need to design speciali…

    Submitted 16 January, 2024; originally announced January 2024.

    MSC Class: K.5 ACM Class: I.2.7; H.4.m

  14. arXiv:2312.11276  [pdf, other]

    cs.CL

    Compositional Generalization for Multi-label Text Classification: A Data-Augmentation Approach

    Authors: Yuyang Chai, Zhuang Li, Jiahui Liu, Lei Chen, Fei Li, Donghong Ji, Chong Teng

    Abstract: Despite significant advancements in multi-label text classification, the ability of existing models to generalize to novel and seldom-encountered complex concepts, which are compositions of elementary ones, remains underexplored. This research addresses this gap. By creating unique data splits across three benchmarks, we assess the compositional generalization ability of existing multi-label text…

    Submitted 20 December, 2023; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI'24

  15. arXiv:2312.00784  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

    Authors: Mu Cai, Haotian Liu, Dennis Park, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Yong Jae Lee

    Abstract: While existing large vision-language multimodal models focus on whole image understanding, there is a prominent gap in achieving region-specific comprehension. Current approaches that use textual coordinates or spatial encodings often fail to provide a user-friendly interface for visual prompting. To address this challenge, we introduce a novel multimodal model capable of decoding arbitrary visual…

    Submitted 26 April, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

    Comments: Accepted to CVPR2024. Project page: https://vip-llava.github.io/

  16. arXiv:2311.13951  [pdf, other]

    cs.CL

    MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria

    Authors: Wentao Ge, Shunian Chen, Guiming Hardy Chen, Zhihong Chen, Junying Chen, Shuo Yan, Chenghao Zhu, Ziyue Lin, Wenya Xie, Xinyi Zhang, Yichen Chai, Xiaoyu Liu, Dingjie Song, Xidong Wang, Anningzhe Gao, Zhiyi Zhang, Jianquan Li, Xiang Wan, Benyou Wang

    Abstract: Multimodal large language models (MLLMs) (e.g., GPT-4V, LLaVA, and Claude-3) have broadened the scope of AI applications. Yet, evaluating their performance presents a significant challenge owing to the inherently subjective nature of tasks that do not yield clear-cut solutions especially for those open-ended queries. Existing automatic evaluation methodologies are mainly limited in evaluating obje…

    Submitted 27 April, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

    Comments: 23 pages

  17. arXiv:2310.01045  [pdf, other]

    cs.CL

    Tool-Augmented Reward Modeling

    Authors: Lei Li, Yekun Chai, Shuohuan Wang, Yu Sun, Hao Tian, Ningyu Zhang, Hua Wu

    Abstract: Reward modeling (a.k.a., preference modeling) is instrumental for aligning large language models with human preferences, particularly within the context of reinforcement learning from human feedback (RLHF). While conventional reward models (RMs) have exhibited remarkable scalability, they oft struggle with fundamental functionality such as arithmetic computation, code execution, and factual lookup…

    Submitted 11 February, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: ICLR 2024 Spotlight

  18. arXiv:2309.16148  [pdf, other]

    cs.CV

    OSM-Net: One-to-Many One-shot Talking Head Generation with Spontaneous Head Motions

    Authors: Jin Liu, Xi Wang, Xiaomeng Fu, Yesheng Chai, Cai Yu, Jiao Dai, Jizhong Han

    Abstract: One-shot talking head generation has no explicit head movement reference, thus it is difficult to generate talking heads with head motions. Some existing works only edit the mouth area and generate still talking heads, leading to unreal talking head performance. Other works construct one-to-one mapping between audio signal and head motion sequences, introducing ambiguity correspondences into the m…

    Submitted 27 September, 2023; originally announced September 2023.

    Comments: Paper Under Review

  19. arXiv:2309.12113  [pdf, other]

    cs.AI

    Incentivizing Massive Unknown Workers for Budget-Limited Crowdsensing: From Off-Line and On-Line Perspectives

    Authors: Feng Li, Yuqi Chai, Huan Yang, Pengfei Hu, Lingjie Duan

    Abstract: How to incentivize strategic workers using limited budget is a very fundamental problem for crowdsensing systems; nevertheless, since the sensing abilities of the workers may not always be known as prior knowledge due to the diversities of their sensor devices and behaviors, it is difficult to properly select and pay the unknown workers. Although the uncertainties of the workers can be addressed b…

    Submitted 2 January, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

  20. arXiv:2309.11903  [pdf]

    cs.CR

    Full mesh networking technology with peer to peer grid topology based on variable parameter full dimensional space

    Authors: Wenqiang Song, Chuan He, Zhaoyang Xie, Yuanyuan Chai

    Abstract: The continuous development of computer network technology has accelerated the pace of informatization, and at the same time, network security issues are becoming increasingly prominent. Networking technology with different network topologies is one of the important means to solve network security problems. The security of VPN is based on the division of geographical boundaries, but the granularity…

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: 9th International Conference on Networks & Communications (NWCOM 2023)

  21. arXiv:2309.05810  [pdf, other]

    cs.CV cs.CR cs.LG cs.RO

    SHIFT3D: Synthesizing Hard Inputs For Tricking 3D Detectors

    Authors: Hongge Chen, Zhao Chen, Gregory P. Meyer, Dennis Park, Carl Vondrick, Ashish Shrivastava, Yuning Chai

    Abstract: We present SHIFT3D, a differentiable pipeline for generating 3D shapes that are structurally plausible yet challenging to 3D object detectors. In safety-critical applications like autonomous driving, discovering such novel challenging objects can offer insight into unknown vulnerabilities of 3D detectors. By representing objects with a signed distanced function (SDF), we show that gradient error s…

    Submitted 11 September, 2023; originally announced September 2023.

    Comments: Accepted by ICCV 2023

  22. arXiv:2308.16635  [pdf, other]

    cs.CV

    MFR-Net: Multi-faceted Responsive Listening Head Generation via Denoising Diffusion Model

    Authors: Jin Liu, Xi Wang, Xiaomeng Fu, Yesheng Chai, Cai Yu, Jiao Dai, Jizhong Han

    Abstract: Face-to-face communication is a common scenario including roles of speakers and listeners. Most existing research methods focus on producing speaker videos, while the generation of listener heads remains largely overlooked. Responsive listening head generation is an important task that aims to model face-to-face communication scenarios by generating a listener head video given a speaker video and…

    Submitted 31 August, 2023; originally announced August 2023.

    Comments: Accepted by ACM MM 2023

  23. arXiv:2308.15012  [pdf, other]

    cs.DB

    SALI: A Scalable Adaptive Learned Index Framework based on Probability Models

    Authors: Jiake Ge, Huanchen Zhang, Boyu Shi, Yuanhui Luo, Yunda Guo, Yunpeng Chai, Yuxing Chen, Anqun Pan

    Abstract: The growth in data storage capacity and the increasing demands for high performance have created several challenges for concurrent indexing structures. One promising solution is learned indexes, which use a learning-based approach to fit the distribution of stored data and predictively locate target keys, significantly improving lookup performance. Despite their advantages, prevailing learned inde…

    Submitted 4 September, 2023; v1 submitted 29 August, 2023; originally announced August 2023.

    Comments: Accepted by Conference SIGMOD 24, June 09-15, 2024, Santiago, Chile

  24. arXiv:2308.12560  [pdf, other]

    cs.CV

    NOVA: NOvel View Augmentation for Neural Composition of Dynamic Objects

    Authors: Dakshit Agrawal, Jiajie Xu, Siva Karthik Mustikovela, Ioannis Gkioulekas, Ashish Shrivastava, Yuning Chai

    Abstract: We propose a novel-view augmentation (NOVA) strategy to train NeRFs for photo-realistic 3D composition of dynamic objects in a static scene. Compared to prior work, our framework significantly reduces blending artifacts when inserting multiple dynamic objects into a 3D scene at novel views and times; achieves comparable PSNR without the need for additional ground truth modalities like optical flow…

    Submitted 24 August, 2023; originally announced August 2023.

    Comments: Accepted for publication in ICCV Computer Vision for Metaverse Workshop 2023 (code is available at https://github.com/dakshitagrawal/NoVA)

  25. arXiv:2308.04424  [pdf, other]

    cs.CL

    A Bi-directional Multi-hop Inference Model for Joint Dialog Sentiment Classification and Act Recognition

    Authors: Li Zheng, Fei Li, Yuyang Chai, Chong Teng, Donghong Ji

    Abstract: The joint task of Dialog Sentiment Classification (DSC) and Act Recognition (DAR) aims to predict the sentiment label and act label for each utterance in a dialog simultaneously. However, current methods encode the dialog context in only one direction, which limits their ability to thoroughly comprehend the context. Moreover, these methods overlook the explicit correlations between sentiment and a…

    Submitted 12 August, 2023; v1 submitted 8 August, 2023; originally announced August 2023.

    Comments: Accepted by NLPCC 2023

  26. arXiv:2307.14491  [pdf, other]

    cs.MM cs.SD eess.AS

    A Unified Framework for Modality-Agnostic Deepfakes Detection

    Authors: Cai Yu, Peng Chen, Jiahe Tian, Jin Liu, Jiao Dai, Xi Wang, Yesheng Chai, Shan Jia, Siwei Lyu, Jizhong Han

    Abstract: As AI-generated content (AIGC) thrives, deepfakes have expanded from single-modality falsification to cross-modal fake content creation, where either audio or visual components can be manipulated. While using two unimodal detectors can detect audio-visual deepfakes, cross-modal forgery clues could be overlooked. Existing multimodal deepfake detection methods typically establish correspondence betw…

    Submitted 24 October, 2023; v1 submitted 26 July, 2023; originally announced July 2023.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  27. Natural Language Instructions for Intuitive Human Interaction with Robotic Assistants in Field Construction Work

    Authors: Somin Park, Xi Wang, Carol C. Menassa, Vineet R. Kamat, Joyce Y. Chai

    Abstract: The introduction of robots is widely considered to have significant potential of alleviating the issues of worker shortage and stagnant productivity that afflict the construction industry. However, it is challenging to use fully automated robots in complex and unstructured construction sites. Human-Robot Collaboration (HRC) has shown promise of combining human workers' flexibility and robot assist…

    Submitted 11 July, 2023; v1 submitted 9 July, 2023; originally announced July 2023.

  28. arXiv:2306.08162  [pdf, other]

    cs.CL cs.AI cs.LG

    INT2.1: Towards Fine-Tunable Quantized Large Language Models with Error Correction through Low-Rank Adaptation

    Authors: Yuji Chai, John Gkountouras, Glenn G. Ko, David Brooks, Gu-Yeon Wei

    Abstract: We introduce a method that dramatically reduces fine-tuning VRAM requirements and rectifies quantization errors in quantized Large Language Models. First, we develop an extremely memory-efficient fine-tuning (EMEF) method for quantized models using Low-Rank Adaptation (LoRA), and drawing upon it, we construct an error-correcting algorithm designed to minimize errors induced by the quantization pro…

    Submitted 13 June, 2023; originally announced June 2023.

  29. arXiv:2305.17497  [pdf, other]

    cs.CL

    FACTUAL: A Benchmark for Faithful and Consistent Textual Scene Graph Parsing

    Authors: Zhuang Li, Yuyang Chai, Terry Yue Zhuo, Lizhen Qu, Gholamreza Haffari, Fei Li, Donghong Ji, Quan Hung Tran

    Abstract: Textual scene graph parsing has become increasingly important in various vision-language applications, including image caption evaluation and image retrieval. However, existing scene graph parsers that convert image captions into scene graphs often suffer from two types of errors. First, the generated scene graphs fail to capture the true semantics of the captions or the corresponding images, resu…

    Submitted 1 June, 2023; v1 submitted 27 May, 2023; originally announced May 2023.

    Comments: 9 pages, ACL 2023 (findings)

  30. arXiv:2303.17789  [pdf, other]

    cs.CV

    FONT: Flow-guided One-shot Talking Head Generation with Natural Head Motions

    Authors: Jin Liu, Xi Wang, Xiaomeng Fu, Yesheng Chai, Cai Yu, Jiao Dai, Jizhong Han

    Abstract: One-shot talking head generation has received growing attention in recent years, with various creative and practical applications. An ideal natural and vivid generated talking head video should contain natural head pose changes. However, it is challenging to map head pose sequences from driving audio since there exists a natural gap between audio-visual modalities. In this work, we propose a Flow-…

    Submitted 30 March, 2023; originally announced March 2023.

    Comments: Accepted by ICME2023

  31. arXiv:2303.05078  [pdf, other]

    cs.CV

    Efficient Transformer-based 3D Object Detection with Dynamic Token Halting

    Authors: Mao Ye, Gregory P. Meyer, Yuning Chai, Qiang Liu

    Abstract: Balancing efficiency and accuracy is a long-standing problem for deploying deep learning models. The trade-off is even more important for real-time safety-critical systems like autonomous vehicles. In this paper, we propose an effective approach for accelerating transformer-based 3D object detectors by dynamically halting tokens at different layers depending on their contribution to the detection…

    Submitted 11 October, 2023; v1 submitted 9 March, 2023; originally announced March 2023.

  32. arXiv:2302.11875  [pdf, other]

    cs.CL

    Improved Training of Mixture-of-Experts Language GANs

    Authors: Yekun Chai, Qiyue Yin, Junge Zhang

    Abstract: Despite the dramatic success in image generation, Generative Adversarial Networks (GANs) still face great challenges in synthesizing sequences of discrete elements, in particular human language. The difficulty in generator training arises from the limited representation capacity and uninformative learning signals obtained from the discriminator. In this work, we (1) first empirically show that the…

    Submitted 23 February, 2023; originally announced February 2023.

    Comments: Accepted at ICASSP 2023

  33. arXiv:2302.08197  [pdf, other]

    cs.CV

    OPT: One-shot Pose-Controllable Talking Head Generation

    Authors: Jin Liu, Xi Wang, Xiaomeng Fu, Yesheng Chai, Cai Yu, Jiao Dai, Jizhong Han

    Abstract: One-shot talking head generation produces lip-sync talking heads based on arbitrary audio and one source face. To guarantee the naturalness and realness, recent methods propose to achieve free pose control instead of simply editing mouth areas. However, existing methods do not preserve accurate identity of source face when generating head motions. To solve the identity mismatch problem and achieve…

    Submitted 16 February, 2023; originally announced February 2023.

    Comments: Accepted by ICASSP2023

  34. arXiv:2302.04456  [pdf, other]

    cs.SD cs.AI cs.CL cs.MM eess.AS

    ERNIE-Music: Text-to-Waveform Music Generation with Diffusion Models

    Authors: Pengfei Zhu, Chao Pang, Yekun Chai, Lei Li, Shuohuan Wang, Yu Sun, Hao Tian, Hua Wu

    Abstract: In recent years, the burgeoning interest in diffusion models has led to significant advances in image and speech generation. Nevertheless, the direct synthesis of music waveforms from unrestricted textual prompts remains a relatively underexplored domain. In response to this lacuna, this paper introduces a pioneering contribution in the form of a text-to-waveform music generation model, underpinne…

    Submitted 21 September, 2023; v1 submitted 9 February, 2023; originally announced February 2023.

    Comments: Accepted by AACL demo 2023

  35. arXiv:2301.10999  [pdf, other]

    cs.LG cs.PF

    PerfSAGE: Generalized Inference Performance Predictor for Arbitrary Deep Learning Models on Edge Devices

    Authors: Yuji Chai, Devashree Tripathy, Chuteng Zhou, Dibakar Gope, Igor Fedorov, Ramon Matas, David Brooks, Gu-Yeon Wei, Paul Whatmough

    Abstract: The ability to accurately predict deep neural network (DNN) inference performance metrics, such as latency, power, and memory footprint, for an arbitrary DNN on a target hardware platform is essential to the design of DNN based models. This ability is critical for the (manual or automatic) design, optimization, and deployment of practical DNNs for a specific hardware deployment platform. Unfortuna…

    Submitted 26 January, 2023; originally announced January 2023.

  36. arXiv:2212.06742  [pdf, other]

    cs.CL cs.LG cs.PL cs.SE

    ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages

    Authors: Yekun Chai, Shuohuan Wang, Chao Pang, Yu Sun, Hao Tian, Hua Wu

    Abstract: Software engineers working with the same programming language (PL) may speak different natural languages (NLs) and vice versa, erecting huge barriers to communication and working efficiency. Recent studies have demonstrated the effectiveness of generative pre-training in computer programs, yet they are always English-centric. In this work, we step towards bridging the gap between multilingual NLs…

    Submitted 19 May, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

    Comments: Accepted at ACL 2023 (Findings)

  37. arXiv:2211.16883  [pdf, other]

    cs.CL

    X-PuDu at SemEval-2022 Task 6: Multilingual Learning for English and Arabic Sarcasm Detection

    Authors: Yaqian Han, Yekun Chai, Shuohuan Wang, Yu Sun, Hongyi Huang, Guanghao Chen, Yitong Xu, Yang Yang

    Abstract: Detecting sarcasm and verbal irony from people's subjective statements is crucial to understanding their intended meanings and real sentiments and positions in social scenarios. This paper describes the X-PuDu system that participated in SemEval-2022 Task 6, iSarcasmEval - Intended Sarcasm Detection in English and Arabic, which aims at detecting intended sarcasm in various settings of natural lang…

    Submitted 30 November, 2022; originally announced November 2022.

    Comments: SemEval-2022 Task 6

  38. arXiv:2210.12050  [pdf, other]

    cs.CL

    Clip-Tuning: Towards Derivative-free Prompt Learning with a Mixture of Rewards

    Authors: Yekun Chai, Shuohuan Wang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang

    Abstract: Derivative-free prompt learning has emerged as a lightweight alternative to prompt tuning, which only requires model inference to optimize the prompts. However, existing work did not take full advantage of the over-parameterized characteristics of large pre-trained language models (PLMs). In this paper, we propose Clip-Tuning, a simple yet effective method that adopts diverse frozen "thinned" netw…

    Submitted 21 October, 2022; originally announced October 2022.

    Comments: EMNLP 2022 (Findings)

  39. arXiv:2209.12127  [pdf, other]

    cs.LG

    SpeedLimit: Neural Architecture Search for Quantized Transformer Models

    Authors: Yuji Chai, Luke Bailey, Yunho Jin, Matthew Karle, Glenn G. Ko, David Brooks, Gu-Yeon Wei, H. T. Kung

    Abstract: While research in the field of transformer models has primarily focused on enhancing performance metrics such as accuracy and perplexity, practical applications in industry often necessitate a rigorous consideration of inference latency constraints. Addressing this challenge, we introduce SpeedLimit, a novel Neural Architecture Search (NAS) technique that optimizes accuracy whilst adhering to an u…

    Submitted 13 October, 2023; v1 submitted 24 September, 2022; originally announced September 2022.

  40. Heterogeneous Domain Adaptation with Adversarial Neural Representation Learning: Experiments on E-Commerce and Cybersecurity

    Authors: Mohammadreza Ebrahimi, Yidong Chai, Hao Helen Zhang, Hsinchun Chen

    Abstract: Learning predictive models in new domains with scarce training data is a growing challenge in modern supervised learning scenarios. This incentivizes developing domain adaptation methods that leverage the knowledge in known domains (source) and adapt to new domains (target) with a different probability distribution. This becomes more challenging when the source and target domains are in heterogene…

    Submitted 5 May, 2022; originally announced May 2022.

    Comments: Forthcoming in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 2, pp. 1862-1875, 2023

  41. arXiv:2203.08430  [pdf, other]

    cs.CL

    Cross-Lingual Ability of Multilingual Masked Language Models: A Study of Language Structure

    Authors: Yuan Chai, Yaobo Liang, Nan Duan

    Abstract: Multilingual pre-trained language models, such as mBERT and XLM-R, have shown impressive cross-lingual ability. Surprisingly, both of them use multilingual masked language model (MLM) without any cross-lingual supervision or aligned data. Despite the encouraging results, we still lack a clear understanding of why cross-lingual ability could emerge from multilingual MLM. In our work, we argue that…

    Submitted 16 March, 2022; originally announced March 2022.

    Comments: ACL 2022

  42. Occupancy Flow Fields for Motion Forecasting in Autonomous Driving

    Authors: Reza Mahjourian, Jinkyu Kim, Yuning Chai, Mingxing Tan, Ben Sapp, Dragomir Anguelov

    Abstract: We propose Occupancy Flow Fields, a new representation for motion forecasting of multiple agents, an important task in autonomous driving. Our representation is a spatio-temporal grid with each grid cell containing both the probability of the cell being occupied by any agent, and a two-dimensional flow vector representing the direction and magnitude of the motion in that cell. Our method successfu…

    Submitted 8 March, 2022; originally announced March 2022.

    Journal ref: IEEE Robotics and Automation Letters

  43. Cross-Domain Deep Code Search with Meta Learning

    Authors: Yitian Chai, Hongyu Zhang, Beijun Shen, Xiaodong Gu

    Abstract: Recently, pre-trained programming language models such as CodeBERT have demonstrated substantial gains in code search. Despite showing great performance, they rely on the availability of large amounts of parallel data to fine-tune the semantic mappings between queries and code. This restricts their practicality in domain-specific languages with relatively scarce and expensive data. In this paper,…

    Submitted 12 March, 2024; v1 submitted 1 January, 2022; originally announced January 2022.

    Comments: Accepted by ICSE 2022 (The 44th International Conference on Software Engineering)

  44. arXiv:2110.09004  [pdf, other]

    cs.CV

    NYU-VPR: Long-Term Visual Place Recognition Benchmark with View Direction and Data Anonymization Influences

    Authors: Diwei Sheng, Yuxiang Chai, Xinru Li, Chen Feng, Jianzhe Lin, Claudio Silva, John-Ross Rizzo

    Abstract: Visual place recognition (VPR) is critical in not only localization and mapping for autonomous driving vehicles, but also in assistive navigation for the visually impaired population. To enable a long-term VPR system on a large scale, several challenges need to be addressed. First, different applications could require different image view directions, such as front views for self-driving cars while…

    Submitted 25 July, 2022; v1 submitted 17 October, 2021; originally announced October 2021.

    Comments: 8 pages, 10 figures, published in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2021)

  45. arXiv:2109.13081  [pdf, other]

    cs.RO cs.AI

    Semi-Autonomous Teleoperation via Learning Non-Prehensile Manipulation Skills

    Authors: Sangbeom Park, Yoonbyung Chai, Sunghyun Park, Jeongeun Park, Kyungjae Lee, Sungjoon Choi

    Abstract: In this paper, we present a semi-autonomous teleoperation framework for a pick-and-place task using an RGB-D sensor. In particular, we assume that the target object is located in a cluttered environment where both prehensile grasping and non-prehensile manipulation are combined for efficient teleoperation. A trajectory-based reinforcement learning is utilized for learning the non-prehensile manipu…

    Submitted 25 May, 2022; v1 submitted 27 September, 2021; originally announced September 2021.

  46. arXiv:2109.03529  [pdf, other]

    cs.CL

    RefineCap: Concept-Aware Refinement for Image Captioning

    Authors: Yekun Chai, Shuo Jin, Junliang Xing

    Abstract: Automatically translating images to texts involves image scene understanding and language modeling. In this paper, we propose a novel model, termed RefineCap, that refines the output vocabulary of the language decoder using decoder-guided visual semantics, and implicitly learns the mapping between visual tag words and images. The proposed Visual-Concept Refinement method can allow the generator to…

    Submitted 8 September, 2021; originally announced September 2021.

    Comments: Accepted at ViGIL @NAACL 2021

  47. Measuring daily-life fear perception change: a computational study in the context of COVID-19

    Authors: Yuchen Chai, Juan Palacios, Jianghao Wang, Yichun Fan, Siqi Zheng

    Abstract: COVID-19, as a global health crisis, has triggered the fear emotion with unprecedented intensity. Besides the fear of getting infected, the outbreak of COVID-19 also created significant disruptions in people's daily life and thus evoked intensive psychological responses indirect to COVID-19 infections. Here, we construct an expressed fear database using 16 million social media posts generated by 5…

    Submitted 27 July, 2021; originally announced July 2021.

    Comments: 15 pages

  48. arXiv:2107.05176  [pdf, other]

    cs.CV cs.CL

    Zero-Shot Compositional Concept Learning

    Authors: Guangyue Xu, Parisa Kordjamshidi, Joyce Y. Chai

    Abstract: In this paper, we study the problem of recognizing compositional attribute-object concepts within the zero-shot learning (ZSL) framework. We propose an episode-based cross-attention (EpiCA) network which combines merits of cross-attention mechanism and episode-based training strategy to recognize novel compositional concepts. Firstly, EpiCA bases on cross-attention to correlate concept-visual info…

    Submitted 11 July, 2021; originally announced July 2021.

  49. arXiv:2106.14880  [pdf, other]

    cs.CV

    HDMapGen: A Hierarchical Graph Generative Model of High Definition Maps

    Authors: Lu Mi, Hang Zhao, Charlie Nash, Xiaohan Jin, Jiyang Gao, Chen Sun, Cordelia Schmid, Nir Shavit, Yuning Chai, Dragomir Anguelov

    Abstract: High Definition (HD) maps are maps with precise definitions of road lanes with rich semantics of the traffic rules. They are critical for several key stages in an autonomous driving system, including motion forecasting and planning. However, there are only a small amount of real-world road topologies and geometries, which significantly limits our ability to test out the self-driving stack to gener…

    Submitted 28 June, 2021; originally announced June 2021.

  50. arXiv:2106.13381  [pdf, other]

    cs.CV

    To the Point: Efficient 3D Object Detection in the Range Image with Graph Convolution Kernels

    Authors: Yuning Chai, Pei Sun, Jiquan Ngiam, Weiyue Wang, Benjamin Caine, Vijay Vasudevan, Xiao Zhang, Dragomir Anguelov

    Abstract: 3D object detection is vital for many robotics applications. For tasks where a 2D perspective range image exists, we propose to learn a 3D representation directly from this range image view. To this end, we designed a 2D convolutional network architecture that carries the 3D spherical coordinates of each pixel throughout the network. Its layers can consume any arbitrary convolution kernel in place…

    Submitted 24 June, 2021; originally announced June 2021.

    Journal ref: CVPR 2021