
Showing 1–50 of 94 results for author: Qian, K

Searching in archive cs.
  1. arXiv:2406.19650 [pdf, other]

    cs.CL

    DECOR: Improving Coherence in L2 English Writing with a Novel Benchmark for Incoherence Detection, Reasoning, and Rewriting

    Authors: Xuanming Zhang, Anthony Diaz, Zixun Chen, Qingyang Wu, Kun Qian, Erik Voss, Zhou Yu

    Abstract: Coherence in writing, an aspect that second-language (L2) English learners often struggle with, is crucial in assessing L2 English writing. Existing automated writing evaluation systems primarily use basic surface linguistic features to detect coherence in writing. However, little effort has been made to correct the detected incoherence, which could significantly benefit L2 language learners seeki…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: 21 pages, 5 figures, 20 tables

  2. arXiv:2406.17681 [pdf, other]

    cs.CL

    VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation

    Authors: Kun Qian, Shunji Wan, Claudia Tang, Youzhi Wang, Xuanming Zhang, Maximillian Chen, Zhou Yu

    Abstract: As large language models achieve impressive scores on traditional benchmarks, an increasing number of researchers are becoming concerned about benchmark data leakage during pre-training, commonly known as the data contamination problem. To ensure fair evaluation, recent benchmarks release only the training and validation sets, keeping the test set labels closed-source. They require anyone wishing…

    Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  3. arXiv:2406.15119 [pdf, other]

    cs.SD cs.AI eess.AS

    Speech Emotion Recognition under Resource Constraints with Data Distillation

    Authors: Yi Chang, Zhao Ren, Zhonghao Zhao, Thanh Tam Nguyen, Kun Qian, Tanja Schultz, Björn W. Schuller

    Abstract: Speech emotion recognition (SER) plays a crucial role in human-computer interaction. The emergence of edge devices in the Internet of Things (IoT) presents challenges in constructing intricate deep learning models due to constraints in memory and computational resources. Moreover, emotional speech data often contains private information, raising concerns about privacy leakage during the deployment…

    Submitted 21 June, 2024; originally announced June 2024.

  4. arXiv:2406.13617 [pdf, ps, other]

    cs.CL cs.AI

    Optimizing Psychological Counseling with Instruction-Tuned Large Language Models

    Authors: Wenjie Li, Tianyu Sun, Kun Qian, Wenhong Wang

    Abstract: The advent of large language models (LLMs) has significantly advanced various fields, including natural language processing and automated dialogue systems. This paper explores the application of LLMs in psychological counseling, addressing the increasing demand for mental health services. We present a method for instruction tuning LLMs with specialized prompts to enhance their performance in provi…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 9 pages

  5. arXiv:2406.08380 [pdf, other]

    cs.CL cs.SD eess.AS

    Towards Unsupervised Speech Recognition Without Pronunciation Models

    Authors: Junrui Ni, Liming Wang, Yang Zhang, Kaizhi Qian, Heting Gao, Mark Hasegawa-Johnson, Chang D. Yoo

    Abstract: Recent advancements in supervised automatic speech recognition (ASR) have achieved remarkable performance, largely due to the growing availability of large transcribed speech corpora. However, most languages lack sufficient paired speech and text data to effectively train these systems. In this article, we tackle the challenge of developing ASR systems without paired speech and text corpora by pro…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: This work has been submitted to the IEEE for possible publication

  6. arXiv:2406.07042 [pdf, other]

    cs.CV

    EFFOcc: A Minimal Baseline for EFficient Fusion-based 3D Occupancy Network

    Authors: Yining Shi, Kun Jiang, Ke Wang, Kangan Qian, Yunlong Wang, Jiusi Li, Tuopu Wen, Mengmeng Yang, Yiliang Xu, Diange Yang

    Abstract: 3D occupancy prediction (Occ) is a rapidly rising challenging perception task in the field of autonomous driving which represents the driving scene as uniformly partitioned 3D voxel grids with semantics. Compared to 3D object detection, grid perception has great advantage of better recognizing irregularly shaped, unknown category, or partially occluded general objects. However, existing 3D occupan…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: preprint under review

  7. arXiv:2406.04496 [pdf, other]

    cs.CL cs.AI cs.LG

    Time Sensitive Knowledge Editing through Efficient Finetuning

    Authors: Xiou Ge, Ali Mousavi, Edouard Grave, Armand Joulin, Kun Qian, Benjamin Han, Mostafa Arefiyan, Yunyao Li

    Abstract: Large Language Models (LLMs) have demonstrated impressive capability in different tasks and are bringing transformative changes to many domains. However, keeping the knowledge in LLMs up-to-date remains a challenge once pretraining is complete. It is thus essential to design effective methods to both update obsolete knowledge and induce new knowledge into LLMs. Existing locate-and-edit knowledge e…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 main conference

  8. arXiv:2405.20336 [pdf, other]

    cs.CV cs.SD eess.AS

    RapVerse: Coherent Vocals and Whole-Body Motions Generations from Text

    Authors: Jiaben Chen, Xin Yan, Yihang Chen, Siyuan Cen, Qinwei Ma, Haoyu Zhen, Kaizhi Qian, Lie Lu, Chuang Gan

    Abstract: In this work, we introduce a challenging task for simultaneously generating 3D holistic body motions and singing vocals directly from textual lyrics inputs, advancing beyond existing works that typically address these two modalities in isolation. To facilitate this, we first collect the RapVerse dataset, a large dataset containing synchronous rapping vocals, lyrics, and high-quality 3D holistic bo…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Project website: https://vis-www.cs.umass.edu/RapVerse

  9. arXiv:2405.15646 [pdf, other]

    cs.RO

    LLM-based Robot Task Planning with Exceptional Handling for General Purpose Service Robots

    Authors: Ruoyu Wang, Zhipeng Yang, Zinan Zhao, Xinyan Tong, Zhi Hong, Kun Qian

    Abstract: The development of a general purpose service robot for daily life necessitates the robot's ability to deploy a myriad of fundamental behaviors judiciously. Recent advancements in training Large Language Models (LLMs) can be used to generate action sequences directly, given an instruction in natural language with no additional domain information. However, while the outputs of LLMs are semantically…

    Submitted 24 May, 2024; originally announced May 2024.

  10. arXiv:2405.11383 [pdf]

    cs.LG

    Investigating KAN-Based Physics-Informed Neural Networks for EMI/EMC Simulations

    Authors: Kun Qian, Mohamed Kheir

    Abstract: The main objective of this paper is to investigate the feasibility of employing Physics-Informed Neural Networks (PINNs) techniques, in particular Kolmogorov-Arnold Networks (KANs), for facilitating Electromagnetic Interference (EMI) simulations. It introduces some common EM problem formulations and how they can be solved using AI-driven solutions instead of lengthy and complex full-wave numerical…

    Submitted 21 May, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

    Comments: 8 pages

  11. arXiv:2404.19217 [pdf, other]

    cs.RO

    FOTS: A Fast Optical Tactile Simulator for Sim2Real Learning of Tactile-motor Robot Manipulation Skills

    Authors: Yongqiang Zhao, Kun Qian, Boyi Duan, Shan Luo

    Abstract: Simulation is a widely used tool in robotics to reduce hardware consumption and gather large-scale data. Despite previous efforts to simulate optical tactile sensors, there remain challenges in efficiently synthesizing images and replicating marker motion under different contact loads. In this work, we propose a fast optical tactile simulator, named FOTS, for simulating optical tactile sensors. We…

    Submitted 30 April, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  12. arXiv:2404.05814 [pdf, other]

    cs.CV q-bio.NC

    Towards Explainable Automated Neuroanatomy

    Authors: Kui Qian, Litao Qiao, Beth Friedman, Edward O'Donnell, David Kleinfeld, Yoav Freund

    Abstract: We present a novel method for quantifying the microscopic structure of brain tissue. It is based on the automated recognition of interpretable features obtained by analyzing the shapes of cells. This contrasts with prevailing methods of brain anatomical analysis in two ways. First, contemporary methods use gray-scale values derived from smoothed version of the anatomical images, which dissipated v…

    Submitted 8 April, 2024; originally announced April 2024.

  13. arXiv:2403.01954 [pdf, other]

    cs.CL cs.AI cs.LO

    DECIDER: A Rule-Controllable Decoding Strategy for Language Generation by Imitating Dual-System Cognitive Theory

    Authors: Chen Xu, Tian Lan, Changlong Yu, Wei Wang, Jun Gao, Yu Ji, Qunxi Dong, Kun Qian, Piji Li, Wei Bi, Bin Hu

    Abstract: Lexicon-based constrained decoding approaches aim to control the meaning or style of the generated text through certain target concepts. Existing approaches over-focus the targets themselves, leading to a lack of high-level reasoning about how to achieve them. However, human usually tackles tasks by following certain rules that not only focuses on the targets but also on semantically relevant conc…

    Submitted 9 May, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: Submitted to IEEE TKDE (Major Revision), 12 pages, 6 figures

  14. arXiv:2402.03041 [pdf, other]

    cs.NI

    Demystifying Datapath Accelerator Enhanced Off-path SmartNIC

    Authors: Xuzheng Chen, Jie Zhang, Ting Fu, Yifan Shen, Shu Ma, Kun Qian, Lingjun Zhu, Chao Shi, Yin Zhang, Ming Liu, Zeke Wang

    Abstract: Network speeds grow quickly in the modern cloud, so SmartNICs are introduced to offload network processing tasks, even application logic. However, typical multicore SmartNICs such as BlueFiled-2 are only capable of processing control-plane tasks with their embedded CPU that has limited memory bandwidth and computing power. On the other hand, hot cloud applications evolve, such that a limited numbe…

    Submitted 23 February, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    MSC Class: 68M10 ACM Class: C.2.1

  15. arXiv:2402.01227 [pdf, other]

    cs.SD cs.AI cs.HC eess.AS

    STAA-Net: A Sparse and Transferable Adversarial Attack for Speech Emotion Recognition

    Authors: Yi Chang, Zhao Ren, Zixing Zhang, Xin Jing, Kun Qian, Xi Shao, Bin Hu, Tanja Schultz, Björn W. Schuller

    Abstract: Speech contains rich information on the emotions of humans, and Speech Emotion Recognition (SER) has been an important topic in the area of human-computer interaction. The robustness of SER models is crucial, particularly in privacy-sensitive and reliability-demanding domains like private healthcare. Recently, the vulnerability of deep neural networks in the audio domain to adversarial attacks has…

    Submitted 2 February, 2024; originally announced February 2024.

  16. arXiv:2401.00134 [pdf, other]

    cs.DC cs.LG

    Unicron: Economizing Self-Healing LLM Training at Scale

    Authors: Tao He, Xue Li, Zhibin Wang, Kun Qian, Jingbo Xu, Wenyuan Yu, Jingren Zhou

    Abstract: Training large-scale language models is increasingly critical in various domains, but it is hindered by frequent failures, leading to significant time and economic costs. Current failure recovery methods in cloud-based settings inadequately address the diverse and complex scenarios that arise, focusing narrowly on erasing downtime for individual tasks without considering the overall cost impact on…

    Submitted 29 December, 2023; originally announced January 2024.

  17. arXiv:2312.09424 [pdf, other]

    cs.CL cs.AI

    Open Domain Knowledge Extraction for Knowledge Graphs

    Authors: Kun Qian, Anton Belyi, Fei Wu, Samira Khorshidi, Azadeh Nikfarjam, Rahul Khot, Yisi Sang, Katherine Luna, Xianqi Chu, Eric Choi, Yash Govind, Chloe Seivwright, Yiwen Sun, Ahmed Fakhry, Theo Rekatsinas, Ihab Ilyas, Xiaoguang Qi, Yunyao Li

    Abstract: The quality of a knowledge graph directly impacts the quality of downstream applications (e.g. the number of answerable questions using the graph). One ongoing challenge when building a knowledge graph is to ensure completeness and freshness of the graph's entities and facts. In this paper, we introduce ODKE, a scalable and extensible framework that sources high-quality entities and facts from ope…

    Submitted 30 October, 2023; originally announced December 2023.

    Comments: 7 pages, 7 figures, 5 tables, preprint technical report, no code or data is released

    MSC Class: 68T30 (primary) ACM Class: F.4.1; I.2.4

  18. arXiv:2311.16892 [pdf, other]

    cs.IR

    Enhancing Item-level Bundle Representation for Bundle Recommendation

    Authors: Xiaoyu Du, Kun Qian, Yunshan Ma, Xinguang Xiang

    Abstract: Bundle recommendation approaches offer users a set of related items on a particular topic. The current state-of-the-art (SOTA) method utilizes contrastive learning to learn representations at both the bundle and item levels. However, due to the inherent difference between the bundle-level and item-level preferences, the item-level representations may not receive sufficient information from the bun…

    Submitted 28 November, 2023; originally announced November 2023.

  19. arXiv:2311.09431 [pdf, other]

    cs.LG cs.CL

    Striped Attention: Faster Ring Attention for Causal Transformers

    Authors: William Brandon, Aniruddha Nrusimha, Kevin Qian, Zachary Ankner, Tian Jin, Zhiye Song, Jonathan Ragan-Kelley

    Abstract: To help address the growing demand for ever-longer sequence lengths in transformer models, Liu et al. recently proposed Ring Attention, an exact attention algorithm capable of overcoming per-device memory bottlenecks by distributing self-attention across multiple devices. In this paper, we study the performance characteristics of Ring Attention in the important special case of causal transformer…

    Submitted 15 November, 2023; originally announced November 2023.

  20. arXiv:2311.08718 [pdf, other]

    cs.CL

    Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling

    Authors: Bairu Hou, Yujian Liu, Kaizhi Qian, Jacob Andreas, Shiyu Chang, Yang Zhang

    Abstract: Uncertainty decomposition refers to the task of decomposing the total uncertainty of a predictive model into aleatoric (data) uncertainty, resulting from inherent randomness in the data-generating process, and epistemic (model) uncertainty, resulting from missing information in the model's training data. In large language models (LLMs) specifically, identifying sources of uncertainty is an importa…

    Submitted 10 June, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

    Comments: ICML 2024, 19 pages, 4 figures

  21. arXiv:2310.17119 [pdf, other]

    cs.CL

    FLEEK: Factual Error Detection and Correction with Evidence Retrieved from External Knowledge

    Authors: Farima Fatahi Bayat, Kun Qian, Benjamin Han, Yisi Sang, Anton Belyi, Samira Khorshidi, Fei Wu, Ihab F. Ilyas, Yunyao Li

    Abstract: Detecting factual errors in textual information, whether generated by large language models (LLM) or curated by humans, is crucial for making informed decisions. LLMs' inability to attribute their claims to external knowledge and their tendency to hallucinate makes it difficult to rely on their responses. Humans, too, are prone to factual errors in their writing. Since manual detection and correct…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023 (Demonstration Track)

  22. arXiv:2310.16772 [pdf, other]

    cs.AI cs.MA

    AI Agent as Urban Planner: Steering Stakeholder Dynamics in Urban Planning via Consensus-based Multi-Agent Reinforcement Learning

    Authors: Kejiang Qian, Lingjun Mao, Xin Liang, Yimin Ding, ** Gao, Xinran Wei, Ziyi Guo, Jiajie Li

    Abstract: In urban planning, land use readjustment plays a pivotal role in aligning land use configurations with the current demands for sustainable urban development. However, present-day urban planning practices face two main issues. Firstly, land use decisions are predominantly dependent on human experts. Besides, while resident engagement in urban planning can promote urban sustainability and livability…

    Submitted 9 November, 2023; v1 submitted 25 October, 2023; originally announced October 2023.

  23. arXiv:2310.11915 [pdf, other]

    cs.HC

    SHA-SCP: A UI Element Spatial Hierarchy Aware Smartphone User Click Behavior Prediction Method

    Authors: Ling Chen, Yiyi Peng, Kai Qian, Hongyu Shi, Xiaofan Zhang

    Abstract: Predicting user click behavior and making relevant recommendations based on the user's historical click behavior are critical to simplifying operations and improving user experience. Modeling UI elements is essential to user click behavior prediction, while the complexity and variety of the UI make it difficult to adequately capture the information of different scales. In addition, the lack of rel…

    Submitted 18 October, 2023; originally announced October 2023.

  24. arXiv:2310.11870 [pdf, other]

    cs.CL cs.AI

    AI Nushu: An Exploration of Language Emergence in Sisterhood -Through the Lens of Computational Linguistics

    Authors: Yuqian Sun, Yuying Tang, Ze Gao, Zhijun Pan, Chuyan Xu, Yurou Chen, Kejiang Qian, Zhigang Wang, Tristan Braud, Chang Hee Lee, Ali Asadipour

    Abstract: This paper presents "AI Nushu," an emerging language system inspired by Nushu (women's scripts), the unique language created and used exclusively by ancient Chinese women who were thought to be illiterate under a patriarchal society. In this interactive installation, two artificial intelligence (AI) agents are trained in the Chinese dictionary and the Nushu corpus. By continually observing their e…

    Submitted 18 October, 2023; originally announced October 2023.

    Comments: Accepted for publication at SIGGRAPH Asia 2023

    MSC Class: 14J60 (Primary) 14F05; 14J26 (Secondary) ACM Class: F.2.2; I.2.7

  25. arXiv:2310.04865 [pdf, other]

    cs.IR cs.AI cs.CL cs.LG

    ForeSeer: Product Aspect Forecasting Using Temporal Graph Embedding

    Authors: Zixuan Liu, Gaurush Hiranandani, Kun Qian, Eddie W. Huang, Yi Xu, Belinda Zeng, Karthik Subbian, Sheng Wang

    Abstract: Developing text mining approaches to mine aspects from customer reviews has been well-studied due to its importance in understanding customer needs and product attributes. In contrast, it remains unclear how to predict the future emerging aspects of a new product that currently has little review information. This task, which we named product aspect forecasting, is critical for recommending new pro…

    Submitted 7 October, 2023; originally announced October 2023.

  26. arXiv:2308.08169 [pdf, other]

    cs.CL cs.AI

    Enhancing Performance on Seen and Unseen Dialogue Scenarios using Retrieval-Augmented End-to-End Task-Oriented System

    Authors: Jianguo Zhang, Stephen Roller, Kun Qian, Zhiwei Liu, Rui Meng, Shelby Heinecke, Huan Wang, Silvio Savarese, Caiming Xiong

    Abstract: End-to-end task-oriented dialogue (TOD) systems have achieved promising performance by leveraging sophisticated natural language understanding and natural language generation capabilities of pre-trained models. This work enables the TOD systems with more flexibility through a simple cache. The cache provides the flexibility to dynamically update the TOD systems and handle both existing and unseen…

    Submitted 16 August, 2023; originally announced August 2023.

    Comments: Accepted by SIGDIAL 2023 as a long paper

  27. arXiv:2307.10172 [pdf, other]

    cs.CL cs.AI

    DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI

    Authors: Jianguo Zhang, Kun Qian, Zhiwei Liu, Shelby Heinecke, Rui Meng, Ye Liu, Zhou Yu, Huan Wang, Silvio Savarese, Caiming Xiong

    Abstract: Despite advancements in conversational AI, language models encounter challenges to handle diverse conversational tasks, and existing dialogue dataset collections often lack diversity and comprehensiveness. To tackle these issues, we introduce DialogStudio: the largest and most diverse collection of dialogue datasets, unified under a consistent format while preserving their original information. Ou…

    Submitted 5 February, 2024; v1 submitted 19 July, 2023; originally announced July 2023.

    Comments: 17 pages, accepted by EACL 2024 Findings as a long paper. All datasets, licenses, codes, and models are available at https://github.com/salesforce/DialogStudio

  28. arXiv:2306.15686 [pdf, other]

    eess.AS cs.CL

    Master-ASR: Achieving Multilingual Scalability and Low-Resource Adaptation in ASR with Modular Learning

    Authors: Zhongzhi Yu, Yang Zhang, Kaizhi Qian, Yonggan Fu, Yingyan Lin

    Abstract: Despite the impressive performance recently achieved by automatic speech recognition (ASR), we observe two primary challenges that hinder its broader applications: (1) The difficulty of introducing scalability into the model to support more languages with limited training, inference, and storage overhead; (2) The low-resource adaptation ability that enables effective low-resource adaptation while…

    Submitted 23 June, 2023; originally announced June 2023.

  29. Blockchain-enabled Parametric Solar Energy Insurance via Remote Sensing

    Authors: Mingyu Hao, Keyang Qian, Sid Chi-Kin Chau

    Abstract: Despite its popularity, the nature of solar energy is highly uncertain and weather dependent, affecting the business viability and investment of solar energy generation, especially for household users. To stabilize the income from solar energy generation, there have been limited traditional options, such as using energy storage to pool excessive solar energy in off-peak periods or financial deriva…

    Submitted 17 May, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

    Comments: To appear in ACM e-Energy 2023

  30. arXiv:2305.08384 [pdf, other]

    cs.CR cs.NI

    Privacy-preserving Blockchain-enabled Parametric Insurance via Remote Sensing and IoT

    Authors: Mingyu Hao, Keyang Qian, Sid Chi-Kin Chau

    Abstract: Traditional Insurance, a popular approach of financial risk management, has suffered from the issues of high operational costs, opaqueness, inefficiency and a lack of trust. Recently, blockchain-enabled "parametric insurance" through authorized data sources (e.g., remote sensing and IoT) aims to overcome these issues by automating the underwriting and claim processes of insurance policies on a blo…

    Submitted 15 May, 2023; originally announced May 2023.

  31. Decentralized Governance for Virtual Community(DeGov4VC): Optimal Policy Design of Human-plant Symbiosis Co-creation

    Authors: Yan Xiang, Qianhui Fan, Kejiang Qian, Jiajie Li, Yuying Tang, Ze Gao

    Abstract: Does the decentralized nature of user behavior in interactive virtual communities help create rules promoting user engagement? Through scenarios like planting, this framework suggests a new paradigm for mutual influence that allows users to impact communities' political decisions. Sixteen participants in the first round of interviews were involved in the framework's creation. Then we developed and…

    Submitted 11 May, 2023; originally announced May 2023.

    Comments: Accepted In Designing Interactive Systems Conference (DIS Companion 23), July 10-14, 2023, Pittsburgh, PA, USA. ACM, New York, NY, USA, 7 pages

  32. arXiv:2305.00540 [pdf, other]

    math.NA cs.LG

    SRL-Assisted AFM: Generating Planar Unstructured Quadrilateral Meshes with Supervised and Reinforcement Learning-Assisted Advancing Front Method

    Authors: Hua Tong, Kuanren Qian, Eni Halilaj, Yongjie Jessica Zhang

    Abstract: High-quality mesh generation is the foundation of accurate finite element analysis. Due to the vast interior vertices search space and complex initial boundaries, mesh generation for complicated domains requires substantial manual processing and has long been considered the most challenging and time-consuming bottleneck of the entire modeling and analysis process. In this paper, we present a novel…

    Submitted 30 April, 2023; originally announced May 2023.

    Comments: 18 pages, 11 figures, submitted to Journal of Computational Science

  33. arXiv:2305.00154 [pdf, other]

    eess.SY cs.LG cs.MA

    Learning to Seek: Multi-Agent Online Source Seeking Against Non-Stochastic Disturbances

    Authors: Bin Du, Kun Qian, Christian Claudel, Dengfeng Sun

    Abstract: This paper proposes to leverage the emerging learning techniques and devise a multi-agent online source seeking algorithm under unknown environment. Of particular significance in our problem setups are: i) the underlying environment is not only unknown, but dynamically changing and also perturbed by two types of non-stochastic disturbances; and ii) a group of agents is deployed and expected to c…

    Submitted 28 April, 2023; originally announced May 2023.

  34. arXiv:2304.05489 [pdf, other]

    cs.CL

    User Adaptive Language Learning Chatbots with a Curriculum

    Authors: Kun Qian, Ryan Shea, Yu Li, Luke Kutszik Fryer, Zhou Yu

    Abstract: Along with the development of systems for natural language understanding and generation, dialog systems have been widely adopted for language learning and practicing. Many current educational dialog systems perform chitchat, where the generated content and vocabulary are not constrained. However, for learners in a school setting, practice through dialog is more effective if it aligns with students…

    Submitted 11 April, 2023; originally announced April 2023.

  35. arXiv:2303.16897 [pdf, other]

    cs.CV cs.LG cs.SD eess.AS

    Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos

    Authors: Kun Su, Kaizhi Qian, Eli Shlizerman, Antonio Torralba, Chuang Gan

    Abstract: Modeling sounds emitted from physical object interactions is critical for immersive perceptual experiences in real and virtual worlds. Traditional methods of impact sound synthesis use physics simulation to obtain a set of physics parameters that could represent and synthesize the sound. However, they require fine details of both the object geometries and impact locations, which are rarely availab…

    Submitted 8 July, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

    Comments: CVPR 2023. Project page: https://sukun1045.github.io/video-physics-sound-diffusion/

  36. arXiv:2302.05932 [pdf, other]

    cs.CL

    Stabilized In-Context Learning with Pre-trained Language Models for Few Shot Dialogue State Tracking

    Authors: Derek Chen, Kun Qian, Zhou Yu

    Abstract: Prompt-based methods with large pre-trained language models (PLMs) have shown impressive unaided performance across many NLP tasks. These models improve even further with the addition of a few labeled in-context exemplars to guide output generation. However, for more complex tasks such as dialogue state tracking (DST), designing prompts that reliably convey the desired intent is nontrivial, leadin…

    Submitted 12 February, 2023; originally announced February 2023.

    Comments: 14 pages, 3 figures, 7 tables. Accepted at EACL 2023

  37. arXiv:2302.02050 [pdf, other]

    cs.HC

    Location-based AR for Social Justice: Case Studies, Lessons, and Open Challenges

    Authors: Hope Schroeder, Rob Tokanel, Kyle Qian, Khoi Le

    Abstract: Dear Visitor and Charleston Reconstructed were location-based augmented reality (AR) experiences created between 2018 and 2020 dealing with two controversial monument sites in the US. The projects were motivated by the ability of AR to 1) link layers of context to physical sites in ways that are otherwise difficult or impossible and 2) to visualize changes to physical spaces, potentially inspiring…

    Submitted 3 February, 2023; originally announced February 2023.

  38. arXiv:2301.09362 [pdf, other]

    cs.SD cs.LG eess.AS

    A Comprehensive Survey on Heart Sound Analysis in the Deep Learning Era

    Authors: Zhao Ren, Yi Chang, Thanh Tam Nguyen, Yang Tan, Kun Qian, Björn W. Schuller

    Abstract: Heart sound auscultation has been applied in clinical usage for early screening of cardiovascular diseases. Due to the high demand for auscultation expertise, automatic auscultation can help with auxiliary diagnosis and reduce the burden of training professional clinicians. Nevertheless, there is a limit to classic machine learning's performance improvement in the era of big data. Deep learning ha…

    Submitted 11 May, 2024; v1 submitted 23 January, 2023; originally announced January 2023.

    Comments: Accepted by IEEE Computational Intelligence Magazine

  39. arXiv:2211.16773 [pdf, other]

    cs.CL

    KRLS: Improving End-to-End Response Generation in Task Oriented Dialog with Reinforced Keywords Learning

    Authors: Xiao Yu, Qingyang Wu, Kun Qian, Zhou Yu

    Abstract: In task-oriented dialogs (TOD), reinforcement learning (RL) algorithms train a model to directly optimize response for task-related metrics. However, RL needs to perform exploration, which can be time-consuming due to the slow auto-regressive sequence generation process. We investigate an approach to create a more efficient RL-based algorithm to improve TOD performance in an offline setting. First…

    Submitted 19 October, 2023; v1 submitted 30 November, 2022; originally announced November 2022.

    Comments: Accepted at EMNLP 2023

  40. arXiv:2211.01522 [pdf, other]

    cs.LG cs.SD eess.AS

    Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing

    Authors: Yonggan Fu, Yang Zhang, Kaizhi Qian, Zhifan Ye, Zhongzhi Yu, Cheng-I Lai, Yingyan Lin

    Abstract: Self-supervised learning (SSL) for rich speech representations has achieved empirical success in low-resource Automatic Speech Recognition (ASR) and other speech processing tasks, which can mitigate the necessity of a large amount of transcribed speech and thus has driven a growing demand for on-device ASR and other speech processing. However, advanced speech SSL models have become increasingly la…

    Submitted 2 November, 2022; originally announced November 2022.

    Comments: Accepted at NeurIPS 2022

  41. arXiv:2210.14977 [pdf, other]

    cs.SD cs.AI eess.AS

    Knowledge Transfer For On-Device Speech Emotion Recognition with Neural Structured Learning

    Authors: Yi Chang, Zhao Ren, Thanh Tam Nguyen, Kun Qian, Björn W. Schuller

    Abstract: Speech emotion recognition (SER) has been a popular research topic in human-computer interaction (HCI). As edge devices are rapidly springing up, applying SER to edge devices is promising for a huge number of HCI applications. Although deep learning has been investigated to improve the performance of SER by training complex models, the memory space and computational capability of edge devices repr…

    Submitted 11 May, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

    Comments: Accepted by ICASSP 2023

  42. arXiv:2209.09443  [pdf]

    cond-mat.mes-hall cs.ET physics.app-ph

    Cryogenic in-memory computing using tunable chiral edge states

    Authors: Yuting Liu, Albert Lee, Kun Qian, Peng Zhang, Haoran He, Zheyu Ren, Shun Kong Cheung, Yaoyin Li, Xu Zhang, Zichao Ma, Zhihua Xiao, Guoqiang Yu, Xin Wang, Junwei Liu, Zhongrui Wang, Kang L. Wang, Qiming Shao

    Abstract: Energy-efficient hardware implementation of machine learning algorithms for quantum computation requires nonvolatile and electrically-programmable devices, memristors, working at cryogenic temperatures that enable in-memory computing. Magnetic topological insulators are promising candidates due to their tunable magnetic order by electrical currents with high energy efficiency. Here, we utilize mag…

    Submitted 19 September, 2022; originally announced September 2022.

    Comments: 33 pages, 12 figures

  43. arXiv:2208.00548  [pdf]

    cs.SI physics.soc-ph

    Safer Traffic Recovery from the Pandemic in London -- Spatiotemporal Data Mining of Car Crashes

    Authors: Kejiang Qian, Yijing Li

    Abstract: With the aim of supporting London's safer recovery from the pandemic by improving road safety intelligently, this study investigated the spatiotemporal patterns of age-involved car crashes and affecting factors by answering two main research questions: (1) "What are the spatial and temporal patterns of car crashes as well as their changes in two typical years, 2019 and 2020, in London, and how the in…

    Submitted 31 July, 2022; originally announced August 2022.

    Comments: 38 pages, 8 figures, 6 tables

  44. Ransomware Classification and Detection With Machine Learning Algorithms

    Authors: Mohammad Masum, Md Jobair Hossain Faruk, Hossain Shahriar, Kai Qian, Dan Lo, Muhaiminul Islam Adnan

    Abstract: Malicious attacks, malware, and ransomware families pose critical security issues to cybersecurity, and they may cause catastrophic damage to computer systems, data centers, web, and mobile applications across various industries and businesses. Traditional anti-ransomware systems struggle to fight against newly created sophisticated attacks. Therefore, state-of-the-art techniques like traditional a…

    Submitted 2 July, 2022; originally announced July 2022.

    Journal ref: 2022 IEEE 12th Annual Computing and Communication Workshop and Conference (CCWC)

  45. arXiv:2205.12471  [pdf, other]

    cs.CL

    Learning a Better Initialization for Soft Prompts via Meta-Learning

    Authors: Yukun Huang, Kun Qian, Zhou Yu

    Abstract: Prompt tuning (PT) is an effective approach to adapting pre-trained language models to downstream tasks. Without a good initialization, prompt tuning doesn't perform well under few-shot settings. So pre-trained prompt tuning (PPT) is proposed to initialize prompts by leveraging pre-training data. We propose MetaPT (Meta-learned Prompt Tuning) to further improve PPT's initialization by considering…

    Submitted 24 May, 2022; originally announced May 2022.

  46. arXiv:2204.09224  [pdf, other]

    cs.SD cs.AI eess.AS

    ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers

    Authors: Kaizhi Qian, Yang Zhang, Heting Gao, Junrui Ni, Cheng-I Lai, David Cox, Mark Hasegawa-Johnson, Shiyu Chang

    Abstract: Self-supervised learning in speech involves training a speech representation network on a large-scale unannotated speech corpus, and then applying the learned representations to downstream tasks. Since the majority of the downstream tasks of SSL learning in speech largely focus on the content information in speech, the most desirable speech representations should be able to disentangle unwanted va…

    Submitted 23 June, 2022; v1 submitted 20 April, 2022; originally announced April 2022.

  47. arXiv:2203.15863  [pdf, other]

    eess.AS cs.AI cs.CL

    WAVPROMPT: Towards Few-Shot Spoken Language Understanding with Frozen Language Models

    Authors: Heting Gao, Junrui Ni, Kaizhi Qian, Yang Zhang, Shiyu Chang, Mark Hasegawa-Johnson

    Abstract: Large-scale auto-regressive language models pretrained on massive text have demonstrated their impressive ability to perform new natural language tasks with only a few text examples, without the need for fine-tuning. Recent studies further show that such a few-shot learning ability can be extended to the text-image setting by training an encoder to encode the images into embeddings functioning lik…

    Submitted 13 April, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

    Comments: submitted to INTERSPEECH 2022

  48. arXiv:2203.15796  [pdf, other]

    eess.AS cs.AI

    Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition

    Authors: Junrui Ni, Liming Wang, Heting Gao, Kaizhi Qian, Yang Zhang, Shiyu Chang, Mark Hasegawa-Johnson

    Abstract: An unsupervised text-to-speech synthesis (TTS) system learns to generate speech waveforms corresponding to any written sentence in a language by observing: 1) a collection of untranscribed speech waveforms in that language; 2) a collection of texts written in that language without access to any transcribed speech. Developing such a system can significantly improve the availability of speech techno…

    Submitted 15 August, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

    Comments: INTERSPEECH 2022

  49. arXiv:2203.14156  [pdf, other]

    eess.AS cs.AI cs.SD

    SpeechSplit 2.0: Unsupervised speech disentanglement for voice conversion Without tuning autoencoder Bottlenecks

    Authors: Chak Ho Chan, Kaizhi Qian, Yang Zhang, Mark Hasegawa-Johnson

    Abstract: SpeechSplit can perform aspect-specific voice conversion by disentangling speech into content, rhythm, pitch, and timbre using multiple autoencoders in an unsupervised manner. However, SpeechSplit requires careful tuning of the autoencoder bottlenecks, which can be time-consuming and less robust. This paper proposes SpeechSplit 2.0, which constrains the information flow of the speech component to…

    Submitted 26 March, 2022; originally announced March 2022.

  50. arXiv:2203.03556  [pdf, other]

    cs.LG quant-ph

    Quantum Deep Learning for Mutant COVID-19 Strain Prediction

    Authors: Yu-Xin Jin, Jun-Jie Hu, Qi Li, Zhi-Cheng Luo, Fang-Yan Zhang, Hao Tang, Kun Qian, Xian-Min Jin

    Abstract: New COVID-19 epidemic strains like Delta and Omicron with increased transmissibility and pathogenicity emerge and spread across the whole world rapidly while causing high mortality during the pandemic period. Early prediction of possible variants (especially spike protein) of COVID-19 epidemic strains based on available mutated SARS-CoV-2 RNA sequences may lead to early prevention and treatment. H…

    Submitted 4 March, 2022; originally announced March 2022.

    Comments: 34 pages, 4 figures, 2 supplementary figures