Showing 1–42 of 42 results for author: Yi, C

Searching in archive cs.

  1. arXiv:2406.17349  [pdf, other]

    cs.CR cs.CV

    Semantic Deep Hiding for Robust Unlearnable Examples

    Authors: Ruohan Meng, Chenyu Yi, Yi Yu, Siyuan Yang, Bingquan Shen, Alex C. Kot

    Abstract: Ensuring data privacy and protection has become paramount in the era of deep learning. Unlearnable examples are proposed to mislead the deep learning models and prevent data from unauthorized exploration by adding small perturbations to data. However, such perturbations (e.g., noise, texture, color change) predominantly impact low-level features, making them vulnerable to common countermeasures. I…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted by TIFS 2024

  2. arXiv:2406.02539  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Parrot: Multilingual Visual Instruction Tuning

    Authors: Hai-Long Sun, Da-Wei Zhou, Yang Li, Shiyin Lu, Chao Yi, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, De-Chuan Zhan, Han-Jia Ye

    Abstract: The rapid development of Multimodal Large Language Models (MLLMs) like GPT-4V has marked a significant step towards artificial general intelligence. Existing methods mainly focus on aligning vision encoders with LLMs through supervised fine-tuning (SFT) to endow LLMs with multimodal abilities, making MLLMs' inherent ability to react to multiple languages progressively deteriorate as the training p…

    Submitted 4 June, 2024; originally announced June 2024.

  3. arXiv:2404.17753  [pdf, other]

    cs.CV cs.AI

    Leveraging Cross-Modal Neighbor Representation for Improved CLIP Classification

    Authors: Chao Yi, Lu Ren, De-Chuan Zhan, Han-Jia Ye

    Abstract: CLIP showcases exceptional cross-modal matching capabilities due to its training on image-text contrastive learning tasks. However, without specific optimization for unimodal scenarios, its performance in single-modality feature extraction might be suboptimal. Despite this, some studies have directly used CLIP's image encoder for tasks like few-shot classification, introducing a misalignment betwe…

    Submitted 26 April, 2024; originally announced April 2024.

  4. arXiv:2403.13797  [pdf, other]

    cs.LG cs.CV

    Bridge the Modality and Capacity Gaps in Vision-Language Model Selection

    Authors: Chao Yi, De-Chuan Zhan, Han-Jia Ye

    Abstract: Vision Language Models (VLMs) excel in zero-shot image classification by pairing images with textual category names. The expanding variety of Pre-Trained VLMs enhances the likelihood of identifying a suitable VLM for specific tasks. Thus, a promising zero-shot image classification strategy is selecting the most appropriate Pre-Trained VLM from the VLM Zoo, relying solely on the text data of the ta…

    Submitted 20 March, 2024; originally announced March 2024.

  5. arXiv:2403.13237  [pdf, ps, other]

    cs.CR math.OC

    Graph Attention Network-based Block Propagation with Optimal AoI and Reputation in Web 3.0

    Authors: Jiana Liao, Jinbo Wen, Jiawen Kang, Changyan Yi, Yang Zhang, Yutao Jiao, Dusit Niyato, Dong In Kim, Shengli Xie

    Abstract: Web 3.0 is recognized as a pioneering paradigm that empowers users to securely oversee data without reliance on a centralized authority. Blockchains, as a core technology to realize Web 3.0, can facilitate decentralized and transparent data management. Nevertheless, the evolution of blockchain-enabled Web 3.0 is still in its nascent phase, grappling with challenges such as ensuring efficiency and…

    Submitted 8 May, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

  6. arXiv:2402.18936  [pdf, ps, other]

    cs.NI eess.SP

    Energy-Efficient UAV Swarm Assisted MEC with Dynamic Clustering and Scheduling

    Authors: Jialiuyuan Li, Jiayuan Chen, Changyan Yi, Tong Zhang, Kun Zhu, Jun Cai

    Abstract: In this paper, the energy-efficient unmanned aerial vehicle (UAV) swarm assisted mobile edge computing (MEC) with dynamic clustering and scheduling is studied. In the considered system model, UAVs are divided into multiple swarms, with each swarm consisting of a leader UAV and several follower UAVs to provide computing services to end-users. Unlike existing work, we allow UAVs to dynamically clust…

    Submitted 29 February, 2024; originally announced February 2024.

  7. arXiv:2402.18927  [pdf, other]

    cs.CV cs.MM cs.NI

    Edge Computing Enabled Real-Time Video Analysis via Adaptive Spatial-Temporal Semantic Filtering

    Authors: Xiang Chen, Wenjie Zhu, Jiayuan Chen, Tong Zhang, Changyan Yi, Jun Cai

    Abstract: This paper proposes a novel edge computing enabled real-time video analysis system for intelligent visual devices. The proposed system consists of a tracking-assisted object detection module (TAODM) and a region of interesting module (ROIM). TAODM adaptively determines the offloading decision to process each video frame locally with a tracking algorithm or to offload it to the edge server inferred…

    Submitted 29 February, 2024; originally announced February 2024.

  8. arXiv:2402.11500  [pdf, other]

    cs.GT cs.NI

    A Three-Party Repeated Coalition Formation Game for PLS in Wireless Communications with IRSs

    Authors: Haipeng Zhou, Ruoyang Chen, Changyan Yi, Juan Li, Jun Cai

    Abstract: In this paper, a repeated coalition formation game (RCFG) with dynamic decision-making for physical layer security (PLS) in wireless communications with intelligent reflecting surfaces (IRSs) has been investigated. In the considered system, one central legitimate transmitter (LT) aims to transmit secret signals to a group of legitimate receivers (LRs) under the threat of a proactive eavesdropper (…

    Submitted 18 February, 2024; originally announced February 2024.

    Comments: Accepted to IEEE WCNC 2024

  9. arXiv:2401.16710  [pdf, other]

    cs.NI

    Dynamic Human Digital Twin Deployment at the Edge for Task Execution: A Two-Timescale Accuracy-Aware Online Optimization

    Authors: Yuye Yang, You Shi, Changyan Yi, Jun Cai, Jiawen Kang, Dusit Niyato, Xuemin, Shen

    Abstract: Human digital twin (HDT) is an emerging paradigm that bridges physical twins (PTs) with powerful virtual twins (VTs) for assisting complex task executions in human-centric services. In this paper, we study a two-timescale online optimization for building HDT under an end-edge-cloud collaborative framework. As a unique feature of HDT, we consider that PTs' corresponding VTs are deployed on edge ser…

    Submitted 29 January, 2024; originally announced January 2024.

  10. arXiv:2401.13699  [pdf, other]

    cs.HC cs.AI cs.LG

    Generative AI-Driven Human Digital Twin in IoT-Healthcare: A Comprehensive Survey

    Authors: Jiayuan Chen, You Shi, Changyan Yi, Hongyang Du, Jiawen Kang, Dusit Niyato

    Abstract: The Internet of things (IoT) can significantly enhance the quality of human life, specifically in healthcare, attracting extensive attentions to IoT-healthcare services. Meanwhile, the human digital twin (HDT) is proposed as an innovative paradigm that can comprehensively characterize the replication of the individual human body in the digital world and reflect its physical status in real time. Na…

    Submitted 28 June, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

  11. arXiv:2401.02705  [pdf, other]

    cs.AI

    XUAT-Copilot: Multi-Agent Collaborative System for Automated User Acceptance Testing with Large Language Model

    Authors: Zhitao Wang, Wei Wang, Zirao Li, Long Wang, Can Yi, Xinjie Xu, Luyang Cao, Hanjing Su, Shouzhi Chen, Jun Zhou

    Abstract: In past years, we have been dedicated to automating user acceptance testing (UAT) process of WeChat Pay, one of the most influential mobile payment applications in China. A system titled XUAT has been developed for this purpose. However, there is still a human-labor-intensive stage, i.e., test scripts generation, in the current system. Therefore, in this paper, we concentrate on methods of boosting…

    Submitted 10 January, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

  12. arXiv:2312.12063  [pdf, other]

    cs.NI cs.AI cs.GT

    Resource-efficient Generative Mobile Edge Networks in 6G Era: Fundamentals, Framework and Case Study

    Authors: Bingkun Lai, Jinbo Wen, Jiawen Kang, Hongyang Du, Jiangtian Nie, Changyan Yi, Dong In Kim, Shengli Xie

    Abstract: As the next-generation wireless communication system, Sixth-Generation (6G) technologies are emerging, enabling various mobile edge networks that can revolutionize wireless communication and connectivity. By integrating Generative Artificial Intelligence (GAI) with mobile edge networks, generative mobile edge networks possess immense potential to enhance the intelligence and efficiency of wireless…

    Submitted 19 December, 2023; originally announced December 2023.

  13. arXiv:2312.02896  [pdf, other]

    cs.CV

    BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models

    Authors: Rizhao Cai, Zirui Song, Dayan Guan, Zhenhao Chen, Xing Luo, Chenyu Yi, Alex Kot

    Abstract: Large Multimodal Models (LMMs) such as GPT-4V and LLaVA have shown remarkable capabilities in visual reasoning with common image styles. However, their robustness against diverse style shifts, crucial for practical applications, remains largely unexplored. In this paper, we propose a new benchmark, BenchLMM, to assess the robustness of LMMs against three different styles: artistic image style, ima…

    Submitted 5 December, 2023; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: Code is available at https://github.com/AIFEG/BenchLMM

  14. arXiv:2310.05341  [pdf, other]

    cs.CV cs.AI

    A Critical Look at Classic Test-Time Adaptation Methods in Semantic Segmentation

    Authors: Chang'an Yi, Haotian Chen, Yifan Zhang, Yonghui Xu, Lizhen Cui

    Abstract: Test-time adaptation (TTA) aims to adapt a model, initially trained on training data, to potential distribution shifts in the test data. Most existing TTA studies, however, focus on classification tasks, leaving a notable gap in the exploration of TTA for semantic segmentation. This pronounced emphasis on classification might lead numerous newcomers and engineers to mistakenly assume that classic…

    Submitted 11 October, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

  15. arXiv:2309.09984  [pdf]

    q-bio.NC cs.NE

    BDEC:Brain Deep Embedded Clustering model

    Authors: Xiaoxiao Ma, Chunzhi Yi, Zhicai Zhong, Hui Zhou, Baichun Wei, Haiqi Zhu, Feng Jiang

    Abstract: An essential premise for neuroscience brain network analysis is the successful segmentation of the cerebral cortex into functionally homogeneous regions. Resting-state functional magnetic resonance imaging (rs-fMRI), capturing the spontaneous activities of the brain, provides the potential for cortical parcellation. Previous parcellation methods can be roughly categorized into three groups, mainly…

    Submitted 11 September, 2023; originally announced September 2023.

  16. arXiv:2308.09158  [pdf, other]

    cs.LG cs.CL cs.CV

    ZhiJian: A Unifying and Rapidly Deployable Toolbox for Pre-trained Model Reuse

    Authors: Yi-Kai Zhang, Lu Ren, Chao Yi, Qi-Wei Wang, De-Chuan Zhan, Han-Jia Ye

    Abstract: The rapid expansion of foundation pre-trained models and their fine-tuned counterparts has significantly contributed to the advancement of machine learning. Leveraging pre-trained models to extract knowledge and expedite learning in real-world tasks, known as "Model Reuse", has become crucial in various applications. Previous research focuses on reusing models within a certain aspect, including re…

    Submitted 17 August, 2023; originally announced August 2023.

  17. arXiv:2307.12115  [pdf, other]

    cs.NI cs.AI cs.LG

    A Revolution of Personalized Healthcare: Enabling Human Digital Twin with Mobile AIGC

    Authors: Jiayuan Chen, Changyan Yi, Hongyang Du, Dusit Niyato, Jiawen Kang, Jun Cai, Xuemin, Shen

    Abstract: Mobile Artificial Intelligence-Generated Content (AIGC) technology refers to the adoption of AI algorithms deployed at mobile edge networks to automate the information creation process while fulfilling the requirements of end users. Mobile AIGC has recently attracted phenomenal attentions and can be a key enabling technology for an emerging application, called human digital twin (HDT). HDT empower…

    Submitted 22 July, 2023; originally announced July 2023.

  18. arXiv:2306.07008  [pdf, other]

    quant-ph cs.IT

    Quantum Phase Estimation by Compressed Sensing

    Authors: Changhao Yi, Cunlu Zhou, Jun Takahashi

    Abstract: As a signal recovery algorithm, compressed sensing is particularly useful when the data has low-complexity and samples are rare, which matches perfectly with the task of quantum phase estimation (QPE). In this work we present a new Heisenberg-limited QPE algorithm for early quantum computers based on compressed sensing. More specifically, given many copies of a proper initial state and queries to…

    Submitted 18 December, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

  19. arXiv:2305.11392  [pdf, other]

    cs.CV cs.CL

    Fast-StrucTexT: An Efficient Hourglass Transformer with Modality-guided Dynamic Token Merge for Document Understanding

    Authors: Mingliang Zhai, Yulin Li, Xiameng Qin, Chen Yi, Qunyi Xie, Chengquan Zhang, Kun Yao, Yuwei Wu, Yunde Jia

    Abstract: Transformers achieve promising performance in document understanding because of their high effectiveness and still suffer from quadratic computational complexity dependency on the sequence length. General efficient transformers are challenging to be directly adapted to model document. They are unable to handle the layout representation in documents, e.g. word, line and paragraph, on different gran…

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: IJCAI 2023

  20. arXiv:2305.10432  [pdf, other]

    cs.LG

    Model-Contrastive Federated Domain Adaptation

    Authors: Chang'an Yi, Haotian Chen, Yonghui Xu, Yifan Zhang

    Abstract: Federated domain adaptation (FDA) aims to collaboratively transfer knowledge from source clients (domains) to the related but different target client, without communicating the local data of any client. Moreover, the source clients have different data distributions, leading to extremely challenging in knowledge transfer. Despite the recent progress in FDA, we empirically find that existing methods…

    Submitted 7 May, 2023; originally announced May 2023.

    Comments: 13 pages

  21. arXiv:2304.07454  [pdf, other]

    cs.HC cs.NI

    Realizing Immersive Communications in Human Digital Twin by Edge Computing Empowered Tactile Internet: Visions and Case Study

    Authors: Hao Xiang, Changyan Yi, Kun Wu, Jiayuan Chen, Jun Cai, Dusit Niyato, Xuemin, Shen

    Abstract: Human digital twin (HDT) is expected to revolutionize the future human lifestyle and prompts the development of advanced human-centric applications (e.g., Metaverse) by bridging physical and virtual spaces. However, the fulfillment of HDT poses stringent demands on the pervasive connectivity, real-time feedback, multi-modal data transmission and ultra-high reliability, which urge the need of enabl…

    Submitted 17 June, 2024; v1 submitted 14 April, 2023; originally announced April 2023.

  22. arXiv:2303.17614  [pdf, other]

    cs.HC cs.AI eess.SP

    Estimating Continuous Muscle Fatigue For Multi-Muscle Coordinated Exercise: A Pilot Study

    Authors: Chunzhi Yi, Baichun Wei, Wei Jin, Jianfei Zhu, Seungmin Rho, Zhiyuan Chen, Feng Jiang

    Abstract: Assessing the progression of muscle fatigue for daily exercises provides vital indicators for precise rehabilitation, personalized training dose, especially under the context of Metaverse. Assessing fatigue of multi-muscle coordination-involved daily exercises requires the neuromuscular features that represent the fatigue-induced characteristics of spatiotemporal adaptions of multiple muscles and…

    Submitted 29 March, 2023; originally announced March 2023.

    Comments: submitted to IEEE JBHI

  23. arXiv:2303.15107  [pdf, other]

    cs.HC

    ActiveSelfHAR: Incorporating Self Training into Active Learning to Improve Cross-Subject Human Activity Recognition

    Authors: Baichun Wei, Chunzhi Yi, Qi Zhang, Haiqi Zhu, Jianfei Zhu, Feng Jiang

    Abstract: Deep learning-based human activity recognition (HAR) methods have shown great promise in the applications of smart healthcare systems and wireless body sensor network (BSN). Despite their demonstrated performance in laboratory settings, the real-world implementation of such methods is still hindered by the cross-subject issue when adapting to new users. To solve this issue, we propose ActiveSelfHA…

    Submitted 27 March, 2023; originally announced March 2023.

  24. arXiv:2303.11163  [pdf, other]

    cs.IR cs.AI cs.CL cs.CY

    Finding Similar Exercises in Retrieval Manner

    Authors: Tongwen Huang, Xihua Li, Chao Yi, Xuemin Zhao, Yunbo Cao

    Abstract: When students make a mistake in an exercise, they can consolidate it by "similar exercises" which have the same concepts, purposes and methods. Commonly, for a certain subject and study stage, the size of the exercise bank is in the range of millions to even tens of millions, how to find similar exercises for a given exercise becomes a crucial technical problem. Generally, we can assign a variet…

    Submitted 14 March, 2023; originally announced March 2023.

    Comments: 37th Conference on AAAI 2023 Artificial Intelligence for Education(AI4Edu)

    Journal ref: 37th Conference on AAAI 2023 Artificial Intelligence for Education(AI4Edu)

  25. arXiv:2302.14509   

    cs.LG cs.AI

    Policy Dispersion in Non-Markovian Environment

    Authors: Bohao Qu, Xiaofeng Cao, Jielong Yang, Hechang Chen, Chang Yi, Ivor W. Tsang, Yew-Soon Ong

    Abstract: Markov Decision Process (MDP) presents a mathematical framework to formulate the learning processes of agents in reinforcement learning. MDP is limited by the Markovian assumption that a reward only depends on the immediate state and action. However, a reward sometimes depends on the history of states and actions, which may result in the decision process in a non-Markovian environment. In such env…

    Submitted 2 June, 2024; v1 submitted 28 February, 2023; originally announced February 2023.

    Comments: In further research, we found that the core content of the paper requires significant modification and that the entire paper needs to be restructured. To enhance the scientific quality and contributions of the paper, we have decided to resubmit it after completing the necessary revisions and improvements

  26. arXiv:2302.14309  [pdf, other]

    cs.CV

    Temporal Coherent Test-Time Optimization for Robust Video Classification

    Authors: Chenyu Yi, Siyuan Yang, Yufei Wang, Haoliang Li, Yap-Peng Tan, Alex C. Kot

    Abstract: Deep neural networks are likely to fail when the test data is corrupted in real-world deployment (e.g., blur, weather, etc.). Test-time optimization is an effective way that adapts models to generalize to corrupted data during testing, which has been shown in the image domain. However, the techniques for improving video classification corruption robustness remain few. In this work, we propose a Te…

    Submitted 27 February, 2023; originally announced February 2023.

  27. Networking Architecture and Key Supporting Technologies for Human Digital Twin in Personalized Healthcare: A Comprehensive Survey

    Authors: Jiayuan Chen, Changyan Yi, Samuel D. Okegbile, Jun Cai, Xuemin, Shen

    Abstract: Digital twin (DT), refers to a promising technique to digitally and accurately represent actual physical entities. One typical advantage of DT is that it can be used to not only virtually replicate a system's detailed operations but also analyze the current condition, predict future behaviour, and refine the control optimization. Although DT has been widely implemented in various fields, such as s…

    Submitted 23 June, 2023; v1 submitted 10 January, 2023; originally announced January 2023.

  28. arXiv:2210.10311  [pdf, other]

    cs.DC cs.AI

    Latency Aware Semi-synchronous Client Selection and Model Aggregation for Wireless Federated Learning

    Authors: Liangkun Yu, Xiang Sun, Rana Albelaihi, Chen Yi

    Abstract: Federated learning (FL) is a collaborative machine learning framework that requires different clients (e.g., Internet of Things devices) to participate in the machine learning model training process by training and uploading their local models to an FL server in each global iteration. Upon receiving the local models from all the clients, the FL server generates a global model by aggregating the re…

    Submitted 28 November, 2022; v1 submitted 19 October, 2022; originally announced October 2022.

  29. arXiv:2209.08326  [pdf, other]

    eess.AS cs.CL

    Parameter-Efficient Conformers via Sharing Sparsely-Gated Experts for End-to-End Speech Recognition

    Authors: Ye Bai, Jie Li, Wenjing Han, Hao Ni, Kaituo Xu, Zhuo Zhang, Cheng Yi, Xiaorui Wang

    Abstract: While transformers and their variant conformers show promising performance in speech recognition, the parameterized property leads to much memory cost during training and inference. Some works use cross-layer weight-sharing to reduce the parameters of the model. However, the inevitable loss of capacity harms the model performance. To address this issue, this paper proposes a parameter-efficient co…

    Submitted 17 September, 2022; originally announced September 2022.

    Comments: accepted in INTERSPEECH 2022

  30. arXiv:2208.10867  [pdf]

    cs.NI

    A Quinary Coding and Matrix Structure-based Channel Hopping Algorithm for Blind Rendezvous in Cognitive Radio Networks

    Authors: Qinglin Liu, Zhiyong Lin, Zongheng Wei, Jianfeng Wen, Congming Yi, Hai Liu

    Abstract: The multi-channel blind rendezvous problem in distributed cognitive radio networks (DCRNs) refers to how users in the network can hop to the same channel at the same time slot without any prior knowledge (i.e., each user is unaware of other users' information). The channel hopping (CH) technique is a typical solution to this blind rendezvous problem. In this paper, we propose a quinary coding and…

    Submitted 23 August, 2022; originally announced August 2022.

    Comments: 10 pages

  31. arXiv:2203.16745   

    cs.IT

    Channel Measurement and Characterization with Modified SAGE Algorithm in an Indoor Corridor at 300 GHz

    Authors: Li Yuanbo, Wang Yiqin, Chen Yi, Yu Ziming, Han Chong

    Abstract: The much higher frequencies in the Terahertz (THz) band prevent the effective utilization of channel models dedicated for microwave or millimeter-wave frequency bands. In this paper, a measurement campaign is conducted in an indoor corridor scenario at 306-321 GHz with a frequency-domain Vector Network Analyzer (VNA)-based sounder. To realize high-resolution multipath component (MPC) extraction fo…

    Submitted 14 March, 2023; v1 submitted 30 March, 2022; originally announced March 2022.

    Comments: Please find the latest version as arXiv:2212.11756

  32. arXiv:2110.06513  [pdf, other]

    cs.CV

    Benchmarking the Robustness of Spatial-Temporal Models Against Corruptions

    Authors: Chenyu Yi, Siyuan Yang, Haoliang Li, Yap-peng Tan, Alex Kot

    Abstract: The state-of-the-art deep neural networks are vulnerable to common corruptions (e.g., input data degradations, distortions, and disturbances caused by weather changes, system error, and processing). While much progress has been made in analyzing and improving the robustness of models in image understanding, the robustness in video understanding is largely unexplored. In this paper, we establish a…

    Submitted 22 August, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: Accepted to NeurIPS 2021 Dataset and Benchmark Track. Our codes are available on https://github.com/Newbeeyoung/Video-Corruption-Robustness

  33. arXiv:2105.03559  [pdf, ps, other]

    cs.GT cs.DC

    Applications of Auction and Mechanism Design in Edge Computing: A Survey

    Authors: Houming Qiu, Kun Zhu, Nguyen Cong Luong, Changyan Yi, Dusit Niyato, Dong In Kim

    Abstract: Edge computing as a promising technology provides lower latency, more efficient transmission, and faster speed of data processing since the edge servers are closer to the user devices. Each edge server with limited resources can offload latency-sensitive and computation-intensive tasks from nearby user devices. However, edge computing faces challenges such as resource allocation, energy consumptio…

    Submitted 7 May, 2021; originally announced May 2021.

  34. arXiv:2101.06699  [pdf, other]

    cs.CL cs.SD eess.AS

    Efficiently Fusing Pretrained Acoustic and Linguistic Encoders for Low-resource Speech Recognition

    Authors: Cheng Yi, Shiyu Zhou, Bo Xu

    Abstract: End-to-end models have achieved impressive results on the task of automatic speech recognition (ASR). For low-resource ASR tasks, however, labeled data can hardly satisfy the demand of end-to-end models. Self-supervised acoustic pre-training has already shown its amazing ASR performance, while the transcription is still inadequate for language modeling in end-to-end models. In this work, we fuse a…

    Submitted 24 January, 2021; v1 submitted 17 January, 2021; originally announced January 2021.

  35. arXiv:2012.12121  [pdf, other]

    cs.CL

    Applying Wav2vec2.0 to Speech Recognition in Various Low-resource Languages

    Authors: Cheng Yi, Jianzhong Wang, Ning Cheng, Shiyu Zhou, Bo Xu

    Abstract: There are several domains that own corresponding widely used feature extractors, such as ResNet, BERT, and GPT-x. These models are usually pre-trained on large amounts of unlabeled data by self-supervision and can be effectively applied to downstream tasks. In the speech domain, wav2vec2.0 starts to show its powerful representation ability and feasibility of ultra-low resource speech recognition o…

    Submitted 17 January, 2021; v1 submitted 22 December, 2020; originally announced December 2020.

  36. arXiv:2012.01101  [pdf]

    cs.AI cs.MA

    Multi-Objective Optimization of the Textile Manufacturing Process Using Deep-Q-Network Based Multi-Agent Reinforcement Learning

    Authors: Zhenglei He, Kim Phuc Tran, Sebastien Thomassey, Xianyi Zeng, Jie Xu, Changhai Yi

    Abstract: Multi-objective optimization of the textile manufacturing process is an increasing challenge because of the growing complexity involved in the development of the textile industry. The use of intelligent techniques has been often discussed in this domain, although a significant improvement from certain successful applications has been reported, the traditional methods failed to work with high-as we…

    Submitted 2 December, 2020; originally announced December 2020.

  37. arXiv:2005.14038  [pdf, other]

    cs.DC

    HetPipe: Enabling Large DNN Training on (Whimpy) Heterogeneous GPU Clusters through Integration of Pipelined Model Parallelism and Data Parallelism

    Authors: Jay H. Park, Gyeongchan Yun, Chang M. Yi, Nguyen T. Nguyen, Seungmin Lee, Jaesik Choi, Sam H. Noh, Young-ri Choi

    Abstract: Deep Neural Network (DNN) models have continuously been growing in size in order to improve the accuracy and quality of the models. Moreover, for training of large DNN models, the use of heterogeneous GPUs is inevitable due to the short release cycle of new GPU architectures. In this paper, we investigate how to enable training of large DNN models on a heterogeneous GPU cluster that possibly inclu…

    Submitted 28 May, 2020; originally announced May 2020.

  38. arXiv:2005.10113  [pdf, other]

    eess.AS cs.CL cs.SD

    A Comparison of Label-Synchronous and Frame-Synchronous End-to-End Models for Speech Recognition

    Authors: Linhao Dong, Cheng Yi, Jianzong Wang, Shiyu Zhou, Shuang Xu, Xueli Jia, Bo Xu

    Abstract: End-to-end models are gaining wider attention in the field of automatic speech recognition (ASR). One of their advantages is the simplicity of building that directly recognizes the speech frame sequence into the text label sequence by neural networks. According to the driving end in the recognition process, end-to-end ASR models could be categorized into two types: label-synchronous and frame-sync…

    Submitted 25 May, 2020; v1 submitted 20 May, 2020; originally announced May 2020.

    Comments: 4 pages, 2 figures

  39. arXiv:2005.09867  [pdf]

    cs.LG cs.AI

    A reinforcement learning based decision support system in textile manufacturing process

    Authors: Zhenglei He, Kim Phuc Tran, Sébastien Thomassey, Xianyi Zeng, Changhai Yi

    Abstract: This paper introduced a reinforcement learning based decision support system in textile manufacturing process. A solution optimization problem of color fading ozonation is discussed and set up as a Markov Decision Process (MDP) in terms of tuple {S, A, P, R}. Q-learning is used to train an agent in the interaction with the setup environment by accumulating the reward R. According to the applicatio…

    Submitted 20 May, 2020; originally announced May 2020.

    Journal ref: 15th International Conference on Intelligent Systems and Knowledge Engineering (ISKE2020), Aug 2020, Cologne, Germany

  40. Time-Series Anomaly Detection Service at Microsoft

    Authors: Hansheng Ren, Bixiong Xu, Yujing Wang, Chao Yi, Congrui Huang, Xiaoyu Kou, Tony Xing, Mao Yang, Jie Tong, Qi Zhang

    Abstract: Large companies need to monitor various metrics (for example, Page Views and Revenue) of their applications and services in real time. At Microsoft, we develop a time-series anomaly detection service which helps customers to monitor the time-series continuously and alert for potential incidents on time. In this paper, we introduce the pipeline and algorithm of our anomaly detection service, which…

    Submitted 10 June, 2019; originally announced June 2019.

    Comments: KDD 2019

  41. arXiv:1808.09074  [pdf, other]

    cs.HC

    EmbeddingVis: A Visual Analytics Approach to Comparative Network Embedding Inspection

    Authors: Quan Li, Kristanto Sean Njotoprawiro, Hammad Haleem, Qiaoan Chen, Chris Yi, Xiaojuan Ma

    Abstract: Constructing latent vector representation for nodes in a network through embedding models has shown its practicality in many graph analysis applications, such as node classification, clustering, and link prediction. However, despite the high efficiency and accuracy of learning an embedding model, people have little clue of what information about the original network is preserved in the embedding v…

    Submitted 27 August, 2018; originally announced August 2018.

    Comments: Proceedings of IEEE VIS 2018 (VAST 2018) (to appear), Berlin, Germany, Oct 21-26, 2018

  42. arXiv:1803.04042  [pdf, other]

    cs.LG stat.ML

    Interpreting Deep Classifier by Visual Distillation of Dark Knowledge

    Authors: Kai Xu, Dae Hoon Park, Chang Yi, Charles Sutton

    Abstract: Interpreting black box classifiers, such as deep networks, allows an analyst to validate a classifier before it is deployed in a high-stakes setting. A natural idea is to visualize the deep network's representations, so as to "see what the network sees". In this paper, we demonstrate that standard dimension reduction methods in this setting can yield uninformative or even misleading visualizations…

    Submitted 11 March, 2018; originally announced March 2018.