Search | arXiv e-print repository

Semantic Deep Hiding for Robust Unlearnable Examples

Authors: Ruohan Meng, Chenyu Yi, Yi Yu, Siyuan Yang, Bingquan Shen, Alex C. Kot

Abstract: Ensuring data privacy and protection has become paramount in the era of deep learning. Unlearnable examples are proposed to mislead the deep learning models and prevent data from unauthorized exploration by adding small perturbations to data. However, such perturbations (e.g., noise, texture, color change) predominantly impact low-level features, making them vulnerable to common countermeasures. I… ▽ More Ensuring data privacy and protection has become paramount in the era of deep learning. Unlearnable examples are proposed to mislead the deep learning models and prevent data from unauthorized exploration by adding small perturbations to data. However, such perturbations (e.g., noise, texture, color change) predominantly impact low-level features, making them vulnerable to common countermeasures. In contrast, semantic images with intricate shapes have a wealth of high-level features, making them more resilient to countermeasures and potential for producing robust unlearnable examples. In this paper, we propose a Deep Hiding (DH) scheme that adaptively hides semantic images enriched with high-level features. We employ an Invertible Neural Network (INN) to invisibly integrate predefined images, inherently hiding them with deceptive perturbations. To enhance data unlearnability, we introduce a Latent Feature Concentration module, designed to work with the INN, regularizing the intra-class variance of these perturbations. To further boost the robustness of unlearnable examples, we design a Semantic Images Generation module that produces hidden semantic images. By utilizing similar semantic information, this module generates similar semantic images for samples within the same classes, thereby enlarging the inter-class distance and narrowing the intra-class distance. Extensive experiments on CIFAR-10, CIFAR-100, and an ImageNet subset, against 18 countermeasures, reveal that our proposed method exhibits outstanding robustness for unlearnable examples, demonstrating its efficacy in preventing unauthorized data exploitation. △ Less

Submitted 25 June, 2024; originally announced June 2024.

Comments: Accepted by TIFS 2024

arXiv:2406.02539 [pdf, other]

Parrot: Multilingual Visual Instruction Tuning

Authors: Hai-Long Sun, Da-Wei Zhou, Yang Li, Shiyin Lu, Chao Yi, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, De-Chuan Zhan, Han-Jia Ye

Abstract: The rapid development of Multimodal Large Language Models (MLLMs) like GPT-4V has marked a significant step towards artificial general intelligence. Existing methods mainly focus on aligning vision encoders with LLMs through supervised fine-tuning (SFT) to endow LLMs with multimodal abilities, making MLLMs' inherent ability to react to multiple languages progressively deteriorate as the training p… ▽ More The rapid development of Multimodal Large Language Models (MLLMs) like GPT-4V has marked a significant step towards artificial general intelligence. Existing methods mainly focus on aligning vision encoders with LLMs through supervised fine-tuning (SFT) to endow LLMs with multimodal abilities, making MLLMs' inherent ability to react to multiple languages progressively deteriorate as the training process evolves. We empirically find that the imbalanced SFT datasets, primarily composed of English-centric image-text pairs, lead to significantly reduced performance in non-English languages. This is due to the failure of aligning the vision encoder and LLM with multilingual tokens during the SFT process. In this paper, we introduce Parrot, a novel method that utilizes textual guidance to drive visual token alignment at the language level. Parrot makes the visual tokens condition on diverse language inputs and uses Mixture-of-Experts (MoE) to promote the alignment of multilingual tokens. Specifically, to enhance non-English visual tokens alignment, we compute the cross-attention using the initial visual features and textual embeddings, the result of which is then fed into the MoE router to select the most relevant experts. The selected experts subsequently convert the initial visual tokens into language-specific visual tokens. Moreover, considering the current lack of benchmarks for evaluating multilingual capabilities within the field, we collect and make available a Massive Multilingual Multimodal Benchmark which includes 6 languages, 15 categories, and 12,000 questions, named as MMMB. Our method not only demonstrates state-of-the-art performance on multilingual MMBench and MMMB, but also excels across a broad range of multimodal tasks. Both the source code and the training dataset of Parrot will be made publicly available. △ Less

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2404.17753 [pdf, other]

Leveraging Cross-Modal Neighbor Representation for Improved CLIP Classification

Authors: Chao Yi, Lu Ren, De-Chuan Zhan, Han-Jia Ye

Abstract: CLIP showcases exceptional cross-modal matching capabilities due to its training on image-text contrastive learning tasks. However, without specific optimization for unimodal scenarios, its performance in single-modality feature extraction might be suboptimal. Despite this, some studies have directly used CLIP's image encoder for tasks like few-shot classification, introducing a misalignment betwe… ▽ More CLIP showcases exceptional cross-modal matching capabilities due to its training on image-text contrastive learning tasks. However, without specific optimization for unimodal scenarios, its performance in single-modality feature extraction might be suboptimal. Despite this, some studies have directly used CLIP's image encoder for tasks like few-shot classification, introducing a misalignment between its pre-training objectives and feature extraction methods. This inconsistency can diminish the quality of the image's feature representation, adversely affecting CLIP's effectiveness in target tasks. In this paper, we view text features as precise neighbors of image features in CLIP's space and present a novel CrOss-moDal nEighbor Representation(CODER) based on the distance structure between images and their neighbor texts. This feature extraction method aligns better with CLIP's pre-training objectives, thereby fully leveraging CLIP's robust cross-modal capabilities. The key to construct a high-quality CODER lies in how to create a vast amount of high-quality and diverse texts to match with images. We introduce the Auto Text Generator(ATG) to automatically generate the required texts in a data-free and training-free manner. We apply CODER to CLIP's zero-shot and few-shot image classification tasks. Experiment results across various datasets and models confirm CODER's effectiveness. Code is available at:https://github.com/YCaigogogo/CVPR24-CODER. △ Less

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2403.13797 [pdf, other]

Bridge the Modality and Capacity Gaps in Vision-Language Model Selection

Authors: Chao Yi, De-Chuan Zhan, Han-Jia Ye

Abstract: Vision Language Models (VLMs) excel in zero-shot image classification by pairing images with textual category names. The expanding variety of Pre-Trained VLMs enhances the likelihood of identifying a suitable VLM for specific tasks. Thus, a promising zero-shot image classification strategy is selecting the most appropriate Pre-Trained VLM from the VLM Zoo, relying solely on the text data of the ta… ▽ More Vision Language Models (VLMs) excel in zero-shot image classification by pairing images with textual category names. The expanding variety of Pre-Trained VLMs enhances the likelihood of identifying a suitable VLM for specific tasks. Thus, a promising zero-shot image classification strategy is selecting the most appropriate Pre-Trained VLM from the VLM Zoo, relying solely on the text data of the target dataset without access to the dataset's images. In this paper, we analyze two inherent challenges in assessing the ability of a VLM in this Language-Only VLM selection: the "Modality Gap" -- the disparity in VLM's embeddings across two different modalities, making text a less reliable substitute for images; and the "Capability Gap" -- the discrepancy between the VLM's overall ranking and its ranking for target dataset, hindering direct prediction of a model's dataset-specific performance from its general performance. We propose VLM Selection With gAp Bridging (SWAB) to mitigate the negative impact of these two gaps. SWAB first adopts optimal transport to capture the relevance between open-source datasets and target dataset with a transportation matrix. It then uses this matrix to transfer useful statistics of VLMs from open-source datasets to the target dataset for bridging those two gaps and enhancing the VLM's capacity estimation for VLM selection. Experiments across various VLMs and image classification datasets validate SWAB's effectiveness. △ Less

Submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.13237 [pdf, ps, other]

Graph Attention Network-based Block Propagation with Optimal AoI and Reputation in Web 3.0

Authors: Jiana Liao, **bo Wen, Jiawen Kang, Changyan Yi, Yang Zhang, Yutao Jiao, Dusit Niyato, Dong In Kim, Shengli Xie

Abstract: Web 3.0 is recognized as a pioneering paradigm that empowers users to securely oversee data without reliance on a centralized authority. Blockchains, as a core technology to realize Web 3.0, can facilitate decentralized and transparent data management. Nevertheless, the evolution of blockchain-enabled Web 3.0 is still in its nascent phase, grappling with challenges such as ensuring efficiency and… ▽ More Web 3.0 is recognized as a pioneering paradigm that empowers users to securely oversee data without reliance on a centralized authority. Blockchains, as a core technology to realize Web 3.0, can facilitate decentralized and transparent data management. Nevertheless, the evolution of blockchain-enabled Web 3.0 is still in its nascent phase, grappling with challenges such as ensuring efficiency and reliability to enhance block propagation performance. In this paper, we design a Graph Attention Network (GAT)-based reliable block propagation optimization framework for blockchain-enabled Web 3.0. We first innovatively apply a data-freshness metric called age of block to measure block propagation efficiency in public blockchains. To achieve the reliability of block propagation, we introduce a reputation mechanism based on the subjective logic model, including the local and recommended opinions to calculate the miner reputation value. Moreover, considering that the GAT possesses the excellent ability to process graph-structured data, we utilize the GAT with reinforcement learning to obtain the optimal block propagation trajectory. Numerical results demonstrate that the proposed scheme exhibits the most outstanding block propagation efficiency and reliability compared with traditional routing mechanisms. △ Less

Submitted 8 May, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

arXiv:2402.18936 [pdf, ps, other]

Energy-Efficient UAV Swarm Assisted MEC with Dynamic Clustering and Scheduling

Authors: Jialiuyuan Li, Jiayuan Chen, Changyan Yi, Tong Zhang, Kun Zhu, Jun Cai

Abstract: In this paper, the energy-efficient unmanned aerial vehicle (UAV) swarm assisted mobile edge computing (MEC) with dynamic clustering and scheduling is studied. In the considered system model, UAVs are divided into multiple swarms, with each swarm consisting of a leader UAV and several follower UAVs to provide computing services to end-users. Unlike existing work, we allow UAVs to dynamically clust… ▽ More In this paper, the energy-efficient unmanned aerial vehicle (UAV) swarm assisted mobile edge computing (MEC) with dynamic clustering and scheduling is studied. In the considered system model, UAVs are divided into multiple swarms, with each swarm consisting of a leader UAV and several follower UAVs to provide computing services to end-users. Unlike existing work, we allow UAVs to dynamically cluster into different swarms, i.e., each follower UAV can change its leader based on the time-varying spatial positions, updated application placement, etc. in a dynamic manner. Meanwhile, UAVs are required to dynamically schedule their energy replenishment, application placement, trajectory planning and task delegation. With the aim of maximizing the long-term energy efficiency of the UAV swarm assisted MEC system, a joint optimization problem of dynamic clustering and scheduling is formulated. Taking into account the underlying cooperation and competition among intelligent UAVs, we further reformulate this optimization problem as a combination of a series of strongly coupled multi-agent stochastic games, and then propose a novel reinforcement learning-based UAV swarm dynamic coordination (RLDC) algorithm for obtaining the equilibrium. Simulations are conducted to evaluate the performance of the RLDC algorithm and demonstrate its superiority over counterparts. △ Less

Submitted 29 February, 2024; originally announced February 2024.

arXiv:2402.18927 [pdf, other]

Edge Computing Enabled Real-Time Video Analysis via Adaptive Spatial-Temporal Semantic Filtering

Authors: Xiang Chen, Wenjie Zhu, Jiayuan Chen, Tong Zhang, Changyan Yi, Jun Cai

Abstract: This paper proposes a novel edge computing enabled real-time video analysis system for intelligent visual devices. The proposed system consists of a tracking-assisted object detection module (TAODM) and a region of interesting module (ROIM). TAODM adaptively determines the offloading decision to process each video frame locally with a tracking algorithm or to offload it to the edge server inferred… ▽ More This paper proposes a novel edge computing enabled real-time video analysis system for intelligent visual devices. The proposed system consists of a tracking-assisted object detection module (TAODM) and a region of interesting module (ROIM). TAODM adaptively determines the offloading decision to process each video frame locally with a tracking algorithm or to offload it to the edge server inferred by an object detection model. ROIM determines each offloading frame's resolution and detection model configuration to ensure that the analysis results can return in time. TAODM and ROIM interact jointly to filter the repetitive spatial-temporal semantic information to maximize the processing rate while ensuring high video analysis accuracy. Unlike most existing works, this paper investigates the real-time video analysis systems where the intelligent visual device connects to the edge server through a wireless network with fluctuating network conditions. We decompose the real-time video analysis problem into the offloading decision and configurations selection sub-problems. To solve these two sub-problems, we introduce a double deep Q network (DDQN) based offloading approach and a contextual multi-armed bandit (CMAB) based adaptive configurations selection approach, respectively. A DDQN-CMAB reinforcement learning (DCRL) training framework is further developed to integrate these two approaches to improve the overall video analyzing performance. Extensive simulations are conducted to evaluate the performance of the proposed solution, and demonstrate its superiority over counterparts. △ Less

Submitted 29 February, 2024; originally announced February 2024.

arXiv:2402.11500 [pdf, other]

A Three-Party Repeated Coalition Formation Game for PLS in Wireless Communications with IRSs

Authors: Haipeng Zhou, Ruoyang Chen, Changyan Yi, Juan Li, Jun Cai

Abstract: In this paper, a repeated coalition formation game (RCFG) with dynamic decision-making for physical layer security (PLS) in wireless communications with intelligent reflecting surfaces (IRSs) has been investigated. In the considered system, one central legitimate transmitter (LT) aims to transmit secret signals to a group of legitimate receivers (LRs) under the threat of a proactive eavesdropper (… ▽ More In this paper, a repeated coalition formation game (RCFG) with dynamic decision-making for physical layer security (PLS) in wireless communications with intelligent reflecting surfaces (IRSs) has been investigated. In the considered system, one central legitimate transmitter (LT) aims to transmit secret signals to a group of legitimate receivers (LRs) under the threat of a proactive eavesdropper (EV), while there exist a number of third-party IRSs (TIRSs) which can choose to form a coalition with either legitimate pairs (LPs) or the EV to improve their respective performances in exchange for potential benefits (e.g., payments). Unlike existing works that commonly restricted to friendly IRSs or malicious IRSs only, we study the complicated dynamic ally-adversary relationships among LPs, EV and TIRSs, under unpredictable wireless channel conditions, and introduce a RCFG to model their long-term strategic interactions. Particularly, we first analyze the existence of Nash equilibrium (NE) in the formulated RCFG, and then propose a switch operations-based coalition selection along with a deep reinforcement learning (DRL)-based algorithm for obtaining such equilibrium. Simulations examine the feasibility of the proposed algorithm and show its superiority over counterparts. △ Less

Submitted 18 February, 2024; originally announced February 2024.

Comments: Accepted to IEEE WCNC 2024

arXiv:2401.16710 [pdf, other]

Dynamic Human Digital Twin Deployment at the Edge for Task Execution: A Two-Timescale Accuracy-Aware Online Optimization

Authors: Yuye Yang, You Shi, Changyan Yi, Jun Cai, Jiawen Kang, Dusit Niyato, Xuemin, Shen

Abstract: Human digital twin (HDT) is an emerging paradigm that bridges physical twins (PTs) with powerful virtual twins (VTs) for assisting complex task executions in human-centric services. In this paper, we study a two-timescale online optimization for building HDT under an end-edge-cloud collaborative framework. As a unique feature of HDT, we consider that PTs' corresponding VTs are deployed on edge ser… ▽ More Human digital twin (HDT) is an emerging paradigm that bridges physical twins (PTs) with powerful virtual twins (VTs) for assisting complex task executions in human-centric services. In this paper, we study a two-timescale online optimization for building HDT under an end-edge-cloud collaborative framework. As a unique feature of HDT, we consider that PTs' corresponding VTs are deployed on edge servers, consisting of not only generic models placed by downloading experiential knowledge from the cloud but also customized models updated by collecting personalized data from end devices. To maximize task execution accuracy with stringent energy and delay constraints, and by taking into account HDT's inherent mobility and status variation uncertainties, we jointly and dynamically optimize VTs' construction and PTs' task offloading, along with communication and computation resource allocations. Observing that decision variables are asynchronous with different triggers, we propose a novel two-timescale accuracy-aware online optimization approach (TACO). Specifically, TACO utilizes an improved Lyapunov method to decompose the problem into multiple instant ones, and then leverages piecewise McCormick envelopes and block coordinate descent based algorithms, addressing two timescales alternately. Theoretical analyses and simulations show that the proposed approach can reach asymptotic optimum within a polynomial-time complexity, and demonstrate its superiority over counterparts. △ Less

Submitted 29 January, 2024; originally announced January 2024.

arXiv:2401.13699 [pdf, other]

Generative AI-Driven Human Digital Twin in IoT-Healthcare: A Comprehensive Survey

Authors: Jiayuan Chen, You Shi, Changyan Yi, Hongyang Du, Jiawen Kang, Dusit Niyato

Abstract: The Internet of things (IoT) can significantly enhance the quality of human life, specifically in healthcare, attracting extensive attentions to IoT-healthcare services. Meanwhile, the human digital twin (HDT) is proposed as an innovative paradigm that can comprehensively characterize the replication of the individual human body in the digital world and reflect its physical status in real time. Na… ▽ More The Internet of things (IoT) can significantly enhance the quality of human life, specifically in healthcare, attracting extensive attentions to IoT-healthcare services. Meanwhile, the human digital twin (HDT) is proposed as an innovative paradigm that can comprehensively characterize the replication of the individual human body in the digital world and reflect its physical status in real time. Naturally, HDT is envisioned to empower IoT-healthcare beyond the application of healthcare monitoring by acting as a versatile and vivid human digital testbed, simulating the outcomes and guiding the practical treatments. However, successfully establishing HDT requires high-fidelity virtual modeling and strong information interactions but possibly with scarce, biased and noisy data. Fortunately, a recent popular technology called generative artificial intelligence (GAI) may be a promising solution because it can leverage advanced AI algorithms to automatically create, manipulate, and modify valuable while diverse data. This survey particularly focuses on the implementation of GAI-driven HDT in IoT-healthcare. We start by introducing the background of IoT-healthcare and the potential of GAI-driven HDT. Then, we delve into the fundamental techniques and present the overall framework of GAI-driven HDT. After that, we explore the realization of GAI-driven HDT in detail, including GAI-enabled data acquisition, communication, data management, digital modeling, and data analysis. Besides, we discuss typical IoT-healthcare applications that can be revolutionized by GAI-driven HDT, namely personalized health monitoring and diagnosis, personalized prescription, and personalized rehabilitation. Finally, we conclude this survey by highlighting some future research directions. △ Less

Submitted 28 June, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

arXiv:2401.02705 [pdf, other]

XUAT-Copilot: Multi-Agent Collaborative System for Automated User Acceptance Testing with Large Language Model

Authors: Zhitao Wang, Wei Wang, Zirao Li, Long Wang, Can Yi, Xinjie Xu, Luyang Cao, Han**g Su, Shouzhi Chen, Jun Zhou

Abstract: In past years, we have been dedicated to automating user acceptance testing (UAT) process of WeChat Pay, one of the most influential mobile payment applications in China. A system titled XUAT has been developed for this purpose. However, there is still a human-labor-intensive stage, i.e, test scripts generation, in the current system. Therefore, in this paper, we concentrate on methods of boosting… ▽ More In past years, we have been dedicated to automating user acceptance testing (UAT) process of WeChat Pay, one of the most influential mobile payment applications in China. A system titled XUAT has been developed for this purpose. However, there is still a human-labor-intensive stage, i.e, test scripts generation, in the current system. Therefore, in this paper, we concentrate on methods of boosting the automation level of the current system, particularly the stage of test scripts generation. With recent notable successes, large language models (LLMs) demonstrate significant potential in attaining human-like intelligence and there has been a growing research area that employs LLMs as autonomous agents to obtain human-like decision-making capabilities. Inspired by these works, we propose an LLM-powered multi-agent collaborative system, named XUAT-Copilot, for automated UAT. The proposed system mainly consists of three LLM-based agents responsible for action planning, state checking and parameter selecting, respectively, and two additional modules for state sensing and case rewriting. The agents interact with testing device, make human-like decision and generate action command in a collaborative way. The proposed multi-agent system achieves a close effectiveness to human testers in our experimental studies and gains a significant improvement of Pass@1 accuracy compared with single-agent architecture. More importantly, the proposed system has launched in the formal testing environment of WeChat Pay mobile app, which saves a considerable amount of manpower in the daily development work. △ Less

Submitted 10 January, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

arXiv:2312.12063 [pdf, other]

Resource-efficient Generative Mobile Edge Networks in 6G Era: Fundamentals, Framework and Case Study

Authors: Bingkun Lai, **bo Wen, Jiawen Kang, Hongyang Du, Jiangtian Nie, Changyan Yi, Dong In Kim, Shengli Xie

Abstract: As the next-generation wireless communication system, Sixth-Generation (6G) technologies are emerging, enabling various mobile edge networks that can revolutionize wireless communication and connectivity. By integrating Generative Artificial Intelligence (GAI) with mobile edge networks, generative mobile edge networks possess immense potential to enhance the intelligence and efficiency of wireless… ▽ More As the next-generation wireless communication system, Sixth-Generation (6G) technologies are emerging, enabling various mobile edge networks that can revolutionize wireless communication and connectivity. By integrating Generative Artificial Intelligence (GAI) with mobile edge networks, generative mobile edge networks possess immense potential to enhance the intelligence and efficiency of wireless communication networks. In this article, we propose the concept of generative mobile edge networks and overview widely adopted GAI technologies and their applications in mobile edge networks. We then discuss the potential challenges faced by generative mobile edge networks in resource-constrained scenarios. To address these challenges, we develop a universal resource-efficient generative incentive mechanism framework, in which we design resource-efficient methods for network overhead reduction, formulate appropriate incentive mechanisms for the resource allocation problem, and utilize Generative Diffusion Models (GDMs) to find the optimal incentive mechanism solutions. Furthermore, we conduct a case study on resource-constrained mobile edge networks, employing model partition for efficient AI task offloading and proposing a GDM-based Stackelberg model to motivate edge devices to contribute computing resources for mobile edge intelligence. Finally, we propose several open directions that could contribute to the future popularity of generative mobile edge networks. △ Less

Submitted 19 December, 2023; originally announced December 2023.

arXiv:2312.02896 [pdf, other]

BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models

Authors: Rizhao Cai, Zirui Song, Dayan Guan, Zhenhao Chen, Xing Luo, Chenyu Yi, Alex Kot

Abstract: Large Multimodal Models (LMMs) such as GPT-4V and LLaVA have shown remarkable capabilities in visual reasoning with common image styles. However, their robustness against diverse style shifts, crucial for practical applications, remains largely unexplored. In this paper, we propose a new benchmark, BenchLMM, to assess the robustness of LMMs against three different styles: artistic image style, ima… ▽ More Large Multimodal Models (LMMs) such as GPT-4V and LLaVA have shown remarkable capabilities in visual reasoning with common image styles. However, their robustness against diverse style shifts, crucial for practical applications, remains largely unexplored. In this paper, we propose a new benchmark, BenchLMM, to assess the robustness of LMMs against three different styles: artistic image style, imaging sensor style, and application style, where each style has five sub-styles. Utilizing BenchLMM, we comprehensively evaluate state-of-the-art LMMs and reveal: 1) LMMs generally suffer performance degradation when working with other styles; 2) An LMM performs better than another model in common style does not guarantee its superior performance in other styles; 3) LMMs' reasoning capability can be enhanced by prompting LMMs to predict the style first, based on which we propose a versatile and training-free method for improving LMMs; 4) An intelligent LMM is expected to interpret the causes of its errors when facing stylistic variations. We hope that our benchmark and analysis can shed new light on develo** more intelligent and versatile LMMs. △ Less

Submitted 5 December, 2023; v1 submitted 5 December, 2023; originally announced December 2023.

Comments: Code is available at https://github.com/AIFEG/BenchLMM

arXiv:2310.05341 [pdf, other]

A Critical Look at Classic Test-Time Adaptation Methods in Semantic Segmentation

Authors: Chang'an Yi, Haotian Chen, Yifan Zhang, Yonghui Xu, Lizhen Cui

Abstract: Test-time adaptation (TTA) aims to adapt a model, initially trained on training data, to potential distribution shifts in the test data. Most existing TTA studies, however, focus on classification tasks, leaving a notable gap in the exploration of TTA for semantic segmentation. This pronounced emphasis on classification might lead numerous newcomers and engineers to mistakenly assume that classic… ▽ More Test-time adaptation (TTA) aims to adapt a model, initially trained on training data, to potential distribution shifts in the test data. Most existing TTA studies, however, focus on classification tasks, leaving a notable gap in the exploration of TTA for semantic segmentation. This pronounced emphasis on classification might lead numerous newcomers and engineers to mistakenly assume that classic TTA methods designed for classification can be directly applied to segmentation. Nonetheless, this assumption remains unverified, posing an open question. To address this, we conduct a systematic, empirical study to disclose the unique challenges of segmentation TTA, and to determine whether classic TTA strategies can effectively address this task. Our comprehensive results have led to three key observations. First, the classic batch norm updating strategy, commonly used in classification TTA, only brings slight performance improvement, and in some cases it might even adversely affect the results. Even with the application of advanced distribution estimation techniques like batch renormalization, the problem remains unresolved. Second, the teacher-student scheme does enhance training stability for segmentation TTA in the presence of noisy pseudo-labels. However, it cannot directly result in performance improvement compared to the original model without TTA. Third, segmentation TTA suffers a severe long-tailed imbalance problem, which is substantially more complex than that in TTA for classification. This long-tailed challenge significantly affects segmentation TTA performance, even when the accuracy of pseudo-labels is high. In light of these observations, we conclude that TTA for segmentation presents significant challenges, and simply using classic TTA methods cannot address this problem well. △ Less

Submitted 11 October, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

arXiv:2309.09984 [pdf]

BDEC:Brain Deep Embedded Clustering model

Authors: Xiaoxiao Ma, Chunzhi Yi, Zhicai Zhong, Hui Zhou, Baichun Wei, Haiqi Zhu, Feng Jiang

Abstract: An essential premise for neuroscience brain network analysis is the successful segmentation of the cerebral cortex into functionally homogeneous regions. Resting-state functional magnetic resonance imaging (rs-fMRI), capturing the spontaneous activities of the brain, provides the potential for cortical parcellation. Previous parcellation methods can be roughly categorized into three groups, mainly… ▽ More An essential premise for neuroscience brain network analysis is the successful segmentation of the cerebral cortex into functionally homogeneous regions. Resting-state functional magnetic resonance imaging (rs-fMRI), capturing the spontaneous activities of the brain, provides the potential for cortical parcellation. Previous parcellation methods can be roughly categorized into three groups, mainly employing either local gradient, global similarity, or a combination of both. The traditional clustering algorithms, such as "K-means" and "Spectral clustering" may affect the reproducibility or the biological interpretation of parcellations; The region growing-based methods influence the expression of functional homogeneity in the brain at a large scale; The parcellation method based on probabilistic graph models inevitably introduce model assumption biases. In this work, we develop an assumption-free model called as BDEC, which leverages the robust data fitting capability of deep learning. To the best of our knowledge, this is the first study that uses deep learning algorithm for rs-fMRI-based parcellation. By comparing with nine commonly used brain parcellation methods, the BDEC model demonstrates significantly superior performance in various functional homogeneity indicators. Furthermore, it exhibits favorable results in terms of validity, network analysis, task homogeneity, and generalization capability. These results suggest that the BDEC parcellation captures the functional characteristics of the brain and holds promise for future voxel-wise brain network analysis in the dimensionality reduction of fMRI data. △ Less

Submitted 11 September, 2023; originally announced September 2023.

arXiv:2308.09158 [pdf, other]

ZhiJian: A Unifying and Rapidly Deployable Toolbox for Pre-trained Model Reuse

Authors: Yi-Kai Zhang, Lu Ren, Chao Yi, Qi-Wei Wang, De-Chuan Zhan, Han-Jia Ye

Abstract: The rapid expansion of foundation pre-trained models and their fine-tuned counterparts has significantly contributed to the advancement of machine learning. Leveraging pre-trained models to extract knowledge and expedite learning in real-world tasks, known as "Model Reuse", has become crucial in various applications. Previous research focuses on reusing models within a certain aspect, including re… ▽ More The rapid expansion of foundation pre-trained models and their fine-tuned counterparts has significantly contributed to the advancement of machine learning. Leveraging pre-trained models to extract knowledge and expedite learning in real-world tasks, known as "Model Reuse", has become crucial in various applications. Previous research focuses on reusing models within a certain aspect, including reusing model weights, structures, and hypothesis spaces. This paper introduces ZhiJian, a comprehensive and user-friendly toolbox for model reuse, utilizing the PyTorch backend. ZhiJian presents a novel paradigm that unifies diverse perspectives on model reuse, encompassing target architecture construction with PTM, tuning target model with PTM, and PTM-based inference. This empowers deep learning practitioners to explore downstream tasks and identify the complementary advantages among different methods. ZhiJian is readily accessible at https://github.com/zhangyikaii/lamda-zhijian facilitating seamless utilization of pre-trained models and streamlining the model reuse process for researchers and developers. △ Less

Submitted 17 August, 2023; originally announced August 2023.

arXiv:2307.12115 [pdf, other]

A Revolution of Personalized Healthcare: Enabling Human Digital Twin with Mobile AIGC

Authors: Jiayuan Chen, Changyan Yi, Hongyang Du, Dusit Niyato, Jiawen Kang, Jun Cai, Xuemin, Shen

Abstract: Mobile Artificial Intelligence-Generated Content (AIGC) technology refers to the adoption of AI algorithms deployed at mobile edge networks to automate the information creation process while fulfilling the requirements of end users. Mobile AIGC has recently attracted phenomenal attentions and can be a key enabling technology for an emerging application, called human digital twin (HDT). HDT empower… ▽ More Mobile Artificial Intelligence-Generated Content (AIGC) technology refers to the adoption of AI algorithms deployed at mobile edge networks to automate the information creation process while fulfilling the requirements of end users. Mobile AIGC has recently attracted phenomenal attentions and can be a key enabling technology for an emerging application, called human digital twin (HDT). HDT empowered by the mobile AIGC is expected to revolutionize the personalized healthcare by generating rare disease data, modeling high-fidelity digital twin, building versatile testbeds, and providing 24/7 customized medical services. To promote the development of this new breed of paradigm, in this article, we propose a system architecture of mobile AIGC-driven HDT and highlight the corresponding design requirements and challenges. Moreover, we illustrate two use cases, i.e., mobile AIGC-driven HDT in customized surgery planning and personalized medication. In addition, we conduct an experimental study to prove the effectiveness of the proposed mobile AIGC-driven HDT solution, which shows a particular application in a virtual physical therapy teaching platform. Finally, we conclude this article by briefly discussing several open issues and future directions. △ Less

Submitted 22 July, 2023; originally announced July 2023.

arXiv:2306.07008 [pdf, other]

Quantum Phase Estimation by Compressed Sensing

Authors: Changhao Yi, Cunlu Zhou, Jun Takahashi

Abstract: As a signal recovery algorithm, compressed sensing is particularly useful when the data has low-complexity and samples are rare, which matches perfectly with the task of quantum phase estimation (QPE). In this work we present a new Heisenberg-limited QPE algorithm for early quantum computers based on compressed sensing. More specifically, given many copies of a proper initial state and queries to… ▽ More As a signal recovery algorithm, compressed sensing is particularly useful when the data has low-complexity and samples are rare, which matches perfectly with the task of quantum phase estimation (QPE). In this work we present a new Heisenberg-limited QPE algorithm for early quantum computers based on compressed sensing. More specifically, given many copies of a proper initial state and queries to some unitary operators, our algorithm is able to recover the frequency with a total runtime $\mathcal{O}(ε^{-1}\text{poly}\log(ε^{-1}))$, where $ε$ is the accuracy. Moreover, the maximal runtime satisfies $T_{\max}ε\ll π$, which is comparable to the state of art algorithms, and our algorithm is also robust against certain amount of noise from sampling. We also consider the more general quantum eigenvalue estimation problem (QEEP) and show numerically that the off-grid compressed sensing can be a strong candidate for solving the QEEP. △ Less

Submitted 18 December, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

arXiv:2305.11392 [pdf, other]

Fast-StrucTexT: An Efficient Hourglass Transformer with Modality-guided Dynamic Token Merge for Document Understanding

Authors: Mingliang Zhai, Yulin Li, Xiameng Qin, Chen Yi, Qunyi Xie, Chengquan Zhang, Kun Yao, Yuwei Wu, Yunde Jia

Abstract: Transformers achieve promising performance in document understanding because of their high effectiveness and still suffer from quadratic computational complexity dependency on the sequence length. General efficient transformers are challenging to be directly adapted to model document. They are unable to handle the layout representation in documents, e.g. word, line and paragraph, on different gran… ▽ More Transformers achieve promising performance in document understanding because of their high effectiveness and still suffer from quadratic computational complexity dependency on the sequence length. General efficient transformers are challenging to be directly adapted to model document. They are unable to handle the layout representation in documents, e.g. word, line and paragraph, on different granularity levels and seem hard to achieve a good trade-off between efficiency and performance. To tackle the concerns, we propose Fast-StrucTexT, an efficient multi-modal framework based on the StrucTexT algorithm with an hourglass transformer architecture, for visual document understanding. Specifically, we design a modality-guided dynamic token merging block to make the model learn multi-granularity representation and prunes redundant tokens. Additionally, we present a multi-modal interaction module called Symmetry Cross Attention (SCA) to consider multi-modal fusion and efficiently guide the token mergence. The SCA allows one modality input as query to calculate cross attention with another modality in a dual phase. Extensive experiments on FUNSD, SROIE, and CORD datasets demonstrate that our model achieves the state-of-the-art performance and almost 1.9X faster inference time than the state-of-the-art methods. △ Less

Submitted 18 May, 2023; originally announced May 2023.

Comments: IJCAI 2023

arXiv:2305.10432 [pdf, other]

Model-Contrastive Federated Domain Adaptation

Authors: Chang'an Yi, Haotian Chen, Yonghui Xu, Yifan Zhang

Abstract: Federated domain adaptation (FDA) aims to collaboratively transfer knowledge from source clients (domains) to the related but different target client, without communicating the local data of any client. Moreover, the source clients have different data distributions, leading to extremely challenging in knowledge transfer. Despite the recent progress in FDA, we empirically find that existing methods… ▽ More Federated domain adaptation (FDA) aims to collaboratively transfer knowledge from source clients (domains) to the related but different target client, without communicating the local data of any client. Moreover, the source clients have different data distributions, leading to extremely challenging in knowledge transfer. Despite the recent progress in FDA, we empirically find that existing methods can not leverage models of heterogeneous domains and thus they fail to achieve excellent performance. In this paper, we propose a model-based method named FDAC, aiming to address {\bf F}ederated {\bf D}omain {\bf A}daptation based on {\bf C}ontrastive learning and Vision Transformer (ViT). In particular, contrastive learning can leverage the unlabeled data to train excellent models and the ViT architecture performs better than convolutional neural networks (CNNs) in extracting adaptable features. To the best of our knowledge, FDAC is the first attempt to learn transferable representations by manipulating the latent architecture of ViT under the federated setting. Furthermore, FDAC can increase the target data diversity by compensating from each source model with insufficient knowledge of samples and features, based on domain augmentation and semantic matching. Extensive experiments on several real datasets demonstrate that FDAC outperforms all the comparative methods in most conditions. Moreover, FDCA can also improve communication efficiency which is another key factor in the federated setting. △ Less

Submitted 7 May, 2023; originally announced May 2023.

Comments: 13 pages

arXiv:2304.07454 [pdf, other]

Realizing Immersive Communications in Human Digital Twin by Edge Computing Empowered Tactile Internet: Visions and Case Study

Authors: Hao Xiang, Changyan Yi, Kun Wu, Jiayuan Chen, Jun Cai, Dusit Niyato, Xuemin, Shen

Abstract: Human digital twin (HDT) is expected to revolutionize the future human lifestyle and prompts the development of advanced human-centric applications (e.g., Metaverse) by bridging physical and virtual spaces. However, the fulfillment of HDT poses stringent demands on the pervasive connectivity, real-time feedback, multi-modal data transmission and ultra-high reliability, which urge the need of enabl… ▽ More Human digital twin (HDT) is expected to revolutionize the future human lifestyle and prompts the development of advanced human-centric applications (e.g., Metaverse) by bridging physical and virtual spaces. However, the fulfillment of HDT poses stringent demands on the pervasive connectivity, real-time feedback, multi-modal data transmission and ultra-high reliability, which urge the need of enabling immersive communications. In this article, we shed light on the design of an immersive communication framework for HDT by edge computing empowered tactile Internet (namely IC-HDT-ECoTI). Aiming at offering strong interactions and extremely immersive quality of experience, we introduce the system architecture of IC-HDT-ECoTI, and analyze its major design requirements and challenges. Moreover, we present core guidelines and detailed steps for system implementations. In addition, we conduct an experimental study based on our recently built testbed, which shows a particular use case of IC-HDT-ECoTI in physical therapy, and the obtained results indicate that the proposed framework can significantly improve the effectiveness of the system. Finally, we conclude this article with a brief discussion of open issues and future directions. △ Less

Submitted 17 June, 2024; v1 submitted 14 April, 2023; originally announced April 2023.

arXiv:2303.17614 [pdf, other]

Estimating Continuous Muscle Fatigue For Multi-Muscle Coordinated Exercise: A Pilot Study

Authors: Chunzhi Yi, Baichun Wei, Wei **, Jianfei Zhu, Seungmin Rho, Zhiyuan Chen, Feng Jiang

Abstract: Assessing the progression of muscle fatigue for daily exercises provides vital indicators for precise rehabilitation, personalized training dose, especially under the context of Metaverse. Assessing fatigue of multi-muscle coordination-involved daily exercises requires the neuromuscular features that represent the fatigue-induced characteristics of spatiotemporal adaptions of multiple muscles and… ▽ More Assessing the progression of muscle fatigue for daily exercises provides vital indicators for precise rehabilitation, personalized training dose, especially under the context of Metaverse. Assessing fatigue of multi-muscle coordination-involved daily exercises requires the neuromuscular features that represent the fatigue-induced characteristics of spatiotemporal adaptions of multiple muscles and the estimator that captures the time-evolving progression of fatigue. In this paper, we propose to depict fatigue by the features of muscle compensation and spinal module activation changes and estimate continuous fatigue by a physiological rationale model. First, we extract muscle synergy fractionation and the variance of spinal module spikings as features inspired by the prior of fatigue-induced neuromuscular adaptations. Second, we treat the features as observations and develop a Bayesian Gaussian process to capture the time-evolving progression. Third, we solve the issue of lacking supervision information by mathematically formulating the time-evolving characteristics of fatigue as the loss function. Finally, we adapt the metrics that follow the physiological principles of fatigue to quantitatively evaluate the performance. Our extensive experiments present a 0.99 similarity between days, a over 0.7 similarity with other views of fatigue and a nearly 1 weak monotonicity, which outperform other methods. This study would aim the objective assessment of muscle fatigue. △ Less

Submitted 29 March, 2023; originally announced March 2023.

Comments: submitted to IEEE JBHI

arXiv:2303.15107 [pdf, other]

ActiveSelfHAR: Incorporating Self Training into Active Learning to Improve Cross-Subject Human Activity Recognition

Authors: Baichun Wei, Chunzhi Yi, Qi Zhang, Haiqi Zhu, Jianfei Zhu, Feng Jiang

Abstract: Deep learning-based human activity recognition (HAR) methods have shown great promise in the applications of smart healthcare systems and wireless body sensor network (BSN). Despite their demonstrated performance in laboratory settings, the real-world implementation of such methods is still hindered by the cross-subject issue when adapting to new users. To solve this issue, we propose ActiveSelfHA… ▽ More Deep learning-based human activity recognition (HAR) methods have shown great promise in the applications of smart healthcare systems and wireless body sensor network (BSN). Despite their demonstrated performance in laboratory settings, the real-world implementation of such methods is still hindered by the cross-subject issue when adapting to new users. To solve this issue, we propose ActiveSelfHAR, a framework that combines active learning's benefit of sparsely acquiring data with actual labels and self- training's benefit of effectively utilizing unlabeled data to enable the deep model to adapt to the target domain, i.e., the new users. In this framework, the model trained in the last iteration or the source domain is first utilized to generate pseudo labels of the target-domain samples and construct a self-training set based on the confidence score. Second, we propose to use the spatio-temporal relationships among the samples in the non-self-training set to augment the core set selected by active learning. Finally, we combine the self-training set and the augmented core set to fine-tune the model. We demonstrate our method by comparing it with state-of-the-art methods on two IMU-based datasets and an EMG-based dataset. Our method presents similar HAR accuracies with the upper bound, i.e. fully supervised fine-tuning with less than 1\% labeled data of the target dataset and significantly improves data efficiency and time cost. Our work highlights the potential of implementing user-independent HAR methods into smart healthcare systems and BSN. △ Less

Submitted 27 March, 2023; originally announced March 2023.

arXiv:2303.11163 [pdf, other]

Finding Similar Exercises in Retrieval Manner

Authors: Tongwen Huang, Xihua Li, Chao Yi, Xuemin Zhao, Yunbo Cao

Abstract: When students make a mistake in an exercise, they can consolidate it by ``similar exercises'' which have the same concepts, purposes and methods. Commonly, for a certain subject and study stage, the size of the exercise bank is in the range of millions to even tens of millions, how to find similar exercises for a given exercise becomes a crucial technical problem. Generally, we can assign a variet… ▽ More When students make a mistake in an exercise, they can consolidate it by ``similar exercises'' which have the same concepts, purposes and methods. Commonly, for a certain subject and study stage, the size of the exercise bank is in the range of millions to even tens of millions, how to find similar exercises for a given exercise becomes a crucial technical problem. Generally, we can assign a variety of explicit labels to the exercise, and then query through the labels, but the label annotation is time-consuming, laborious and costly, with limited precision and granularity, so it is not feasible. In practice, we define ``similar exercises'' as a retrieval process of finding a set of similar exercises based on recall, ranking and re-rank procedures, called the \textbf{FSE} problem (Finding similar exercises). Furthermore, comprehensive representation of the semantic information of exercises was obtained through representation learning. In addition to the reasonable architecture, we also explore what kind of tasks are more conducive to the learning of exercise semantic information from pre-training and supervised learning. It is difficult to annotate similar exercises and the annotation consistency among experts is low. Therefore this paper also provides solutions to solve the problem of low-quality annotated data. Compared with other methods, this paper has obvious advantages in both architecture rationality and algorithm precision, which now serves the daily teaching of hundreds of schools. △ Less

Submitted 14 March, 2023; originally announced March 2023.

Comments: 37th Conference on AAAI 2023 Artificial Intelligence for Education(AI4Edu)

Journal ref: 37th Conference on AAAI 2023 Artificial Intelligence for Education(AI4Edu)

arXiv:2302.14509

Policy Dispersion in Non-Markovian Environment

Authors: Bohao Qu, Xiaofeng Cao, Jielong Yang, Hechang Chen, Chang Yi, Ivor W. Tsang, Yew-Soon Ong

Abstract: Markov Decision Process (MDP) presents a mathematical framework to formulate the learning processes of agents in reinforcement learning. MDP is limited by the Markovian assumption that a reward only depends on the immediate state and action. However, a reward sometimes depends on the history of states and actions, which may result in the decision process in a non-Markovian environment. In such env… ▽ More Markov Decision Process (MDP) presents a mathematical framework to formulate the learning processes of agents in reinforcement learning. MDP is limited by the Markovian assumption that a reward only depends on the immediate state and action. However, a reward sometimes depends on the history of states and actions, which may result in the decision process in a non-Markovian environment. In such environments, agents receive rewards via temporally-extended behaviors sparsely, and the learned policies may be similar. This leads the agents acquired with similar policies generally overfit to the given task and can not quickly adapt to perturbations of environments. To resolve this problem, this paper tries to learn the diverse policies from the history of state-action pairs under a non-Markovian environment, in which a policy dispersion scheme is designed for seeking diverse policy representation. Specifically, we first adopt a transformer-based method to learn policy embeddings. Then, we stack the policy embeddings to construct a dispersion matrix to induce a set of diverse policies. Finally, we prove that if the dispersion matrix is positive definite, the dispersed embeddings can effectively enlarge the disagreements across policies, yielding a diverse expression for the original policy embedding distribution. Experimental results show that this dispersion scheme can obtain more expressive diverse policies, which then derive more robust performance than recent learning baselines under various learning environments. △ Less

Submitted 2 June, 2024; v1 submitted 28 February, 2023; originally announced February 2023.

Comments: In further research, we found that the core content of the paper requires significant modification and that the entire paper needs to be restructured. To enhance the scientific quality and contributions of the paper, we have decided to resubmit it after completing the necessary revisions and improvements

arXiv:2302.14309 [pdf, other]

Temporal Coherent Test-Time Optimization for Robust Video Classification

Authors: Chenyu Yi, Siyuan Yang, Yufei Wang, Haoliang Li, Yap-Peng Tan, Alex C. Kot

Abstract: Deep neural networks are likely to fail when the test data is corrupted in real-world deployment (e.g., blur, weather, etc.). Test-time optimization is an effective way that adapts models to generalize to corrupted data during testing, which has been shown in the image domain. However, the techniques for improving video classification corruption robustness remain few. In this work, we propose a Te… ▽ More Deep neural networks are likely to fail when the test data is corrupted in real-world deployment (e.g., blur, weather, etc.). Test-time optimization is an effective way that adapts models to generalize to corrupted data during testing, which has been shown in the image domain. However, the techniques for improving video classification corruption robustness remain few. In this work, we propose a Temporal Coherent Test-time Optimization framework (TeCo) to utilize spatio-temporal information in test-time optimization for robust video classification. To exploit information in video with self-supervised learning, TeCo uses global content from video clips and optimizes models for entropy minimization. TeCo minimizes the entropy of the prediction based on the global content from video clips. Meanwhile, it also feeds local content to regularize the temporal coherence at the feature level. TeCo retains the generalization ability of various video classification models and achieves significant improvements in corruption robustness across Mini Kinetics-C and Mini SSV2-C. Furthermore, TeCo sets a new baseline in video classification corruption robustness via test-time optimization. △ Less

Submitted 27 February, 2023; originally announced February 2023.

arXiv:2301.03930 [pdf, other]

doi 10.1109/COMST.2023.3308717

Networking Architecture and Key Supporting Technologies for Human Digital Twin in Personalized Healthcare: A Comprehensive Survey

Authors: Jiayuan Chen, Changyan Yi, Samuel D. Okegbile, Jun Cai, Xuemin, Shen

Abstract: Digital twin (DT), refers to a promising technique to digitally and accurately represent actual physical entities. One typical advantage of DT is that it can be used to not only virtually replicate a system's detailed operations but also analyze the current condition, predict future behaviour, and refine the control optimization. Although DT has been widely implemented in various fields, such as s… ▽ More Digital twin (DT), refers to a promising technique to digitally and accurately represent actual physical entities. One typical advantage of DT is that it can be used to not only virtually replicate a system's detailed operations but also analyze the current condition, predict future behaviour, and refine the control optimization. Although DT has been widely implemented in various fields, such as smart manufacturing and transportation, its conventional paradigm is limited to embody non-living entities, e.g., robots and vehicles. When adopted in human-centric systems, a novel concept, called human digital twin (HDT) has thus been proposed. Particularly, HDT allows in silico representation of individual human body with the ability to dynamically reflect molecular status, physiological status, emotional and psychological status, as well as lifestyle evolutions. These prompt the expected application of HDT in personalized healthcare (PH), which can facilitate remote monitoring, diagnosis, prescription, surgery and rehabilitation. However, despite the large potential, HDT faces substantial research challenges in different aspects, and becomes an increasingly popular topic recently. In this survey, with a specific focus on the networking architecture and key technologies for HDT in PH applications, we first discuss the differences between HDT and conventional DTs, followed by the universal framework and essential functions of HDT. We then analyze its design requirements and challenges in PH applications. After that, we provide an overview of the networking architecture of HDT, including data acquisition layer, data communication layer, computation layer, data management layer and data analysis and decision making layer. Besides reviewing the key technologies for implementing such networking architecture in detail, we conclude this survey by presenting future research directions of HDT. △ Less

Submitted 23 June, 2023; v1 submitted 10 January, 2023; originally announced January 2023.

arXiv:2210.10311 [pdf, other]

Latency Aware Semi-synchronous Client Selection and Model Aggregation for Wireless Federated Learning

Authors: Liangkun Yu, Xiang Sun, Rana Albelaihi, Chen Yi

Abstract: Federated learning (FL) is a collaborative machine learning framework that requires different clients (e.g., Internet of Things devices) to participate in the machine learning model training process by training and uploading their local models to an FL server in each global iteration. Upon receiving the local models from all the clients, the FL server generates a global model by aggregating the re… ▽ More Federated learning (FL) is a collaborative machine learning framework that requires different clients (e.g., Internet of Things devices) to participate in the machine learning model training process by training and uploading their local models to an FL server in each global iteration. Upon receiving the local models from all the clients, the FL server generates a global model by aggregating the received local models. This traditional FL process may suffer from the straggler problem in heterogeneous client settings, where the FL server has to wait for slow clients to upload their local models in each global iteration, thus increasing the overall training time. One of the solutions is to set up a deadline and only the clients that can upload their local models before the deadline would be selected in the FL process. This solution may lead to a slow convergence rate and global model overfitting issues due to the limited client selection. In this paper, we propose the Latency awarE Semi-synchronous client Selection and mOdel aggregation for federated learNing (LESSON) method that allows all the clients to participate in the whole FL process but with different frequencies. That is, faster clients would be scheduled to upload their models more frequently than slow clients, thus resolving the straggler problem and accelerating the convergence speed, while avoiding model overfitting. Also, LESSON is capable of adjusting the tradeoff between the model accuracy and convergence rate by varying the deadline. Extensive simulations have been conducted to compare the performance of LESSON with the other two baseline methods, i.e., FedAvg and FedCS. The simulation results demonstrate that LESSON achieves faster convergence speed than FedAvg and FedCS, and higher model accuracy than FedCS. △ Less

Submitted 28 November, 2022; v1 submitted 19 October, 2022; originally announced October 2022.

arXiv:2209.08326 [pdf, other]

Parameter-Efficient Conformers via Sharing Sparsely-Gated Experts for End-to-End Speech Recognition

Authors: Ye Bai, Jie Li, Wen**g Han, Hao Ni, Kaituo Xu, Zhuo Zhang, Cheng Yi, Xiaorui Wang

Abstract: While transformers and their variant conformers show promising performance in speech recognition, the parameterized property leads to much memory cost during training and inference. Some works use cross-layer weight-sharing to reduce the parameters of the model. However, the inevitable loss of capacity harms the model performance. To address this issue, this paper proposes a parameter-efficient co… ▽ More While transformers and their variant conformers show promising performance in speech recognition, the parameterized property leads to much memory cost during training and inference. Some works use cross-layer weight-sharing to reduce the parameters of the model. However, the inevitable loss of capacity harms the model performance. To address this issue, this paper proposes a parameter-efficient conformer via sharing sparsely-gated experts. Specifically, we use sparsely-gated mixture-of-experts (MoE) to extend the capacity of a conformer block without increasing computation. Then, the parameters of the grouped conformer blocks are shared so that the number of parameters is reduced. Next, to ensure the shared blocks with the flexibility of adapting representations at different levels, we design the MoE routers and normalization individually. Moreover, we use knowledge distillation to further improve the performance. Experimental results show that the proposed model achieves competitive performance with 1/3 of the parameters of the encoder, compared with the full-parameter model. △ Less

Submitted 17 September, 2022; originally announced September 2022.

Comments: accepted in INTERSPEECH 2022

arXiv:2208.10867 [pdf]

A Quinary Coding and Matrix Structure-based Channel Hop** Algorithm for Blind Rendezvous in Cognitive Radio Networks

Authors: Qinglin Liu, Zhiyong Lin, Zongheng Wei, Jianfeng Wen, Congming Yi, Hai Liu

Abstract: The multi-channel blind rendezvous problem in distributed cognitive radio networks (DCRNs) refers to how users in the network can hop to the same channel at the same time slot without any prior knowledge (i.e., each user is unaware of other users' information). The channel hop** (CH) technique is a typical solution to this blind rendezvous problem. In this paper, we propose a quinary coding and… ▽ More The multi-channel blind rendezvous problem in distributed cognitive radio networks (DCRNs) refers to how users in the network can hop to the same channel at the same time slot without any prior knowledge (i.e., each user is unaware of other users' information). The channel hop** (CH) technique is a typical solution to this blind rendezvous problem. In this paper, we propose a quinary coding and matrix structure-based CH algorithm called QCMS-CH. The QCMS-CH algorithm can guarantee the rendezvous of users using only one cognitive radio in the scenario of the asynchronous clock (i.e., arbitrary time drift between the users), heterogeneous channels (i.e., the available channel sets of users are distinct), and symmetric role (i.e., all users play a same role). The QCMS-CH algorithm first represents a randomly selected channel (denoted by R) as a fixed-length quaternary number. Then it encodes the quaternary number into a quinary bootstrap** sequence according to a carefully designed quaternary-quinary coding table with the prefix "R00". Finally, it builds a CH matrix column by column according to the bootstrap** sequence and six different types of elaborately generated subsequences. The user can access the CH matrix row by row and accordingly perform its channel hop** to attempt to rendezvous with other users. We prove the correctness of QCMS-CH and derive an upper bound on its Maximum Time-to-Rendezvous (MTTR). Simulation results show that the QCMS-CH algorithm outperforms the state-of-the-art in terms of the MTTR and the Expected Time-to-Rendezvous (ETTR). △ Less

Submitted 23 August, 2022; originally announced August 2022.

Comments: 10 pages

arXiv:2203.16745

Channel Measurement and Characterization with Modified SAGE Algorithm in an Indoor Corridor at 300 GHz

Authors: Li Yuanbo, Wang Yiqin, Chen Yi, Yu Ziming, Han Chong

Abstract: The much higher frequencies in the Terahertz (THz) band prevent the effective utilization of channel models dedicated for microwave or millimeter-wave frequency bands. In this paper, a measurement campaign is conducted in an indoor corridor scenario at 306-321 GHz with a frequency-domain Vector Network Analyzer (VNA)-based sounder. To realize high-resolution multipath component (MPC) extraction fo… ▽ More The much higher frequencies in the Terahertz (THz) band prevent the effective utilization of channel models dedicated for microwave or millimeter-wave frequency bands. In this paper, a measurement campaign is conducted in an indoor corridor scenario at 306-321 GHz with a frequency-domain Vector Network Analyzer (VNA)-based sounder. To realize high-resolution multipath component (MPC) extraction for the direction-scan measurement campaigns in the THz band, a novel modified space-alternating generalized expectation-maximization (SAGE) algorithm is further proposed. Moreover, critical channel characteristics, including the path loss, shadow fading, K-factor, delay spread, angular spreads, cluster parameters, and cross correlations are calculated and analyzed in the LoS case. Besides, two contrasted measurement campaigns in the NLoS case are conducted, with and without additional reflective foils on walls to serve as effective scatterers. Comparison results indicate that the reflective foils are useful to improve the channel conditions in the NLoS case by nearly 6 dB, which is potential to be utilized as alternative of intelligent reflecting surfaces (IRS) to enhance the coverage ability of THz communications. △ Less

Submitted 14 March, 2023; v1 submitted 30 March, 2022; originally announced March 2022.

Comments: Please find the latest version as arXiv:2212.11756

arXiv:2110.06513 [pdf, other]

Benchmarking the Robustness of Spatial-Temporal Models Against Corruptions

Authors: Chenyu Yi, Siyuan Yang, Haoliang Li, Yap-peng Tan, Alex Kot

Abstract: The state-of-the-art deep neural networks are vulnerable to common corruptions (e.g., input data degradations, distortions, and disturbances caused by weather changes, system error, and processing). While much progress has been made in analyzing and improving the robustness of models in image understanding, the robustness in video understanding is largely unexplored. In this paper, we establish a… ▽ More The state-of-the-art deep neural networks are vulnerable to common corruptions (e.g., input data degradations, distortions, and disturbances caused by weather changes, system error, and processing). While much progress has been made in analyzing and improving the robustness of models in image understanding, the robustness in video understanding is largely unexplored. In this paper, we establish a corruption robustness benchmark, Mini Kinetics-C and Mini SSV2-C, which considers temporal corruptions beyond spatial corruptions in images. We make the first attempt to conduct an exhaustive study on the corruption robustness of established CNN-based and Transformer-based spatial-temporal models. The study provides some guidance on robust model design and training: Transformer-based model performs better than CNN-based models on corruption robustness; the generalization ability of spatial-temporal models implies robustness against temporal corruptions; model corruption robustness (especially robustness in the temporal domain) enhances with computational cost and model capacity, which may contradict the current trend of improving the computational efficiency of models. Moreover, we find the robustness intervention for image-related tasks (e.g., training models with noise) may not work for spatial-temporal models. △ Less

Submitted 22 August, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

Comments: Accepted to NeurIPs 2021 Dataset and Benchmark Track. Our codes are available on https://github.com/Newbeeyoung/Video-Corruption-Robustness

arXiv:2105.03559 [pdf, ps, other]

Applications of Auction and Mechanism Design in Edge Computing: A Survey

Authors: Houming Qiu, Kun Zhu, Nguyen Cong Luong, Changyan Yi, Dusit Niyato, Dong In Kim

Abstract: Edge computing as a promising technology provides lower latency, more efficient transmission, and faster speed of data processing since the edge servers are closer to the user devices. Each edge server with limited resources can offload latency-sensitive and computation-intensive tasks from nearby user devices. However, edge computing faces challenges such as resource allocation, energy consumptio… ▽ More Edge computing as a promising technology provides lower latency, more efficient transmission, and faster speed of data processing since the edge servers are closer to the user devices. Each edge server with limited resources can offload latency-sensitive and computation-intensive tasks from nearby user devices. However, edge computing faces challenges such as resource allocation, energy consumption, security and privacy issues, etc. Auction mechanisms can well characterize bidirectional interactions between edge servers and user devices under the above constraints in edge computing. As demonstrated by the existing works, auction and mechanism design approaches are outstanding on achieving optimal allocation strategy while guaranteeing mutual satisfaction among edge servers and user devices, especially for scenarios with scarce resources. In this paper, we introduce a comprehensive survey of recent researches that apply auction approaches in edge computing. Firstly, a brief overview of edge computing including three common edge computing paradigms, i.e., cloudlet, fog computing and mobile edge computing, is presented. Then, we introduce fundamentals and backgrounds of auction schemes commonly used in edge computing systems. After then, a comprehensive survey of applications of auction-based approaches applied for edge computing is provided, which is categorized by different auction approaches. Finally, several open challenges and promising research directions are discussed. △ Less

Submitted 7 May, 2021; originally announced May 2021.

arXiv:2101.06699 [pdf, other]

doi 10.1109/LSP.2021.3071668

Efficiently Fusing Pretrained Acoustic and Linguistic Encoders for Low-resource Speech Recognition

Authors: Cheng Yi, Shiyu Zhou, Bo Xu

Abstract: End-to-end models have achieved impressive results on the task of automatic speech recognition (ASR). For low-resource ASR tasks, however, labeled data can hardly satisfy the demand of end-to-end models. Self-supervised acoustic pre-training has already shown its amazing ASR performance, while the transcription is still inadequate for language modeling in end-to-end models. In this work, we fuse a… ▽ More End-to-end models have achieved impressive results on the task of automatic speech recognition (ASR). For low-resource ASR tasks, however, labeled data can hardly satisfy the demand of end-to-end models. Self-supervised acoustic pre-training has already shown its amazing ASR performance, while the transcription is still inadequate for language modeling in end-to-end models. In this work, we fuse a pre-trained acoustic encoder (wav2vec2.0) and a pre-trained linguistic encoder (BERT) into an end-to-end ASR model. The fused model only needs to learn the transfer from speech to language during fine-tuning on limited labeled data. The length of the two modalities is matched by a monotonic attention mechanism without additional parameters. Besides, a fully connected layer is introduced for the hidden map** between modalities. We further propose a scheduled fine-tuning strategy to preserve and utilize the text context modeling ability of the pre-trained linguistic encoder. Experiments show our effective utilizing of pre-trained modules. Our model achieves better recognition performance on CALLHOME corpus (15 hours) than other end-to-end models. △ Less

Submitted 24 January, 2021; v1 submitted 17 January, 2021; originally announced January 2021.

arXiv:2012.12121 [pdf, other]

Applying Wav2vec2.0 to Speech Recognition in Various Low-resource Languages

Authors: Cheng Yi, Jianzhong Wang, Ning Cheng, Shiyu Zhou, Bo Xu

Abstract: There are several domains that own corresponding widely used feature extractors, such as ResNet, BERT, and GPT-x. These models are usually pre-trained on large amounts of unlabeled data by self-supervision and can be effectively applied to downstream tasks. In the speech domain, wav2vec2.0 starts to show its powerful representation ability and feasibility of ultra-low resource speech recognition o… ▽ More There are several domains that own corresponding widely used feature extractors, such as ResNet, BERT, and GPT-x. These models are usually pre-trained on large amounts of unlabeled data by self-supervision and can be effectively applied to downstream tasks. In the speech domain, wav2vec2.0 starts to show its powerful representation ability and feasibility of ultra-low resource speech recognition on the Librispeech corpus, which belongs to the audiobook domain. However, wav2vec2.0 has not been examined on real spoken scenarios and languages other than English. To verify its universality over languages, we apply pre-trained models to solve low-resource speech recognition tasks in various spoken languages. We achieve more than 20% relative improvements in six languages compared with previous work. Among these languages, English achieves a gain of 52.4%. Moreover, using coarse-grained modeling units, such as subword or character, achieves better results than fine-grained modeling units, such as phone or letter. △ Less

Submitted 17 January, 2021; v1 submitted 22 December, 2020; originally announced December 2020.

arXiv:2012.01101 [pdf]

Multi-Objective Optimization of the Textile Manufacturing Process Using Deep-Q-Network Based Multi-Agent Reinforcement Learning

Authors: Zhenglei He, Kim Phuc Tran, Sebastien Thomassey, Xianyi Zeng, Jie Xu, Changhai Yi

Abstract: Multi-objective optimization of the textile manufacturing process is an increasing challenge because of the growing complexity involved in the development of the textile industry. The use of intelligent techniques has been often discussed in this domain, although a significant improvement from certain successful applications has been reported, the traditional methods failed to work with high-as we… ▽ More Multi-objective optimization of the textile manufacturing process is an increasing challenge because of the growing complexity involved in the development of the textile industry. The use of intelligent techniques has been often discussed in this domain, although a significant improvement from certain successful applications has been reported, the traditional methods failed to work with high-as well as human intervention. Upon which, this paper proposed a multi-agent reinforcement learning (MARL) framework to transform the optimization process into a stochastic game and introduced the deep Q-networks algorithm to train the multiple agents. A utilitarian selection mechanism was employed in the stochastic game, which (-greedy policy) in each state to avoid the interruption of multiple equilibria and achieve the correlated equilibrium optimal solutions of the optimizing process. The case study result reflects that the proposed MARL system is possible to achieve the optimal solutions for the textile ozonation process and it performs better than the traditional approaches. △ Less

Submitted 2 December, 2020; originally announced December 2020.

arXiv:2005.14038 [pdf, other]

HetPipe: Enabling Large DNN Training on (Whimpy) Heterogeneous GPU Clusters through Integration of Pipelined Model Parallelism and Data Parallelism

Authors: Jay H. Park, Gyeongchan Yun, Chang M. Yi, Nguyen T. Nguyen, Seungmin Lee, Jaesik Choi, Sam H. Noh, Young-ri Choi

Abstract: Deep Neural Network (DNN) models have continuously been growing in size in order to improve the accuracy and quality of the models. Moreover, for training of large DNN models, the use of heterogeneous GPUs is inevitable due to the short release cycle of new GPU architectures. In this paper, we investigate how to enable training of large DNN models on a heterogeneous GPU cluster that possibly inclu… ▽ More Deep Neural Network (DNN) models have continuously been growing in size in order to improve the accuracy and quality of the models. Moreover, for training of large DNN models, the use of heterogeneous GPUs is inevitable due to the short release cycle of new GPU architectures. In this paper, we investigate how to enable training of large DNN models on a heterogeneous GPU cluster that possibly includes whimpy GPUs that, as a standalone, could not be used for training. We present a DNN training system, HetPipe (Heterogeneous Pipeline), that integrates pipelined model parallelism (PMP) with data parallelism (DP). In HetPipe, a group of multiple GPUs, called a virtual worker, processes minibatches in a pipelined manner, and multiple such virtual workers employ data parallelism for higher performance. We also propose a novel parameter synchronization model, which we refer to as Wave Synchronous Parallel (WSP) to accommodate both PMP and DP for virtual workers, and provide convergence proof of WSP. Our experimental results on a given heterogeneous setting show that with HetPipe, DNN models converge up to 49% faster compared to the state-of-the-art DP technique. △ Less

Submitted 28 May, 2020; originally announced May 2020.

arXiv:2005.10113 [pdf, other]

A Comparison of Label-Synchronous and Frame-Synchronous End-to-End Models for Speech Recognition

Authors: Linhao Dong, Cheng Yi, Jianzong Wang, Shiyu Zhou, Shuang Xu, Xueli Jia, Bo Xu

Abstract: End-to-end models are gaining wider attention in the field of automatic speech recognition (ASR). One of their advantages is the simplicity of building that directly recognizes the speech frame sequence into the text label sequence by neural networks. According to the driving end in the recognition process, end-to-end ASR models could be categorized into two types: label-synchronous and frame-sync… ▽ More End-to-end models are gaining wider attention in the field of automatic speech recognition (ASR). One of their advantages is the simplicity of building that directly recognizes the speech frame sequence into the text label sequence by neural networks. According to the driving end in the recognition process, end-to-end ASR models could be categorized into two types: label-synchronous and frame-synchronous, each of which has unique model behaviour and characteristic. In this work, we make a detailed comparison on a representative label-synchronous model (transformer) and a soft frame-synchronous model (continuous integrate-and-fire (CIF) based model). The results on three public dataset and a large-scale dataset with 12000 hours of training data show that the two types of models have respective advantages that are consistent with their synchronous mode. △ Less

Submitted 25 May, 2020; v1 submitted 20 May, 2020; originally announced May 2020.

Comments: 4 pages, 2 figures

arXiv:2005.09867 [pdf]

A reinforcement learning based decision support system in textile manufacturing process

Authors: Zhenglei He, Kim Phuc Tran, Sébastien Thomassey, Xianyi Zeng, Changhai Yi

Abstract: This paper introduced a reinforcement learning based decision support system in textile manufacturing process. A solution optimization problem of color fading ozonation is discussed and set up as a Markov Decision Process (MDP) in terms of tuple {S, A, P, R}. Q-learning is used to train an agent in the interaction with the setup environment by accumulating the reward R. According to the applicatio… ▽ More This paper introduced a reinforcement learning based decision support system in textile manufacturing process. A solution optimization problem of color fading ozonation is discussed and set up as a Markov Decision Process (MDP) in terms of tuple {S, A, P, R}. Q-learning is used to train an agent in the interaction with the setup environment by accumulating the reward R. According to the application result, it is found that the proposed MDP model has well expressed the optimization problem of textile manufacturing process discussed in this paper, therefore the use of reinforcement learning to support decision making in this sector is conducted and proven that is applicable with promising prospects. △ Less

Submitted 20 May, 2020; originally announced May 2020.

Journal ref: 15th International Conference on Intelligent Systems and Knowledge Engineering (ISKE2020), Aug 2020, Cologne, Germany

arXiv:1906.03821 [pdf, other]

doi 10.1145/3292500.3330680

Time-Series Anomaly Detection Service at Microsoft

Authors: Hansheng Ren, Bixiong Xu, Yu**g Wang, Chao Yi, Congrui Huang, Xiaoyu Kou, Tony Xing, Mao Yang, Jie Tong, Qi Zhang

Abstract: Large companies need to monitor various metrics (for example, Page Views and Revenue) of their applications and services in real time. At Microsoft, we develop a time-series anomaly detection service which helps customers to monitor the time-series continuously and alert for potential incidents on time. In this paper, we introduce the pipeline and algorithm of our anomaly detection service, which… ▽ More Large companies need to monitor various metrics (for example, Page Views and Revenue) of their applications and services in real time. At Microsoft, we develop a time-series anomaly detection service which helps customers to monitor the time-series continuously and alert for potential incidents on time. In this paper, we introduce the pipeline and algorithm of our anomaly detection service, which is designed to be accurate, efficient and general. The pipeline consists of three major modules, including data ingestion, experimentation platform and online compute. To tackle the problem of time-series anomaly detection, we propose a novel algorithm based on Spectral Residual (SR) and Convolutional Neural Network (CNN). Our work is the first attempt to borrow the SR model from visual saliency detection domain to time-series anomaly detection. Moreover, we innovatively combine SR and CNN together to improve the performance of SR model. Our approach achieves superior experimental results compared with state-of-the-art baselines on both public datasets and Microsoft production data. △ Less

Submitted 10 June, 2019; originally announced June 2019.

Comments: KDD 2019

arXiv:1808.09074 [pdf, other]

EmbeddingVis: A Visual Analytics Approach to Comparative Network Embedding Inspection

Authors: Quan Li, Kristanto Sean Njotoprawiro, Hammad Haleem, Qiaoan Chen, Chris Yi, Xiaojuan Ma

Abstract: Constructing latent vector representation for nodes in a network through embedding models has shown its practicality in many graph analysis applications, such as node classification, clustering, and link prediction. However, despite the high efficiency and accuracy of learning an embedding model, people have little clue of what information about the original network is preserved in the embedding v… ▽ More Constructing latent vector representation for nodes in a network through embedding models has shown its practicality in many graph analysis applications, such as node classification, clustering, and link prediction. However, despite the high efficiency and accuracy of learning an embedding model, people have little clue of what information about the original network is preserved in the embedding vectors. The abstractness of low-dimensional vector representation, stochastic nature of the construction process, and non-transparent hyper-parameters all obscure understanding of network embedding results. Visualization techniques have been introduced to facilitate embedding vector inspection, usually by projecting the embedding space to a two-dimensional display. Although the existing visualization methods allow simple examination of the structure of embedding space, they cannot support in-depth exploration of the embedding vectors. In this paper, we design an exploratory visual analytics system that supports the comparative visual interpretation of embedding vectors at the cluster, instance, and structural levels. To be more specific, it facilitates comparison of what and how node metrics are preserved across different embedding models and investigation of relationships between node metrics and selected embedding vectors. Several case studies confirm the efficacy of our system. Experts' feedback suggests that our approach indeed helps them better embrace the understanding of network embedding models. △ Less

Submitted 27 August, 2018; originally announced August 2018.

Comments: Proceedings of IEEE VIS 2018 (VAST 2018) (to appear), Berlin, Germany, Oct 21-26, 2018

arXiv:1803.04042 [pdf, other]

Interpreting Deep Classifier by Visual Distillation of Dark Knowledge

Authors: Kai Xu, Dae Hoon Park, Chang Yi, Charles Sutton

Abstract: Interpreting black box classifiers, such as deep networks, allows an analyst to validate a classifier before it is deployed in a high-stakes setting. A natural idea is to visualize the deep network's representations, so as to "see what the network sees". In this paper, we demonstrate that standard dimension reduction methods in this setting can yield uninformative or even misleading visualizations… ▽ More Interpreting black box classifiers, such as deep networks, allows an analyst to validate a classifier before it is deployed in a high-stakes setting. A natural idea is to visualize the deep network's representations, so as to "see what the network sees". In this paper, we demonstrate that standard dimension reduction methods in this setting can yield uninformative or even misleading visualizations. Instead, we present DarkSight, which visually summarizes the predictions of a classifier in a way inspired by notion of dark knowledge. DarkSight embeds the data points into a low-dimensional space such that it is easy to compress the deep classifier into a simpler one, essentially combining model compression and dimension reduction. We compare DarkSight against t-SNE both qualitatively and quantitatively, demonstrating that DarkSight visualizations are more informative. Our method additionally yields a new confidence measure based on dark knowledge by quantifying how unusual a given vector of predictions is. △ Less

Submitted 11 March, 2018; originally announced March 2018.

Showing 1–42 of 42 results for author: Yi, C