-
Task-oriented Over-the-air Computation for Edge-device Co-inference with Balanced Classification Accuracy
Authors:
Xiang Jiao,
Dingzhu Wen,
Guangxu Zhu,
Wei Jiang,
Wu Luo,
Yuanming Shi
Abstract:
Edge-device co-inference, which concerns the cooperation between edge devices and an edge server for completing inference tasks over wireless networks, has been a promising technique for enabling various kinds of intelligent services at the network edge, e.g., auto-driving. In this paradigm, the concerned design objective of the network shifts from the traditional communication throughput to the e…
▽ More
Edge-device co-inference, which concerns the cooperation between edge devices and an edge server for completing inference tasks over wireless networks, has been a promising technique for enabling various kinds of intelligent services at the network edge, e.g., auto-driving. In this paradigm, the concerned design objective of the network shifts from the traditional communication throughput to the effective and efficient execution of the inference task underpinned by the network, measured by, e.g., the inference accuracy and latency. In this paper, a task-oriented over-the-air computation scheme is proposed for a multidevice artificial intelligence system. Particularly, a novel tractable inference accuracy metric is proposed for classification tasks, which is called minimum pair-wise discriminant gain. Unlike prior work measuring the average of all class pairs in feature space, it measures the minimum distance of all class pairs. By maximizing the minimum pair-wise discriminant gain instead of its average counterpart, any pair of classes can be better separated in the feature space, and thus leading to a balanced and improved inference accuracy for all classes. Besides, this paper jointly optimizes the minimum discriminant gain of all feature elements instead of separately maximizing that of each element in the existing designs. As a result, the transmit power can be adaptively allocated to the feature elements according to their different contributions to the inference accuracy, opening an extra degree of freedom to improve inference performance. Extensive experiments are conducted using a concrete use case of human motion recognition to verify the superiority of the proposed design over the benchmarking scheme.
△ Less
Submitted 1 July, 2024;
originally announced July 2024.
-
Improving Generative Adversarial Networks for Video Super-Resolution
Authors:
Daniel Wen
Abstract:
In this research, we explore different ways to improve generative adversarial networks for video super-resolution tasks from a base single image super-resolution GAN model. Our primary objective is to identify potential techniques that enhance these models and to analyze which of these techniques yield the most significant improvements. We evaluate our results using Peak Signal-to-Noise Ratio (PSN…
▽ More
In this research, we explore different ways to improve generative adversarial networks for video super-resolution tasks from a base single image super-resolution GAN model. Our primary objective is to identify potential techniques that enhance these models and to analyze which of these techniques yield the most significant improvements. We evaluate our results using Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM). Our findings indicate that the most effective techniques include temporal smoothing, long short-term memory (LSTM) layers, and a temporal loss function. The integration of these methods results in an 11.97% improvement in PSNR and an 8% improvement in SSIM compared to the baseline video super-resolution generative adversarial network (GAN) model. This substantial improvement suggests potential further applications to enhance current state-of-the-art models.
△ Less
Submitted 24 June, 2024;
originally announced June 2024.
-
Collaborative Edge AI Inference over Cloud-RAN
Authors:
Pengfei Zhang,
Dingzhu Wen,
Guangxu Zhu,
Qimei Chen,
Kaifeng Han,
Yuanming Shi
Abstract:
In this paper, a cloud radio access network (Cloud-RAN) based collaborative edge AI inference architecture is proposed. Specifically, geographically distributed devices capture real-time noise-corrupted sensory data samples and extract the noisy local feature vectors, which are then aggregated at each remote radio head (RRH) to suppress sensing noise. To realize efficient uplink feature aggregatio…
▽ More
In this paper, a cloud radio access network (Cloud-RAN) based collaborative edge AI inference architecture is proposed. Specifically, geographically distributed devices capture real-time noise-corrupted sensory data samples and extract the noisy local feature vectors, which are then aggregated at each remote radio head (RRH) to suppress sensing noise. To realize efficient uplink feature aggregation, we allow each RRH receives local feature vectors from all devices over the same resource blocks simultaneously by leveraging an over-the-air computation (AirComp) technique. Thereafter, these aggregated feature vectors are quantized and transmitted to a central processor (CP) for further aggregation and downstream inference tasks. Our aim in this work is to maximize the inference accuracy via a surrogate accuracy metric called discriminant gain, which measures the discernibility of different classes in the feature space. The key challenges lie on simultaneously suppressing the coupled sensing noise, AirComp distortion caused by hostile wireless channels, and the quantization error resulting from the limited capacity of fronthaul links. To address these challenges, this work proposes a joint transmit precoding, receive beamforming, and quantization error control scheme to enhance the inference accuracy. Extensive numerical experiments demonstrate the effectiveness and superiority of our proposed optimization algorithm compared to various baselines.
△ Less
Submitted 9 April, 2024;
originally announced April 2024.
-
Skeleton-Based Human Action Recognition with Noisy Labels
Authors:
Yi Xu,
Kunyu Peng,
Di Wen,
Rui** Liu,
Junwei Zheng,
Yufan Chen,
Jiaming Zhang,
Alina Roitberg,
Kailun Yang,
Rainer Stiefelhagen
Abstract:
Understanding human actions from body poses is critical for assistive robots sharing space with humans in order to make informed and safe decisions about the next interaction. However, precise temporal localization and annotation of activity sequences is time-consuming and the resulting labels are often noisy. If not effectively addressed, label noise negatively affects the model's training, resul…
▽ More
Understanding human actions from body poses is critical for assistive robots sharing space with humans in order to make informed and safe decisions about the next interaction. However, precise temporal localization and annotation of activity sequences is time-consuming and the resulting labels are often noisy. If not effectively addressed, label noise negatively affects the model's training, resulting in lower recognition quality. Despite its importance, addressing label noise for skeleton-based action recognition has been overlooked so far. In this study, we bridge this gap by implementing a framework that augments well-established skeleton-based human action recognition methods with label-denoising strategies from various research areas to serve as the initial benchmark. Observations reveal that these baselines yield only marginal performance when dealing with sparse skeleton data. Consequently, we introduce a novel methodology, NoiseEraSAR, which integrates global sample selection, co-teaching, and Cross-Modal Mixture-of-Experts (CM-MOE) strategies, aimed at mitigating the adverse impacts of label noise. Our proposed approach demonstrates better performance on the established benchmark, setting new state-of-the-art standards. The source code for this study will be made accessible at https://github.com/xuyizdby/NoiseEraSAR.
△ Less
Submitted 14 March, 2024;
originally announced March 2024.
-
Low-Rank Gradient Compression with Error Feedback for MIMO Wireless Federated Learning
Authors:
Mingzhao Guo,
Dongzhu Liu,
Osvaldo Simeone,
Dingzhu Wen
Abstract:
This paper presents a novel approach to enhance the communication efficiency of federated learning (FL) in multiple input and multiple output (MIMO) wireless systems. The proposed method centers on a low-rank matrix factorization strategy for local gradient compression based on alternating least squares, along with over-the-air computation and error feedback. The proposed protocol, termed over-the…
▽ More
This paper presents a novel approach to enhance the communication efficiency of federated learning (FL) in multiple input and multiple output (MIMO) wireless systems. The proposed method centers on a low-rank matrix factorization strategy for local gradient compression based on alternating least squares, along with over-the-air computation and error feedback. The proposed protocol, termed over-the-air low-rank compression (Ota-LC), is demonstrated to have lower computation cost and lower communication overhead as compared to existing benchmarks while guaranteeing the same inference performance. As an example, when targeting a test accuracy of 80% on the Cifar-10 dataset, Ota-LC achieves a reduction in total communication costs of at least 30% when contrasted with benchmark schemes, while also reducing the computational complexity order by a factor equal to the sum of the dimension of the gradients.
△ Less
Submitted 15 January, 2024;
originally announced January 2024.
-
Exploring Few-Shot Adaptation for Activity Recognition on Diverse Domains
Authors:
Kunyu Peng,
Di Wen,
David Schneider,
Jiaming Zhang,
Kailun Yang,
M. Saquib Sarfraz,
Rainer Stiefelhagen,
Alina Roitberg
Abstract:
Domain adaptation is essential for activity recognition to ensure accurate and robust performance across diverse environments, sensor types, and data sources. Unsupervised domain adaptation methods have been extensively studied, yet, they require large-scale unlabeled data from the target domain. In this work, we focus on Few-Shot Domain Adaptation for Activity Recognition (FSDA-AR), which leverag…
▽ More
Domain adaptation is essential for activity recognition to ensure accurate and robust performance across diverse environments, sensor types, and data sources. Unsupervised domain adaptation methods have been extensively studied, yet, they require large-scale unlabeled data from the target domain. In this work, we focus on Few-Shot Domain Adaptation for Activity Recognition (FSDA-AR), which leverages a very small amount of labeled target videos to achieve effective adaptation. This approach is appealing for applications because it only needs a few or even one labeled example per class in the target domain, ideal for recognizing rare but critical activities. However, the existing FSDA-AR works mostly focus on the domain adaptation on sports videos, where the domain diversity is limited. We propose a new FSDA-AR benchmark using five established datasets considering the adaptation on more diverse and challenging domains. Our results demonstrate that FSDA-AR performs comparably to unsupervised domain adaptation with significantly fewer labeled target domain samples. We further propose a novel approach, RelaMiX, to better leverage the few labeled target domain samples as knowledge guidance. RelaMiX encompasses a temporal relational attention network with relation dropout, alongside a cross-domain information alignment mechanism. Furthermore, it integrates a mechanism for mixing features within a latent space by using the few-shot target domain samples. The proposed RelaMiX solution achieves state-of-the-art performance on all datasets within the FSDA-AR benchmark. To encourage future research of few-shot domain adaptation for activity recognition, our code will be publicly available at https://github.com/KPeng9510/RelaMiX.
△ Less
Submitted 27 April, 2024; v1 submitted 15 May, 2023;
originally announced May 2023.
-
Task-Oriented Over-the-Air Computation for Multi-Device Edge AI
Authors:
Dingzhu Wen,
Xiang Jiao,
Peixi Liu,
Guangxu Zhu,
Yuanming Shi,
Kaibin Huang
Abstract:
Departing from the classic paradigm of data-centric designs, the 6G networks for supporting edge AI features task-oriented techniques that focus on effective and efficient execution of AI task. Targeting end-to-end system performance, such techniques are sophisticated as they aim to seamlessly integrate sensing (data acquisition), communication (data transmission), and computation (data processing…
▽ More
Departing from the classic paradigm of data-centric designs, the 6G networks for supporting edge AI features task-oriented techniques that focus on effective and efficient execution of AI task. Targeting end-to-end system performance, such techniques are sophisticated as they aim to seamlessly integrate sensing (data acquisition), communication (data transmission), and computation (data processing). Aligned with the paradigm shift, a task-oriented over-the-air computation (AirComp) scheme is proposed in this paper for multi-device split-inference system. In the considered system, local feature vectors, which are extracted from the real-time noisy sensory data on devices, are aggregated over-the-air by exploiting the waveform superposition in a multiuser channel. Then the aggregated features as received at a server are fed into an inference model with the result used for decision making or control of actuators. To design inference-oriented AirComp, the transmit precoders at edge devices and receive beamforming at edge server are jointly optimized to rein in the aggregation error and maximize the inference accuracy. The problem is made tractable by measuring the inference accuracy using a surrogate metric called discriminant gain, which measures the discernibility of two object classes in the application of object/event classification. It is discovered that the conventional AirComp beamforming design for minimizing the mean square error in generic AirComp with respect to the noiseless case may not lead to the optimal classification accuracy. The reason is due to the overlooking of the fact that feature dimensions have different sensitivity towards aggregation errors and are thus of different importance levels for classification. This issue is addressed in this work via a new task-oriented AirComp scheme designed by directly maximizing the derived discriminant gain.
△ Less
Submitted 2 November, 2022;
originally announced November 2022.
-
What is Semantic Communication? A View on Conveying Meaning in the Era of Machine Intelligence
Authors:
Qiao Lan,
Dingzhu Wen,
Zezhong Zhang,
Qunsong Zeng,
Xu Chen,
Petar Popovski,
Kaibin Huang
Abstract:
In 1940s, Claude Shannon developed the information theory focusing on quantifying the maximum data rate that can be supported by a communication channel. Guided by this, the main theme of wireless system design up until 5G was the data rate maximization. In his theory, the semantic aspect and meaning of messages were treated as largely irrelevant to communication. The classic theory started to rev…
▽ More
In 1940s, Claude Shannon developed the information theory focusing on quantifying the maximum data rate that can be supported by a communication channel. Guided by this, the main theme of wireless system design up until 5G was the data rate maximization. In his theory, the semantic aspect and meaning of messages were treated as largely irrelevant to communication. The classic theory started to reveal its limitations in the modern era of machine intelligence, consisting of the synergy between IoT and AI. By broadening the scope of the classic framework, in this article we present a view of semantic communication (SemCom) and conveying meaning through the communication systems. We address three communication modalities, human-to-human (H2H), human-to-machine (H2M), and machine-to-machine (M2M) communications. The latter two, the main theme of the article, represent the paradigm shift in communication and computing. H2M SemCom refers to semantic techniques for conveying meanings understandable by both humans and machines so that they can interact. M2M SemCom refers to effectiveness techniques for efficiently connecting machines such that they can effectively execute a specific computation task in a wireless network. The first part of the article introduces SemCom principles including encoding, system architecture, and layer-coupling and end-to-end design approaches. The second part focuses on specific techniques for application areas of H2M (human and AI symbiosis, recommendation, etc.) and M2M SemCom (distributed learning, split inference, etc.) Finally, we discuss the knowledge graphs approach for designing SemCom systems. We believe that this comprehensive introduction will provide a useful guide into the emerging area of SemCom that is expected to play an important role in 6G featuring connected intelligence and integrated sensing, computing, communication, and control.
△ Less
Submitted 30 September, 2021;
originally announced October 2021.
-
Hardware Architecture of Embedded Inference Accelerator and Analysis of Algorithms for Depthwise and Large-Kernel Convolutions
Authors:
Tse-Wei Chen,
Wei Tao,
Deyu Wang,
Dongchao Wen,
Kinya Osa,
Masami Kato
Abstract:
In order to handle modern convolutional neural networks (CNNs) efficiently, a hardware architecture of CNN inference accelerator is proposed to handle depthwise convolutions and regular convolutions, which are both essential building blocks for embedded-computer-vision algorithms. Different from related works, the proposed architecture can support filter kernels with different sizes with high flex…
▽ More
In order to handle modern convolutional neural networks (CNNs) efficiently, a hardware architecture of CNN inference accelerator is proposed to handle depthwise convolutions and regular convolutions, which are both essential building blocks for embedded-computer-vision algorithms. Different from related works, the proposed architecture can support filter kernels with different sizes with high flexibility since it does not require extra costs for intra-kernel parallelism, and it can generate convolution results faster than the architecture of the related works. The experimental results show the importance of supporting depthwise convolutions and dilated convolutions with the proposed hardware architecture. In addition to depthwise convolutions with large-kernels, a new structure called DDC layer, which includes the combination of depthwise convolutions and dilated convolutions, is also analyzed in this paper. For face detection, the computational costs decrease by 30%, and the model size decreases by 20% when the DDC layers are applied to the network. For image classification, the accuracy is increased by 1% by simply replacing $3 \times 3$ filters with $5 \times 5$ filters in depthwise convolutions.
△ Less
Submitted 29 April, 2021;
originally announced April 2021.
-
Reduced-Dimension Design of MIMO Over-the-Air Computing for Data Aggregation in Clustered IoT Networks
Authors:
Dingzhu Wen,
Guangxu Zhu,
Kaibin Huang
Abstract:
One basic operation of Internet-of-Things (IoT) networks is aggregating distributed sensing-data over wireless-channels for function-computation, called wireless-data-aggregation (WDA). Targeting dense sensors, a recently developed technology called over-the-air computing (AirComp) can dramatically reduce the WDA latency by aggregating distributed data "over-the-air" using the waveform-superpositi…
▽ More
One basic operation of Internet-of-Things (IoT) networks is aggregating distributed sensing-data over wireless-channels for function-computation, called wireless-data-aggregation (WDA). Targeting dense sensors, a recently developed technology called over-the-air computing (AirComp) can dramatically reduce the WDA latency by aggregating distributed data "over-the-air" using the waveform-superposition property of a multi-access channel. In this work, we design multiple-input-multiple-output (MIMO) AirComp for computing a vector-valued function in a clustered IoT network with multi-antenna sensors forming clusters and a multi-antenna access-point (AP) performing WDA. The resultant high-dimensional but low-rank MIMO channels makes it important to reduce channel or signal dimensionality in AirComp to avoid exposure to noise from channel null-spaces. Motivated by this, we develop a framework of reduced-dimension MIMO AirComp, featuring decomposed-aggregation-beamforming (DAB). Consider the case of separable channel-clusters with non-overlap** angle-of-arrival ranges. The optimal DAB has the structure where inner-components extract the dominant eigen-spaces of corresponding channel-clusters and outer-components jointly equalize the resultant low-dimensional channels. Consider the more complex case of inseparable clusters. We propose a suboptimal DAB design where the inner-component performs both dimension-reduction and joint-equalization over clustered-channel covariance matrices and the outer-component jointly equalizes the small-scale fading-channels. Furthermore, efficient algorithms for rank-optimization of individual DAB components and channel-feedback leveraging the AirComp principle are developed.
△ Less
Submitted 6 December, 2018;
originally announced December 2018.