Search | arXiv e-print repository

Task-oriented Over-the-air Computation for Edge-device Co-inference with Balanced Classification Accuracy

Authors: Xiang Jiao, Dingzhu Wen, Guangxu Zhu, Wei Jiang, Wu Luo, Yuanming Shi

Abstract: Edge-device co-inference, which concerns the cooperation between edge devices and an edge server for completing inference tasks over wireless networks, has been a promising technique for enabling various kinds of intelligent services at the network edge, e.g., auto-driving. In this paradigm, the concerned design objective of the network shifts from the traditional communication throughput to the effective and efficient execution of the inference task underpinned by the network, measured by, e.g., the inference accuracy and latency. In this paper, a task-oriented over-the-air computation scheme is proposed for a multidevice artificial intelligence system. Particularly, a novel tractable inference accuracy metric is proposed for classification tasks, which is called minimum pair-wise discriminant gain. Unlike prior work measuring the average of all class pairs in feature space, it measures the minimum distance of all class pairs. By maximizing the minimum pair-wise discriminant gain instead of its average counterpart, any pair of classes can be better separated in the feature space, thus leading to a balanced and improved inference accuracy for all classes. Besides, this paper jointly optimizes the minimum discriminant gain of all feature elements instead of separately maximizing that of each element as in the existing designs. As a result, the transmit power can be adaptively allocated to the feature elements according to their different contributions to the inference accuracy, opening an extra degree of freedom to improve inference performance. Extensive experiments are conducted using a concrete use case of human motion recognition to verify the superiority of the proposed design over the benchmarking scheme.
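The contrast between the minimum and the average pair-wise metric can be illustrated with a small sketch. The abstract does not give the formula, so the gain used here (squared centroid distance per feature element, scaled by a shared per-element variance) is an assumption, and the names `pairwise_gain` and `discriminant_gains` are hypothetical.

```python
from itertools import combinations

def pairwise_gain(mu_a, mu_b, var):
    # Assumed form: variance-scaled squared distance, summed over feature elements.
    return sum((a - b) ** 2 / v for a, b, v in zip(mu_a, mu_b, var))

def discriminant_gains(centroids, var):
    # Evaluate every unordered class pair, then report the minimum and the average.
    gains = [pairwise_gain(centroids[i], centroids[j], var)
             for i, j in combinations(range(len(centroids)), 2)]
    return min(gains), sum(gains) / len(gains)

# Three illustrative class centroids; classes 1 and 2 nearly overlap.
centroids = [[0.0, 0.0], [4.0, 0.0], [4.5, 0.0]]
var = [1.0, 1.0]
min_gain, avg_gain = discriminant_gains(centroids, var)
print(min_gain, avg_gain)
```

In this toy case the average is dominated by the two well-separated pairs, while the minimum exposes the one pair that is hard to distinguish, which is the imbalance the minimum-based metric is meant to target.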

Submitted 1 July, 2024; originally announced July 2024.

Comments: This paper was accepted by IEEE Transactions on Vehicular Technology on June 30, 2024

arXiv:2405.15831 [pdf, other]

Transmission Interface Power Flow Adjustment: A Deep Reinforcement Learning Approach based on Multi-task Attribution Map

Authors: Shunyu Liu, Wei Luo, Yanzhen Zhou, Kaixuan Chen, Quan Zhang, Huating Xu, Qinglai Guo, Mingli Song

Abstract: Transmission interface power flow adjustment is a critical measure to ensure the secure and economic operation of power systems. However, conventional model-based adjustment schemes are limited by the increasing variations and uncertainties occurring in power systems, where the adjustment problems of different transmission interfaces are often treated as several independent tasks, ignoring their coupling relationship and even leading to conflicting decisions. In this paper, we introduce a novel data-driven deep reinforcement learning (DRL) approach to handle multiple power flow adjustment tasks jointly instead of learning each task from scratch. At the heart of the proposed method is a multi-task attribution map (MAM), which enables the DRL agent to explicitly attribute each transmission interface task to different power system nodes with task-adaptive attention weights. Based on this MAM, the agent can further provide effective strategies to solve the multi-task adjustment problem with a near-optimal operation cost. Simulation results on the IEEE 118-bus system, a realistic 300-bus system in China, and a very large European system with 9241 buses demonstrate that the proposed method significantly improves the performance compared with several baseline methods, and exhibits high interpretability with the learnable MAM.

Submitted 24 May, 2024; originally announced May 2024.

Comments: Accepted by IEEE Transactions on Power Systems

arXiv:2404.15342 [pdf, other]

WaveSleepNet: An Interpretable Network for Expert-like Sleep Staging

Authors: Yan Pei, Wei Luo

Abstract: Although deep learning algorithms have proven their efficiency in automatic sleep staging, the widespread skepticism about their "black-box" nature has limited their clinical acceptance. In this study, we propose WaveSleepNet, an interpretable neural network for sleep staging that reasons in a similar way to sleep experts. In this network, we utilize the latent space representations generated during training to identify characteristic wave prototypes corresponding to different sleep stages. The feature representation of an input signal is segmented into patches within the latent space, each of which is compared against the learned wave prototypes. The proximity between these patches and the wave prototypes is quantified through scores, indicating the prototypes' presence and relative proportion within the signal. The scores serve as the decision-making criteria for final sleep staging. During training, an ensemble of loss functions is employed for the prototypes' diversity and robustness. Furthermore, the learned wave prototypes are visualized by analysing occlusion sensitivity. The efficacy of WaveSleepNet is validated across three public datasets, achieving sleep staging performance that is on par with the state-of-the-art models when several WaveSleepNets are combined into a larger network. A detailed case study examined the decision-making process of WaveSleepNet, which aligns closely with American Academy of Sleep Medicine (AASM) manual guidelines. Another case study systematically explained the reasons for misidentification in each sleep stage. WaveSleepNet's transparent process provides specialists with direct access to the physiological significance of its criteria, allowing for future adaptation or enrichment by sleep experts.

Submitted 10 April, 2024; originally announced April 2024.

Comments: 17 pages, 6 figures

arXiv:2403.05906 [pdf, other]

Segmentation Guided Sparse Transformer for Under-Display Camera Image Restoration

Authors: Jingyun Xue, Tao Wang, Jun Wang, Kaihao Zhang, Wenhan Luo, Wenqi Ren, Zikun Liu, Hyunhee Park, Xiaochun Cao

Abstract: Under-Display Camera (UDC) is an emerging technology that achieves full-screen display via hiding the camera under the display panel. However, the current implementation of UDC causes serious degradation. The incident light required for camera imaging undergoes attenuation and diffraction when passing through the display panel, leading to various artifacts in UDC imaging. Presently, the prevailing UDC image restoration methods predominantly utilize convolutional neural network architectures, whereas Transformer-based methods have exhibited superior performance in the majority of image restoration tasks. This is attributed to the Transformer's capability to sample global features for the local reconstruction of images, thereby achieving high-quality image restoration. In this paper, we observe that when using the Vision Transformer for UDC degraded image restoration, the global attention samples a large amount of redundant information and noise. Furthermore, compared to the ordinary Transformer employing dense attention, the Transformer utilizing sparse attention can alleviate the adverse impact of redundant information and noise. Building upon this discovery, we propose a Segmentation Guided Sparse Transformer method (SGSFormer) for the task of restoring high-quality images from UDC degraded images. Specifically, we utilize sparse self-attention to filter out redundant information and noise, directing the model's attention to focus on the features more relevant to the degraded regions in need of reconstruction. Moreover, we integrate the instance segmentation map as prior information to guide the sparse self-attention in filtering and focusing on the correct regions.

Submitted 9 March, 2024; originally announced March 2024.

Comments: 13 pages, 10 figures, conference or other essential info

arXiv:2402.13629 [pdf, other]

Adversarial Purification and Fine-tuning for Robust UDC Image Restoration

Authors: Zhenbo Song, Zhenyuan Zhang, Kaihao Zhang, Wenhan Luo, Zhaoxin Fan, Jianfeng Lu

Abstract: This study delves into the enhancement of Under-Display Camera (UDC) image restoration models, focusing on their robustness against adversarial attacks. Despite its innovative approach to seamless display integration, UDC technology faces unique image degradation challenges exacerbated by the susceptibility to adversarial perturbations. Our research initially conducts an in-depth robustness evaluation of deep-learning-based UDC image restoration models by employing several white-box and black-box attacking methods. This evaluation is pivotal in understanding the vulnerabilities of current UDC image restoration techniques. Following the assessment, we introduce a defense framework integrating adversarial purification with subsequent fine-tuning processes. First, our approach employs diffusion-based adversarial purification, effectively neutralizing adversarial perturbations. Then, we apply the fine-tuning methodologies to refine the image restoration models further, ensuring that the quality and fidelity of the restored images are maintained. The effectiveness of our proposed approach is validated through extensive experiments, showing marked improvements in resilience against typical adversarial attacks.

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2401.02771 [pdf, other]

Powerformer: A Section-adaptive Transformer for Power Flow Adjustment

Authors: Kaixuan Chen, Wei Luo, Shunyu Liu, Yaoquan Wei, Yihe Zhou, Yunpeng Qing, Quan Zhang, Jie Song, Mingli Song

Abstract: In this paper, we present a novel transformer architecture tailored for learning robust power system state representations, which strives to optimize power dispatch for the power flow adjustment across different transmission sections. Specifically, our proposed approach, named Powerformer, develops a dedicated section-adaptive attention mechanism, separating itself from the self-attention used in conventional transformers. This mechanism effectively integrates power system states with transmission section information, which facilitates the development of robust state representations. Furthermore, by considering the graph topology of the power system and the electrical attributes of bus nodes, we introduce two customized strategies to further enhance the expressiveness: graph neural network propagation and a multi-factor attention mechanism. Extensive evaluations are conducted on three power system scenarios, including the IEEE 118-bus system, a realistic 300-bus system in China, and a large-scale European system with 9241 buses, where Powerformer demonstrates its superior performance over several baseline methods.

Submitted 30 January, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

Comments: 8 figures

arXiv:2312.09417 [pdf, other]

doi 10.1109/JBHI.2024.3358917

DTP-Net: Learning to Reconstruct EEG signals in Time-Frequency Domain by Multi-scale Feature Reuse

Authors: Yan Pei, Jiahui Xu, Qianhao Chen, Chenhao Wang, Feng Yu, Lisan Zhang, Wei Luo

Abstract: Electroencephalography (EEG) signals are easily corrupted by various artifacts, making artifact removal crucial for improving signal quality in scenarios such as disease diagnosis and brain-computer interface (BCI). In this paper, we present a fully convolutional neural architecture, called DTP-Net, which consists of a Densely Connected Temporal Pyramid (DTP) sandwiched between a pair of learnable time-frequency transformations for end-to-end electroencephalogram (EEG) denoising. The proposed method first transforms a single-channel EEG signal of arbitrary length into the time-frequency domain via an Encoder layer. Then, noises, such as ocular and muscle artifacts, are extracted by DTP in a multi-scale fashion and reduced. Finally, a Decoder layer is employed to reconstruct the artifact-reduced EEG signal. Additionally, we conduct an in-depth analysis of the representation learning behavior of each module in DTP-Net to substantiate its robustness and reliability. Extensive experiments conducted on two public semi-simulated datasets demonstrate the effective artifact removal performance of DTP-Net, which outperforms state-of-the-art approaches. Experimental results demonstrate cleaner waveforms and significant improvement in Signal-to-Noise Ratio (SNR) and Relative Root Mean Square Error (RRMSE) after denoising by the proposed model. Moreover, the proposed DTP-Net is applied in a specific BCI downstream task, improving the classification accuracy by up to 5.55% compared to that of the raw signals, validating its potential applications in the fields of EEG-based neuroscience and neuro-engineering.
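The two reported metrics can be sketched as follows. The abstract does not state the exact definitions, so this assumes the commonly used time-domain forms (RRMSE as the RMS error divided by the RMS of the clean signal, SNR in decibels as the clean-signal energy over the residual energy); the function names and the toy signals are hypothetical.

```python
import math

def rrmse(clean, denoised):
    # RMS of the residual divided by RMS of the clean reference (lower is better).
    err = math.sqrt(sum((d - c) ** 2 for c, d in zip(clean, denoised)) / len(clean))
    ref = math.sqrt(sum(c ** 2 for c in clean) / len(clean))
    return err / ref

def snr_db(clean, denoised):
    # Clean-signal energy over residual energy, in decibels (higher is better).
    signal = sum(c ** 2 for c in clean)
    noise = sum((d - c) ** 2 for c, d in zip(clean, denoised))
    return 10.0 * math.log10(signal / noise)

# Toy example: a unit-amplitude signal with a residual error of 0.1 per sample.
clean = [1.0, -1.0, 1.0, -1.0]
denoised = [1.1, -0.9, 0.9, -1.1]
print(rrmse(clean, denoised), snr_db(clean, denoised))
```

Both metrics need a clean reference, which is why they are typically computed on semi-simulated data where the ground-truth EEG is known.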

Submitted 6 March, 2024; v1 submitted 27 November, 2023; originally announced December 2023.

Comments: 18 pages, 10 figures

Journal ref: IEEE Journal of Biomedical and Health Informatics. 2024: 1-12

arXiv:2308.10196 [pdf, other]

Blind Face Restoration for Under-Display Camera via Dictionary Guided Transformer

Authors: Jingfan Tan, Xiaoxu Chen, Tao Wang, Kaihao Zhang, Wenhan Luo, Xiaochun Cao

Abstract: By hiding the front-facing camera below the display panel, Under-Display Camera (UDC) provides users with a full-screen experience. However, due to the characteristics of the display, images taken by UDC suffer from significant quality degradation. Methods have been proposed to tackle UDC image restoration and advances have been achieved. There are still no specialized methods and datasets for restoring UDC face images, which may be the most common problem in the UDC scene. To this end, considering color filtering, brightness attenuation, and diffraction in the imaging process of UDC, we propose a two-stage network UDC Degradation Model Network named UDC-DMNet to synthesize UDC images by modeling the processes of UDC imaging. Then we use UDC-DMNet and high-quality face images from FFHQ and CelebA-Test to create UDC face training datasets FFHQ-P/T and testing datasets CelebA-Test-P/T for UDC face restoration. We propose a novel dictionary-guided transformer network named DGFormer. Introducing the facial component dictionary and the characteristics of the UDC image in the restoration makes DGFormer capable of addressing blind face restoration in UDC scenarios. Experiments show that our DGFormer and UDC-DMNet achieve state-of-the-art performance.

Submitted 1 December, 2023; v1 submitted 20 August, 2023; originally announced August 2023.

Comments: To appear in IEEE TCSVT

arXiv:2307.14659 [pdf, other]

LLDiffusion: Learning Degradation Representations in Diffusion Models for Low-Light Image Enhancement

Authors: Tao Wang, Kaihao Zhang, Ziqian Shao, Wenhan Luo, Bjorn Stenger, Tae-Kyun Kim, Wei Liu, Hongdong Li

Abstract: Current deep learning methods for low-light image enhancement (LLIE) typically rely on pixel-wise mapping learned from paired data. However, these methods often overlook the importance of considering degradation representations, which can lead to sub-optimal outcomes. In this paper, we address this limitation by proposing a degradation-aware learning scheme for LLIE using diffusion models, which effectively integrates degradation and image priors into the diffusion process, resulting in improved image enhancement. Our proposed degradation-aware learning scheme is based on the understanding that degradation representations play a crucial role in accurately modeling and capturing the specific degradation patterns present in low-light images. To this end, first, a joint learning framework for both image generation and image enhancement is presented to learn the degradation representations. Second, to leverage the learned degradation representations, we develop a Low-Light Diffusion model (LLDiffusion) with a well-designed dynamic diffusion module. This module takes into account both the color map and the latent degradation representations to guide the diffusion process. By incorporating these conditioning factors, the proposed LLDiffusion can effectively enhance low-light images, considering both the inherent degradation patterns and the desired color fidelity. Finally, we evaluate our proposed method on several well-known benchmark datasets, including synthetic and real-world unpaired datasets. Extensive experiments on public benchmarks demonstrate that our LLDiffusion outperforms state-of-the-art LLIE methods both quantitatively and qualitatively. The source code and pre-trained models are available at https://github.com/TaoWangzj/LLDiffusion.

Submitted 27 July, 2023; originally announced July 2023.

Comments: 16 pages, 9 figures

arXiv:2305.18775 [pdf]

A Depth-Adaptive Filtering Method for Effective GPR Tree Roots Detection in Tropical Area

Authors: Wenhao Luo, Yee Hui Lee, Mohamed Lokman Mohd Yusof, Abdulkadir C. Yucel

Abstract: This study presents a technique for processing step-frequency continuous wave (SFCW) ground penetrating radar (GPR) data to detect tree roots. SFCW GPR is portable and enables precise control of energy levels, balancing depth and resolution trade-offs. However, the high-frequency components of the transmission band suffer from poor penetrating capability and generate noise that interferes with root detection. The proposed time-frequency filtering technique uses a short-time Fourier transform (STFT) to track changes in frequency spectrum density over time. To obtain the filter window, a weighted linear regression (WLR) method is used. By adopting a conversion method that is a variant of the chirp Z-Transform (CZT), the time-frequency window filters out frequency samples that are not of interest when doing the frequency-to-time domain data conversion. The proposed depth-adaptive filter window can self-adjust to different scenarios, making it independent of soil information and effective at determining subsurface tree roots. The technique is successfully validated using SFCW GPR data from actual sites in a tropical area with different soil moisture levels, and the two-dimensional (2D) radar map of subsurface root systems is highly improved compared to existing methods.
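The weighted linear regression step can be sketched in closed form. The abstract does not say which quantities are regressed, so this assumes a line is fitted to a per-time-bin frequency estimate with weights reflecting confidence in each bin; `weighted_linear_fit` and the sample values are hypothetical.

```python
def weighted_linear_fit(xs, ys, ws):
    # Closed-form weighted least squares for y = slope * x + intercept.
    w_sum = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(ws, ys)) / w_sum
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Assumed data: usable frequency (GHz) falling with time (ns); the last bin is
# an outlier and is given zero weight, so it does not affect the fitted line.
times = [0.0, 1.0, 2.0, 3.0]
freqs = [3.0, 2.5, 2.0, 9.0]
weights = [1.0, 1.0, 1.0, 0.0]
slope, intercept = weighted_linear_fit(times, freqs, weights)
print(slope, intercept)
```

Down-weighting unreliable bins is what lets a fitted boundary of this kind follow the dominant trend rather than isolated noisy estimates.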

Submitted 30 May, 2023; originally announced May 2023.

Comments: 10 pages, 12 figures, Accepted by IEEE TIM

arXiv:2303.04603 [pdf, other]

Learning Enhancement From Degradation: A Diffusion Model For Fundus Image Enhancement

Authors: Pujin Cheng, Li Lin, Yijin Huang, Huaqing He, Wenhan Luo, Xiaoying Tang

Abstract: The quality of a fundus image can be compromised by numerous factors, many of which are challenging to model appropriately and mathematically. In this paper, we introduce a novel diffusion model based framework, named Learning Enhancement from Degradation (LED), for enhancing fundus images. Specifically, we first adopt a data-driven degradation framework to learn degradation mappings from unpaired high-quality to low-quality images. We then apply a conditional diffusion model to learn the inverse enhancement process in a paired manner. The proposed LED is able to output enhancement results that maintain clinically important features with better clarity. Moreover, in the inference phase, LED can be easily and effectively integrated with any existing fundus image enhancement framework. We evaluate the proposed LED on several downstream tasks with respect to various clinically-relevant metrics, successfully demonstrating its superiority over existing state-of-the-art methods both quantitatively and qualitatively. The source code is available at https://github.com/QtacierP/LED.

Submitted 8 March, 2023; originally announced March 2023.

arXiv:2302.10272 [pdf, other]

Is Autoencoder Truly Applicable for 3D CT Super-Resolution?

Authors: Weixun Luo, Xiaodan Xing, Guang Yang

Abstract: Featured by a bottleneck structure, autoencoder (AE) and its variants have been largely applied in various medical image analysis tasks, such as segmentation, reconstruction and de-noising. Despite their promising performance in the aforementioned tasks, in this paper, we claim that AE models are not applicable to single image super-resolution (SISR) for 3D CT data. Our hypothesis is that the bottleneck architecture that resizes feature maps in AE models degrades the details of input images, and can thus sabotage the performance of super-resolution. Although U-Net proposed skip connections that merge information from different levels, we claim that the degrading impact of feature resizing operations could hardly be removed by skip connections. By conducting large-scale ablation experiments and comparing the performance between models with and without the bottleneck design on a public CT lung dataset, we have discovered that AE models, including U-Net, have failed to achieve a comparable SISR result ($p<0.05$ by Student's t-test) compared to the baseline model. Our work is the first comparative study investigating the suitability of AE architecture for 3D CT SISR tasks and brings a rationale for researchers to re-think the choice of model architectures especially for 3D CT SISR tasks. The full implementation and trained models can be found at: https://github.com/Roldbach/Autoencoder-3D-CT-SISR

Submitted 31 March, 2023; v1 submitted 23 January, 2023; originally announced February 2023.

Comments: ISBI 2023

arXiv:2301.08660 [pdf]

A Big-Data Driven Framework to Estimating Vehicle Volume based on Mobile Device Location Data

Authors: Mofeng Yang, Weiyu Luo, Mohammad Ashoori, Jina Mahmoudi, Chenfeng Xiong, Jiawei Lu, Guangchen Zhao, Saeed Saleh Namadi, Songhua Hu, Aliakbar Kabiri

Abstract: Vehicle volume serves as a critical metric and the fundamental basis for traffic signal control, transportation project prioritization, road maintenance plans and more. Traditional methods of quantifying vehicle volume rely on manual counting, video cameras, and loop detectors at a limited number of locations. These efforts require significant labor and cost for expansions. Researchers and private sector companies have also explored alternative solutions such as probe vehicle data, while still suffering from a low penetration rate. In recent years, along with the technological advancement in mobile sensors and mobile networks, Mobile Device Location Data (MDLD) have been growing dramatically in terms of the spatiotemporal coverage of the population and its mobility. This paper presents a big-data driven framework that can ingest terabytes of MDLD and estimate vehicle volume at a larger geographical area with a larger sample size. The proposed framework first employs a series of cloud-based computational algorithms to extract multimodal trajectories and trip rosters. A scalable map matching and routing algorithm is then applied to snap and route vehicle trajectories to the roadway network. The observed vehicle counts on each roadway segment are weighted and calibrated against ground truth control totals, i.e., Annual Vehicle-Miles of Travel (AVMT), and Annual Average Daily Traffic (AADT). The proposed framework is implemented on the all-street network in the state of Maryland using MDLD for the entire year of 2019. Results indicate that our proposed framework produces reliable vehicle volume estimates and also demonstrate its transferability and generalization ability.

Submitted 24 January, 2023; v1 submitted 20 January, 2023; originally announced January 2023.

arXiv:2210.09556 [pdf, other]

Discrete Cross-Modal Alignment Enables Zero-Shot Speech Translation

Authors: Chen Wang, Yuchen Liu, Boxing Chen, Jiajun Zhang, Wei Luo, Zhongqiang Huang, Chengqing Zong

Abstract: End-to-end Speech Translation (ST) aims at translating the source language speech into target language text without generating the intermediate transcriptions. However, the training of end-to-end methods relies on parallel ST data, which are difficult and expensive to obtain. Fortunately, the supervised data for automatic speech recognition (ASR) and machine translation (MT) are usually more accessible, making zero-shot speech translation a potential direction. Existing zero-shot methods fail to align the two modalities of speech and text into a shared semantic space, resulting in much worse performance compared to the supervised ST methods. In order to enable zero-shot ST, we propose a novel Discrete Cross-Modal Alignment (DCMA) method that employs a shared discrete vocabulary space to accommodate and match both modalities of speech and text. Specifically, we introduce a vector quantization module to discretize the continuous representations of speech and text into a finite set of virtual tokens, and use ASR data to map corresponding speech and text to the same virtual token in a shared codebook. This way, source language speech can be embedded in the same semantic space as the source language text, which can be then transformed into target language text with an MT module. Experiments on multiple language pairs demonstrate that our zero-shot ST method significantly improves the SOTA, and even performs on par with the strong supervised ST baselines.
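The discretization step can be illustrated with a nearest-neighbour codebook lookup, which is the standard core of vector quantization. This is only a sketch: the codebook values, the two-dimensional vectors, and the function name `quantize` are assumptions, and the learning of the codebook and encoders described in the abstract is not shown.

```python
def quantize(vector, codebook):
    # Return the index of the codebook entry closest to the vector (squared L2).
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(vector, codebook[k]))

# A shared codebook of three "virtual tokens" (illustrative values).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]

# Hypothetical continuous representations of the same content in each modality.
speech_repr = [0.9, 1.2]
text_repr = [1.1, 0.8]

speech_token = quantize(speech_repr, codebook)
text_token = quantize(text_repr, codebook)
print(speech_token, text_token)
```

When corresponding speech and text land on the same index, a downstream MT module can consume either modality through the same token embedding, which is the alignment effect the abstract describes.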

Submitted 17 October, 2022; originally announced October 2022.

Comments: Accepted by the main conference of EMNLP 2022

arXiv:2210.08493 [pdf, other]

Indoor Smartphone SLAM with Learned Echoic Location Features

Authors: Wenjie Luo, Qun Song, Zhenyu Yan, Rui Tan, Guosheng Lin

Abstract: Indoor self-localization is a highly demanded system function for smartphones. The current solutions based on inertial, radio frequency, and geomagnetic sensing may have degraded performance when their limiting factors take effect. In this paper, we present a new indoor simultaneous localization and mapping (SLAM) system that utilizes the smartphone's built-in audio hardware and inertial measurement unit (IMU). Our system uses a smartphone's loudspeaker to emit near-inaudible chirps and then the microphone to record the acoustic echoes from the indoor environment. Our profiling measurements show that the echoes carry location information with sub-meter granularity. To enable SLAM, we apply contrastive learning to construct an echoic location feature (ELF) extractor, such that the loop closures on the smartphone's trajectory can be accurately detected from the associated ELF trace. The detection results effectively regulate the IMU-based trajectory reconstruction. Extensive experiments show that our ELF-based SLAM achieves median localization errors of $0.1\,\text{m}$, $0.53\,\text{m}$, and $0.4\,\text{m}$ on the reconstructed trajectories in a living room, an office, and a shopping mall, and outperforms the Wi-Fi and geomagnetic SLAM systems.

Submitted 16 October, 2022; originally announced October 2022.

arXiv:2207.14419 [pdf, other]

Sample-efficient Safe Learning for Online Nonlinear Control with Control Barrier Functions

Authors: Wenhao Luo, Wen Sun, Ashish Kapoor

Abstract: Reinforcement Learning (RL) and continuous nonlinear control have been successfully deployed in multiple domains of complicated sequential decision-making tasks. However, given the exploration nature of the learning process and the presence of model uncertainty, it is challenging to apply them to safety-critical control tasks due to the lack of safety guarantee. On the other hand, while combining control-theoretical approaches with learning algorithms has shown promise in safe RL applications, the sample efficiency of safe data collection process for control is not well addressed. In this paper, we propose a \emph{provably} sample efficient episodic safe learning framework for online control tasks that leverages safe exploration and exploitation in an unknown, nonlinear dynamical system. In particular, the framework 1) extends control barrier functions (CBFs) in a stochastic setting to achieve provable high-probability safety under uncertainty during model learning and 2) integrates an optimism-based exploration strategy to efficiently guide the safe exploration process with learned dynamics for \emph{near optimal} control performance. We provide formal analysis on the episodic regret bound against the optimal controller and probabilistic safety with theoretical guarantees. Simulation results are provided to demonstrate the effectiveness and efficiency of the proposed algorithm.

Submitted 28 July, 2022; originally announced July 2022.

Comments: The 15th International Workshop on the Algorithmic Foundations of Robotics (WAFR 2022)

arXiv:2206.05949 [pdf, other]

doi 10.1109/JSTSP.2022.3226836

Toward Ambient Intelligence: Federated Edge Learning with Task-Oriented Sensing, Computation, and Communication Integration

Authors: Peixi Liu, Guangxu Zhu, Shuai Wang, Wei Jiang, Wu Luo, H. Vincent Poor, Shuguang Cui

Abstract: In this paper, we address the problem of joint sensing, computation, and communication (SC$^{2}$) resource allocation for federated edge learning (FEEL) via a concrete case study of human motion recognition based on wireless sensing in ambient intelligence. First, by analyzing the wireless sensing process in human motion recognition, we find that there exists a threshold value for the sensing transmit power, exceeding which yields sensing data samples with approximately the same satisfactory quality. Then, the joint SC$^{2}$ resource allocation problem is cast to maximize the convergence speed of FEEL, under the constraints on training time, energy supply, and sensing quality of each edge device. Solving this problem entails solving two subproblems in order: the first one reduces to determining the joint sensing and communication resource allocation that maximizes the total number of samples that can be sensed during the entire training process; the second one concerns the partition of the attained total number of sensed samples over all the communication rounds to determine the batch size at each round for convergence speed maximization. The first subproblem on joint sensing and communication resource allocation is converted to a single-variable optimization problem by exploiting the derived relation between different control variables (resources), which thus allows an efficient solution via one-dimensional grid search. For the second subproblem, it is found that the number of samples to be sensed (or batch size) at each round is a decreasing function of the loss function value attained at the round. Based on this relationship, the approximate optimal batch size at each communication round is derived in closed-form as a function of the round index. Finally, extensive simulation results are provided to validate the superiority of the proposed joint SC$^{2}$ resource allocation scheme.

Submitted 13 June, 2022; originally announced June 2022.

Comments: 13 pages, submitted to IEEE for possible publication

arXiv:2205.13976 [pdf, ps, other]

Hybrid Offline-Online Design for Reconfigurable Intelligent Surface Aided UAV Communication

Authors: Kaiyuan Tian, Bin Duo, Xiaojun Yuan, Wu Luo

Abstract: This letter considers the reconfigurable intelligent surface (RIS)-aided unmanned aerial vehicle (UAV) communication systems in urban areas under the general Rician fading channel. A hybrid offline-online design is proposed to improve the system performance by leveraging both the statistical channel state information (S-CSI) and instantaneous channel state information (I-CSI). For the offline phase, we aim to maximize the expected average achievable rate based on the S-CSI by jointly optimizing the RIS's phase-shift and UAV trajectory. The formulated stochastic optimization problem is difficult to solve due to its non-convexity. To tackle this problem, we propose an efficient algorithm by leveraging the stochastic successive convex approximation (SSCA) techniques. For the online phase, the UAV adaptively adjusts the transmit beamforming and user scheduling according to the effective I-CSI. Numerical results verify that the proposed hybrid design performs better than various benchmark schemes, and also demonstrate a favorable trade-off between system performance and CSI overhead.

Submitted 27 May, 2022; originally announced May 2022.

arXiv:2205.13731 [pdf]

doi 10.1109/TIM.2022.3181240

Accurate Tree Roots Positioning and Sizing over Undulated Ground Surfaces by Common Offset GPR Measurements

Authors: Wenhao Luo, Yee Hui Lee, Lai Fern Ow, Mohamed Lokman Mohd Yusof, Abdulkadir C. Yucel

Abstract: Tree roots detection is a popular application of the Ground-penetrating radar (GPR). Normally, the ground surface above the tree roots is assumed to be flat, and standard processing methods based on hyperbolic fitting are applied to the hyperbolae reflection patterns of tree roots for detection purposes. When the surface of the land is undulating (not flat), these typical hyperbolic fitting methods become inaccurate. This is because the reflection patterns change with the uneven ground surfaces. When the soil surface is not flat, it is inaccurate to use the peak point of an asymmetric reflection pattern to identify the depth and horizontal position of the underground target. The complex shapes of the reflection patterns caused by extreme surface variations result in analysis difficulties. Furthermore, when multiple objects are buried under an undulating ground, it is hard to judge their relative positions based on a B-scan that assumes a flat ground. In this paper, a roots fitting method based on electromagnetic (EM) wave travel time analysis is proposed to take into consideration the realistic undulating ground surface. A wheel-based (WB) GPR and an antenna-height-fixed (AHF) GPR System are presented, and their corresponding fitting models are proposed. The effectiveness of the proposed method is demonstrated and validated through numerical examples and field experiments.

Submitted 26 May, 2022; originally announced May 2022.

Comments: 11 pages, 6 figures, accepted by IEEE TIM
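Note: for context on the flat-ground hyperbolic fitting that the abstract above contrasts against, the textbook travel-time model can be written as follows. This is a sketch under simplifying assumptions (homogeneous soil, a point-like target, negligible antenna separation); the symbols $v$ (wave velocity in soil), $x_0$ (horizontal target position), $d$ (target depth), and $x$ (antenna position) are introduced here for illustration and are not taken from the paper.

```latex
% Two-way travel time from an antenna at x to a point target at (x_0, d)
t(x) = \frac{2}{v}\sqrt{(x - x_0)^2 + d^2}
% Equivalent hyperbola in the (x, t) plane, with its apex at x = x_0
\frac{\left(v\,t/2\right)^2}{d^2} - \frac{(x - x_0)^2}{d^2} = 1
```

Under these assumptions the apex of the hyperbola sits directly above the target, which is why the peak point gives the position and depth on flat ground; the abstract's point is that this no longer holds once the antenna height varies with an undulating surface.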

arXiv:2205.08142 [pdf]

Dual-Cross-Polarized GPR Measurement Method for Detection and Orientation Estimation of Shallowly Buried Elongated Object

Authors: Hai-Han Sun, Yee Hui Lee, Wenhao Luo, Lai Fern Ow, Mohamed Lokman Mohd Yusof, Abdulkadir C. Yucel

Abstract: Detecting a shallowly buried and elongated object and estimating its orientation using a commonly adopted co-polarized GPR system is challenging due to the presence of strong ground clutter that masks the target reflection. A cross-polarized configuration can be used to suppress ground clutter and reveal the object reflection, but it suffers from inconsistent detection capability which significantly varies with different object orientations. To address this issue, we propose a dual-cross-polarized detection (DCPD) method which utilizes two cross-polarized antennas with a special arrangement to detect the object. The signals reflected by the object and collected by the two antennas are combined in a rotationally invariant manner to ensure both effective ground clutter suppression and consistent detection irrespective of the object orientation. In addition, we present a dual-cross-polarized orientation estimation (DCPOE) algorithm to estimate the object orientation from the two cross-polarized data. The proposed DCPOE algorithm is less affected by environmental noise and performs robust and accurate azimuth angle estimation. The effectiveness of the proposed techniques in the detection and orientation estimation and their advantages over the existing method have been demonstrated using experimental data. Comparison results show that the maximum and average errors are 22.3° and 10.9° for the Alford rotation algorithm, while those are 4.9° and 1.8° for the proposed DCPOE algorithm in the demonstrated shallowly buried object cases. The proposed techniques can be unified in a framework to facilitate the investigation and mapping of shallowly buried and elongated targets.

Submitted 17 May, 2022; originally announced May 2022.

arXiv:2204.02594 [pdf]

SFCW GPR tree roots detection enhancement by time frequency analysis in tropical areas

Authors: Wenhao Luo, Yee Hui Lee, Abdulkadir C. Yucel, Genevieve Ow, Mohamed Lokman Mohd Yusof

Abstract: Accurate monitoring of tree roots using ground penetrating radar (GPR) is very useful in assessing tree health. In high moisture tropical areas such as Singapore, tree fall due to root rot can cause loss of lives and properties. The complex characteristics of tropical soil, caused by its high moisture content, tend to affect the penetration depth of the signal. This limits the depth range of the GPR. Typically, a wide band signal is used to increase the penetration depth and to improve the resolution of the GPR. However, this broad band frequency tends to be noisy and selective frequency filtering is required for noise reduction. Therefore, in this paper, we adapt the stepped frequency continuous wave (SFCW) GPR and propose the use of a Joint time frequency analysis (JTFA) method called short time Fourier transform (STFT), to reduce noise and enhance tree root detection. The proposed methodology is illustrated and tested with controlled experiments and real tree roots testing. The results show promising prospects of the method for tree roots detection in tropical areas.

Submitted 6 April, 2022; originally announced April 2022.

Comments: 4 pages, 9 figures, accepted to be presented in 2022 IEEE International Geoscience and Remote Sensing Symposium (IGARSS 2022)
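Note: the short time Fourier transform named in the abstract above can be sketched in a few lines. This is a generic illustration, not the authors' processing chain: the Hann window, the window and hop sizes, and the synthetic single-tone signal are all assumptions chosen for the example.

```python
import cmath
import math

def stft(signal, window_size, hop):
    """Short-time Fourier transform with a periodic Hann window.

    Returns a list of frames; each frame holds the complex DFT bins
    0 .. window_size // 2 of one windowed segment.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / window_size)
              for n in range(window_size)]
    frames = []
    for start in range(0, len(signal) - window_size + 1, hop):
        segment = [signal[start + n] * window[n] for n in range(window_size)]
        bins = [
            sum(segment[n] * cmath.exp(-2j * math.pi * k * n / window_size)
                for n in range(window_size))
            for k in range(window_size // 2 + 1)
        ]
        frames.append(bins)
    return frames

# Synthetic test signal: a tone that completes 4 cycles per 32-sample window.
N = 32
signal = [math.sin(2 * math.pi * 4 * n / N) for n in range(4 * N)]

frames = stft(signal, window_size=N, hop=N // 2)
# Index of the strongest frequency bin in every time frame.
peak_bins = [max(range(len(f)), key=lambda k: abs(f[k])) for f in frames]
print(len(frames), peak_bins)
```

Because each frame exposes which frequencies are present at a given time, bins dominated by noise can be attenuated frame by frame, which is the kind of selective frequency filtering the abstract refers to.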

arXiv:2203.03830 [pdf]

Slice-Connection Clustering Algorithm for Tree Roots Recognition in Noisy 3D GPR Data

Authors: Wenhao Luo, Yee Hui Lee, Lai Fern Ow, Mohamed Lokman Mohd Yusof, Abdulkadir C. Yucel

Abstract: 3D mapping of tree roots is a popular ground-penetrating radar (GPR) application. In real field tests, the recognition of tree roots suffers due to noisy reflection patterns from subsurface targets that are not of interest, such as rocks, cavities, soil unevenness, etc. A Slice-Connection Clustering Algorithm (SCC) is applied to separate the regions of interest from each other in a reconstructed 3D image. The proposed method can successfully recognize the radar signatures of the roots and distinguish roots from other objects. Meanwhile, most noise radar features are ignored through our method. The final 3D mapping of the radargram obtained by the method can be used to estimate the location and extension trend of the tree roots. The effectiveness of the proposed system is tested on real GPR data.

Submitted 7 March, 2022; originally announced March 2022.

Comments: submitted to AP-S URSI 2022, 2 pages with 3 figures

arXiv:2201.08512 [pdf, other]

Vertical Federated Edge Learning with Distributed Integrated Sensing and Communication

Authors: Peixi Liu, Guangxu Zhu, Wei Jiang, Wu Luo, Jie Xu, Shuguang Cui

Abstract: This letter studies a vertical federated edge learning (FEEL) system for collaborative objects/human motion recognition by exploiting the distributed integrated sensing and communication (ISAC). In this system, distributed edge devices first send wireless signals to sense targeted objects/human, and then exchange intermediate computed vectors (instead of raw sensing data) for collaborative recognition while preserving data privacy. To boost the spectrum and hardware utilization efficiency for FEEL, we exploit ISAC for both target sensing and data exchange, by employing dedicated frequency-modulated continuous-wave (FMCW) signals at each edge device. Under this setup, we propose a vertical FEEL framework for realizing the recognition based on the collected multi-view wireless sensing data. In this framework, each edge device owns an individual local L-model to transform its sensing data into an intermediate vector with relatively low dimensions, which is then transmitted to a coordinating edge device for final output via a common downstream S-model. By considering a human motion recognition task, experimental results show that our vertical FEEL based approach achieves recognition accuracy up to 98% with an improvement up to 8% compared to the benchmarks, including on-device training and horizontal FEEL.

Submitted 6 June, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

Comments: 5 pages, 7 figures, accepted by IEEE Communications Letters

arXiv:2110.01761 [pdf, other]

Proxy-bridged Image Reconstruction Network for Anomaly Detection in Medical Images

Authors: Kang Zhou, Jing Li, Weixin Luo, Zhengxin Li, Jianlong Yang, Huazhu Fu, Jun Cheng, Jiang Liu, Shenghua Gao

Abstract: Anomaly detection in medical images refers to the identification of abnormal images with only normal images in the training set. Most existing methods solve this problem with a self-reconstruction framework, which tends to learn an identity mapping and reduces the sensitivity to anomalies. To mitigate this problem, in this paper, we propose a novel Proxy-bridged Image Reconstruction Network (ProxyAno) for anomaly detection in medical images. Specifically, we use an intermediate proxy to bridge the input image and the reconstructed image. We study different proxy types, and we find that the superpixel-image (SI) is the best one. We set all pixels' intensities within each superpixel as their average intensity, and denote this image as SI. The proposed ProxyAno consists of two modules, a Proxy Extraction Module and an Image Reconstruction Module. In the Proxy Extraction Module, a memory is introduced to memorize the feature correspondence for normal image to its corresponding SI, while the memorized correspondence does not apply to the abnormal images, which leads to the information loss for abnormal image and facilitates the anomaly detection. In the Image Reconstruction Module, we map an SI to its reconstructed image. Further, we crop a patch from the image and paste it on the normal SI to mimic the anomalies, and enforce the network to reconstruct the normal image even with the pseudo abnormal SI. In this way, our network enlarges the reconstruction error for anomalies. Extensive experiments on brain MR images, retinal OCT images and retinal fundus images verify the effectiveness of our method for both image-level and pixel-level anomaly detection.

Submitted 4 October, 2021; originally announced October 2021.

Comments: This paper is accepted to IEEE TMI
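Note: the superpixel-image (SI) construction stated in the abstract above, where every pixel takes the average intensity of its superpixel, can be sketched as follows. The sketch assumes a superpixel label map is already available (the segmentation step itself is not shown), and the function name `superpixel_image` and the toy 2x3 image are hypothetical.

```python
def superpixel_image(image, labels):
    """Replace each pixel by the mean intensity of the superpixel it belongs to.

    `image` and `labels` are 2-D lists of the same shape; `labels[r][c]`
    is the superpixel id of pixel (r, c).
    """
    sums, counts = {}, {}
    for img_row, lab_row in zip(image, labels):
        for value, lab in zip(img_row, lab_row):
            sums[lab] = sums.get(lab, 0.0) + value
            counts[lab] = counts.get(lab, 0) + 1
    means = {lab: sums[lab] / counts[lab] for lab in sums}
    return [[means[lab] for lab in lab_row] for lab_row in labels]

# Toy grayscale image with two superpixels (ids 0 and 1).
image = [[10, 20, 200],
         [30, 40, 220]]
labels = [[0, 0, 1],
          [0, 0, 1]]

si = superpixel_image(image, labels)
print(si)
```

Averaging inside each region discards fine texture while keeping coarse structure, which is what lets the SI act as an intermediate proxy rather than a copy of the input.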

arXiv:2104.01160 [pdf, other]

doi 10.1145/3412382.3458255

PhyAug: Physics-Directed Data Augmentation for Deep Sensing Model Transfer in Cyber-Physical Systems

Authors: Wenjie Luo, Zhenyu Yan, Qun Song, Rui Tan

Abstract: Run-time domain shifts from training-phase domains are common in sensing systems designed with deep learning. The shifts can be caused by sensor characteristic variations and/or discrepancies between the design-phase model and the actual model of the sensed physical process. To address these issues, existing transfer learning techniques require substantial target-domain data and thus incur high post-deployment overhead. This paper proposes to exploit the first principle governing the domain shift to reduce the demand on target-domain data. Specifically, our proposed approach called PhyAug uses the first principle fitted with few labeled or unlabeled source/target-domain data pairs to transform the existing source-domain training data into augmented data for updating the deep neural networks. In two case studies of keyword spotting and DeepSpeech2-based automatic speech recognition, with 5-second unlabeled data collected from the target microphones, PhyAug recovers the recognition accuracy losses due to microphone characteristic variations by 37% to 72%. In a case study of seismic source localization with TDoA fingerprints, by exploiting the first principle of signal propagation in uneven media, PhyAug only requires 3% to 8% of labeled TDoA measurements required by the vanilla fingerprinting approach in achieving the same localization accuracy.

Submitted 19 April, 2021; v1 submitted 31 March, 2021; originally announced April 2021.

arXiv:2012.00291 [pdf]

Effects of Intermediate Frequency Bandwidth on Stepped Frequency Ground Penetrating Radar

Authors: Wenhao Luo, Hai-Han Sun, Yee Hui Lee, Abdulkadir C. Yucel, Genevieve Ow, Mohamed Lokman Mohd Yusof

Abstract: A stepped frequency ground penetrating radar (GPR) system is used for detecting objects buried under high permittivity soil. Different intermediate frequency bandwidths (IFBWs) of the mixing receiver are used and the measurement results are compared. It is shown that the IFBW can affect the system's signal-to-noise ratio (SNR). Experimental results show that objects of different materials can clearly be detected when the appropriate IFBW is used.

Submitted 1 December, 2020; originally announced December 2020.

arXiv:2008.03632 [pdf, other]

Encoding Structure-Texture Relation with P-Net for Anomaly Detection in Retinal Images

Authors: Kang Zhou, Yuting Xiao, Jianlong Yang, Jun Cheng, Wen Liu, Weixin Luo, Zaiwang Gu, Jiang Liu, Shenghua Gao

Abstract: Anomaly detection in retinal image refers to the identification of abnormality caused by various retinal diseases/lesions, by only leveraging normal images in training phase. Normal images from healthy subjects often have regular structures (e.g., the structured blood vessels in the fundus image, or structured anatomy in optical coherence tomography image). On the contrary, the diseases and lesions often destroy these structures. Motivated by this, we propose to leverage the relation between the image texture and structure to design a deep neural network for anomaly detection. Specifically, we first extract the structure of the retinal images, then we combine both the structure features and the last layer features extracted from the original healthy image to reconstruct the original input healthy image. The image feature provides the texture information and guarantees the uniqueness of the image recovered from the structure. In the end, we further utilize the reconstructed image to extract the structure and measure the difference between the structure extracted from the original image and that extracted from the reconstructed image. On the one hand, minimizing the reconstruction difference behaves like a regularizer to guarantee that the image is correctly reconstructed. On the other hand, such structure difference can also be used as a metric for normality measurement. The whole network is termed as P-Net because it has a "P" shape. Extensive experiments on RESC dataset and iSee dataset validate the effectiveness of our approach for anomaly detection in retinal images. Further, our method also generalizes well to novel class discovery in retinal images and anomaly detection in real-world images.

Submitted 8 August, 2020; originally announced August 2020.

arXiv:2004.02805 [pdf]

Application of Structural Similarity Analysis of Visually Salient Areas and Hierarchical Clustering in the Screening of Similar Wireless Capsule Endoscopic Images

Authors: Rui Nie, Huan Yang, Hejuan Peng, Wenbin Luo, Weiya Fan, Jie Zhang, Jing Liao, Fang Huang, Yufeng Xiao

Abstract: Small intestinal capsule endoscopy is the mainstream method for inspecting small intestinal lesions, but a single small intestinal capsule endoscopy will produce 60,000 - 120,000 images, the majority of which are similar and have no diagnostic value. It takes 2 - 3 hours for doctors to identify lesions from these images. This is time-consuming and increases the probability of misdiagnosis and missed diagnosis since doctors are likely to experience visual fatigue while focusing on a large number of similar images for an extended period of time. In order to solve these problems, we proposed a similar wireless capsule endoscope (WCE) image screening method based on structural similarity analysis and the hierarchical clustering of visually salient sub-image blocks. The similarity clustering of images was automatically identified by hierarchical clustering based on the hue, saturation, value (HSV) spatial color characteristics of the images, and the keyframe images were extracted based on the structural similarity of the visually salient sub-image blocks, in order to accurately identify and screen out similar small intestinal capsule endoscopic images. Subsequently, the proposed method was applied to the capsule endoscope imaging workstation. After screening out similar images in the complete data gathered by the Type I OMOM Small Intestinal Capsule Endoscope from 52 cases covering 17 common types of small intestinal lesions, we obtained a lesion recall of 100% and an average similar image reduction ratio of 76%. With similar images screened out, the average play time of the OMOM image workstation was 18 minutes, which greatly reduced the time spent by doctors viewing the images.

Submitted 1 April, 2020; originally announced April 2020.

arXiv:1912.09957 [pdf, other]

Multi-Robot Collision Avoidance under Uncertainty with Probabilistic Safety Barrier Certificates

Authors: Wenhao Luo, Wen Sun, Ashish Kapoor

Abstract: Safety in terms of collision avoidance for multi-robot systems is a difficult challenge under uncertainty, non-determinism and lack of complete information. This paper aims to propose a collision avoidance method that accounts for both measurement uncertainty and motion uncertainty. In particular, we propose Probabilistic Safety Barrier Certificates (PrSBC) using Control Barrier Functions to define the space of admissible control actions that are probabilistically safe with formally provable theoretical guarantee. By formulating the chance constrained safety set into deterministic control constraints with PrSBC, the method entails minimally modifying an existing controller to determine an alternative safe controller via quadratic programming constrained to PrSBC constraints. The key advantage of the approach is that no assumptions about the form of uncertainty are required other than finite support, also enabling worst-case guarantees. We demonstrate effectiveness of the approach through experiments on realistic simulation environments.

Submitted 7 December, 2020; v1 submitted 20 December, 2019; originally announced December 2019.

Comments: NeurIPS 2020
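Note: the "minimally modifying an existing controller ... via quadratic programming" step in the abstract above has a closed form in the simplest case of a single linear constraint. The sketch below assumes one deterministic half-space constraint `a . u >= b` and solves `min ||u - u_nom||^2` subject to it; it does not derive the constraint from PrSBC, and the function name `safety_filter` and the numbers are hypothetical.

```python
def safety_filter(u_nom, a, b):
    """Project the nominal control u_nom onto the half-space {u : a . u >= b}.

    This is the closed-form solution of
        min ||u - u_nom||^2  subject to  a . u >= b,
    i.e. the smallest Euclidean change that makes the control admissible.
    """
    dot = sum(ai * ui for ai, ui in zip(a, u_nom))
    if dot >= b:
        # The nominal control is already admissible; leave it untouched.
        return list(u_nom)
    norm_sq = sum(ai * ai for ai in a)
    if norm_sq == 0:
        raise ValueError("constraint vector a must be non-zero")
    scale = (b - dot) / norm_sq
    return [ui + scale * ai for ui, ai in zip(u_nom, a)]

# Hypothetical constraint: forward velocity must not exceed 0.5,
# written as (-1, 0) . u >= -0.5.
a, b = [-1.0, 0.0], -0.5

u_safe = safety_filter([1.0, 0.0], a, b)       # nominal control violates the limit
u_unchanged = safety_filter([0.2, 0.3], a, b)  # nominal control is already safe
print(u_safe, u_unchanged)
```

With several constraints (one per robot pair, for instance) the projection no longer has this simple form and a general QP solver would be needed.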

arXiv:1911.11759 [pdf, other]

Password-conditioned Anonymization and Deanonymization with Face Identity Transformers

Authors: Xiuye Gu, Weixin Luo, Michael S. Ryoo, Yong Jae Lee

Abstract: Cameras are prevalent in our daily lives, and enable many useful systems built upon computer vision technologies such as smart cameras and home robots for service applications. However, there is also an increasing societal concern as the captured images/videos may contain privacy-sensitive information (e.g., face identity). We propose a novel face identity transformer which enables automated photo-realistic password-based anonymization as well as deanonymization of human faces appearing in visual data. Our face identity transformer is trained to (1) remove face identity information after anonymization, (2) make the recovery of the original face possible when given the correct password, and (3) return a wrong--but photo-realistic--face given a wrong password. Extensive experiments show that our approach enables multimodal password-conditioned face anonymizations and deanonymizations, without sacrificing privacy compared to existing anonymization approaches.

Submitted 30 September, 2020; v1 submitted 26 November, 2019; originally announced November 2019.

Comments: ECCV 2020

arXiv:1909.12224 [pdf, other]

Liquid Warping GAN: A Unified Framework for Human Motion Imitation, Appearance Transfer and Novel View Synthesis

Authors: Wen Liu, Zhixin Piao, Jie Min, Wenhan Luo, Lin Ma, Shenghua Gao

Abstract: We tackle the human motion imitation, appearance transfer, and novel view synthesis within a unified framework, which means that the model once being trained can be used to handle all these tasks. The existing task-specific methods mainly use 2D keypoints (pose) to estimate the human body structure. However, they only express the position information with no abilities to characterize the personalized shape of the individual person and model the limb rotations. In this paper, we propose to use a 3D body mesh recovery module to disentangle the pose and shape, which can not only model the joint location and rotation but also characterize the personalized body shape. To preserve the source information, such as texture, style, color, and face identity, we propose a Liquid Warping GAN with Liquid Warping Block (LWB) that propagates the source information in both image and feature spaces, and synthesizes an image with respect to the reference. Specifically, the source features are extracted by a denoising convolutional auto-encoder for characterizing the source identity well. Furthermore, our proposed method is able to support a more flexible warping from multiple sources. In addition, we build a new dataset, namely Impersonator (iPER) dataset, for the evaluation of human motion imitation, appearance transfer, and novel view synthesis. Extensive experiments demonstrate the effectiveness of our method in several aspects, such as robustness in occlusion case and preserving face identity, shape consistency and clothes details. All codes and datasets are available on https://svip-lab.github.io/project/impersonator.html

Submitted 1 October, 2019; v1 submitted 26 September, 2019; originally announced September 2019.

Comments: accepted by ICCV2019

arXiv:1907.09077 [pdf, other]

doi 10.1145/3307650.3322270

A Stochastic-Computing based Deep Learning Framework using Adiabatic Quantum-Flux-Parametron Superconducting Technology

Authors: Ruizhe Cai, Ao Ren, Olivia Chen, Ning Liu, Caiwen Ding, Xuehai Qian, Jie Han, Wenhui Luo, Nobuyuki Yoshikawa, Yanzhi Wang

Abstract: The Adiabatic Quantum-Flux-Parametron (AQFP) superconducting technology has been recently developed, which achieves the highest energy efficiency among superconducting logic families, with a potentially huge gain compared with state-of-the-art CMOS. In 2016, the successful fabrication and testing of AQFP-based circuits with the scale of 83,000 JJs demonstrated the scalability and potential of implementing large-scale systems using AQFP. As a result, AQFP is promising for high-performance computing and deep space applications, with Deep Neural Network (DNN) inference acceleration as an important example. Besides ultra-high energy efficiency, AQFP exhibits two unique characteristics: the first is its deep pipelining nature, since each AQFP logic gate is connected with an AC clock signal, which increases the difficulty of avoiding RAW hazards; the second is the unique opportunity of true random number generation (RNG) using a single AQFP buffer, far more efficient than RNG in CMOS. We point out that these two characteristics make AQFP especially compatible with the stochastic computing (SC) technique, which uses a time-independent bit sequence for value representation, and is compatible with the deep pipelining nature. Further, the application of SC has been investigated in DNNs in prior work, and the suitability has been illustrated as SC is more compatible with approximate computations. This work is the first to develop an SC-based DNN acceleration framework using AQFP technology.

Submitted 21 July, 2019; originally announced July 2019.

arXiv:1906.01516 [pdf, other]

Randomized Channel Sparsifying Hybrid Precoding for FDD Massive MIMO Systems

Authors: Chang Tian, An Liu, Mahdi Barzegar Khalilsarai, Giuseppe Caire, Wu Luo, Minjian Zhao

Abstract: We propose a novel randomized channel sparsifying hybrid precoding (RCSHP) design to reduce the signaling overhead of channel estimation and the hardware cost and power consumption at the base station (BS), in order to fully harvest benefits of frequency division duplex (FDD) massive multiple-input multiple-output (MIMO) systems. RCSHP allows time-sharing among multiple analog precoders, each serving a compatible user group. The analog precoder is adapted to the channel statistics to properly sparsify the channel for the associated user group, such that the resulting effective channel (product of channel and analog precoder) not only has enough spatial degrees of freedom (DoF) to serve this group of users, but also can be accurately estimated under the limited pilot budget. The digital precoder is adapted to the effective channel based on the duality theory to facilitate the power allocation and exploit the spatial multiplexing gain. We formulate the joint optimization of the time-sharing factors and the associated sets of analog precoders and power allocations as a general utility optimization problem, which considers the impact of effective channel estimation error on the system performance. Then we propose an efficient stochastic successive convex approximation algorithm to provably obtain Karush-Kuhn-Tucker (KKT) points of this problem.

Submitted 12 December, 2019; v1 submitted 4 June, 2019; originally announced June 2019.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:1808.10564 [pdf, other]

Multi-Cell Multi-Task Convolutional Neural Networks for Diabetic Retinopathy Grading

Authors: Kang Zhou, Zaiwang Gu, Wen Liu, Weixin Luo, Jun Cheng, Shenghua Gao, Jiang Liu

Abstract: Diabetic Retinopathy (DR) is a non-negligible eye disease among patients with Diabetes Mellitus, and automatic retinal image analysis algorithms for DR screening are in high demand. Retinal images have very high resolution: small pathological tissues can be detected only in high-resolution images, and large local receptive fields are required to identify late-stage disease. However, directly training a very deep neural network on high-resolution images is both computationally expensive and difficult because of the vanishing/exploding gradient problem. We therefore propose a Multi-Cell architecture that gradually increases the depth of the deep neural network and the resolution of the input image, which both accelerates training and improves the classification accuracy. Further, the different stages of DR progress gradually, which means the labels of different stages are related. To account for the relationships among images at different stages, we propose a Multi-Task learning strategy that predicts the label with both classification and regression. Experimental results on the Kaggle dataset show that our method achieves a Kappa of 0.841 on the test set, ranking 4th among all state-of-the-art methods. Further, our Multi-Cell Multi-Task Convolutional Neural Networks (M$^2$CNN) solution is a general framework, which can be readily integrated with many other deep neural network architectures.

Submitted 11 October, 2018; v1 submitted 30 August, 2018; originally announced August 2018.

Comments: Accepted by EMBC 2018

arXiv:1805.03787 [pdf, ps, other]

MIMO radar waveform design with practical constraints: A low-complexity approach

Authors: Chenglin Ren, Fan Liu, Longfei Zhou, Jianming Zhou, Wu Luo, Shengzhi Yang

Abstract: In this letter, we consider multiple-input multiple-output (MIMO) radar waveform design in the presence of signal-dependent clutter and additive white Gaussian noise. With the constant modulus constraint (CMC) and waveform similarity constraint (SC) imposed, the signal-to-interference-plus-noise ratio (SINR) maximization problem is non-convex and NP-hard in general, but it can be transformed into a sequence of convex quadratically constrained quadratic programming (QCQP) subproblems. Aiming at solving each subproblem efficiently, we propose a low-complexity method termed Accelerated Gradient Projection (AGP). In contrast to the conventional IPM-based method, our proposed algorithm achieves the same performance in terms of the receive SINR and the beampattern, while notably reducing computational complexity.

Submitted 9 May, 2018; originally announced May 2018.

arXiv:1711.04646 [pdf]

doi 10.1364/OE.26.004243

Orbital-angular-momentum mode-group multiplexed transmission over a graded-index ring-core fiber based on receive diversity and maximal ratio combining

Authors: Junwei Zhang, Guoxuan Zhu, Liu Jie, Xiong Wu, Jianbo Zhu, Cheng Du, Wenyong Luo, Siyuan Yu

Abstract: An orbital-angular-momentum (OAM) mode-group multiplexing (MGM) scheme based on a graded-index ring-core fiber (GIRCF) is proposed, in which a single-input two-output (or receive diversity) architecture is designed for each MG channel and simple digital signal processing (DSP) is utilized to adaptively resist the mode partition noise resulting from random intra-group mode crosstalk. There is no need for complex multiple-input multiple-output (MIMO) equalization in this scheme. Furthermore, the signal-to-noise ratio (SNR) of the received signals can be improved if a simple maximal ratio combining (MRC) technique is employed on the receiver side to efficiently take advantage of the diversity gain of the receiver. Intensity-modulated direct-detection (IM-DD) systems transmitting three OAM mode groups with a total of 100-Gb/s discrete multi-tone (DMT) signals over a 1-km GIRCF and two OAM mode groups with a total of 40-Gb/s DMT signals over an 18-km GIRCF are experimentally demonstrated, respectively, to confirm the feasibility of our proposed OAM-MGM scheme.

Submitted 9 November, 2017; originally announced November 2017.

Comments: 13 pages, 6 figures

Showing 1–36 of 36 results for author: Luo, W