Search | arXiv e-print repository

A Versatile Diffusion Transformer with Mixture of Noise Levels for Audiovisual Generation

Authors: Gwanghyun Kim, Alonso Martinez, Yu-Chuan Su, Brendan Jou, José Lezama, Agrim Gupta, Lijun Yu, Lu Jiang, Aren Jansen, Jacob Walker, Krishna Somandepalli

Abstract: Training diffusion models for audiovisual sequences allows for a range of generation tasks by learning conditional distributions of various input-output combinations of the two modalities. Nevertheless, this strategy often requires training a separate model for each task which is expensive. Here, we propose a novel training approach to effectively learn arbitrary conditional distributions in the audiovisual space. Our key contribution lies in how we parameterize the diffusion timestep in the forward diffusion process. Instead of the standard fixed diffusion timestep, we propose applying variable diffusion timesteps across the temporal dimension and across modalities of the inputs. This formulation offers flexibility to introduce variable noise levels for various portions of the input, hence the term mixture of noise levels. We propose a transformer-based audiovisual latent diffusion model and show that it can be trained in a task-agnostic fashion using our approach to enable a variety of audiovisual generation tasks at inference time. Experiments demonstrate the versatility of our method in tackling cross-modal and multimodal interpolation tasks in the audiovisual space. Notably, our proposed approach surpasses baselines in generating temporally and perceptually consistent samples conditioned on the input. Project page: avdit2024.github.io

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.10825 [pdf, other]

Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities

Authors: Hao Zhou, Chengming Hu, Ye Yuan, Yufei Cui, Yili **, Can Chen, Haolun Wu, Dun Yuan, Li Jiang, Di Wu, Xue Liu, Charlie Zhang, Xianbin Wang, Jiangchuan Liu

Abstract: Large language models (LLMs) have received considerable attention recently due to their outstanding comprehension and reasoning capabilities, leading to great progress in many fields. The advancement of LLM techniques also offers promising opportunities to automate many tasks in the telecommunication (telecom) field. After pre-training and fine-tuning, LLMs can perform diverse downstream tasks based on human instructions, paving the way to artificial general intelligence (AGI)-enabled 6G. Given the great potential of LLM technologies, this work aims to provide a comprehensive overview of LLM-enabled telecom networks. In particular, we first present LLM fundamentals, including model architecture, pre-training, fine-tuning, inference and utilization, model evaluation, and telecom deployment. Then, we introduce LLM-enabled key techniques and telecom applications in terms of generation, classification, optimization, and prediction problems. Specifically, the LLM-enabled generation applications include telecom domain knowledge, code, and network configuration generation. After that, the LLM-based classification applications involve network security, text, image, and traffic classification problems. Moreover, multiple LLM-enabled optimization techniques are introduced, such as automated reward function design for reinforcement learning and verbal reinforcement learning. Furthermore, for LLM-aided prediction problems, we discuss time-series prediction models and multi-modality prediction problems for telecom. Finally, we highlight the challenges and identify the future directions of LLM-enabled telecom networks.

Submitted 17 May, 2024; originally announced May 2024.

arXiv:2404.11168 [pdf]

Microwave photonic short-time Fourier transform based on stabilized period-one nonlinear laser dynamics and stimulated Brillouin scattering

Authors: Sunan Zhang, Taixia Shi, Lizhong Jiang, Yang Chen

Abstract: A microwave photonic short-time Fourier transform (STFT) system based on stabilized period-one (P1) nonlinear laser dynamics and stimulated Brillouin scattering (SBS) is proposed. By using an optoelectronic feedback loop, the frequency-sweep optical signal generated by the P1 nonlinear laser dynamics is stabilized, which is further used in conjunction with an optical bandpass filter implemented by stimulated Brillouin scattering (SBS) to achieve the frequency-to-time mapping of microwave signals and the final STFT. By comparing the experimental results with and without optoelectronic feedback, it is found that the time-frequency diagram of the signal under test (SUT) obtained by STFT is clearer and more regular, and the frequency of the SUT measured in each frequency-sweep period is more accurate. The mean absolute error is reduced by 50% under the optimal filter bandwidth.

Submitted 17 April, 2024; originally announced April 2024.

Comments: 9 pages, 6 figures

arXiv:2404.07577 [pdf, other]

Generating Comprehensive Lithium Battery Charging Data with Generative AI

Authors: Lidang Jiang, Changyan Hu, Sibei Ji, Hang Zhao, Junxiong Chen, Ge He

Abstract: In optimizing performance and extending the lifespan of lithium batteries, accurate state prediction is pivotal. Traditional regression and classification methods have achieved some success in battery state prediction. However, the efficacy of these data-driven approaches heavily relies on the availability and quality of public datasets. Additionally, generating electrochemical data predominantly through battery experiments is a lengthy and costly process, making it challenging to acquire high-quality electrochemical data. This difficulty, coupled with data incompleteness, significantly impacts prediction accuracy. Addressing these challenges, this study introduces the End of Life (EOL) and Equivalent Cycle Life (ECL) as conditions for generative AI models. By integrating an embedding layer into the CVAE model, we developed the Refined Conditional Variational Autoencoder (RCVAE). Through preprocessing data into a quasi-video format, our study achieves an integrated synthesis of electrochemical data, including voltage, current, temperature, and charging capacity, which is then processed by the RCVAE model. Coupled with customized training and inference algorithms, this model can generate specific electrochemical data for EOL and ECL under supervised conditions. This method provides users with a comprehensive electrochemical dataset, pioneering a new research domain for the artificial synthesis of lithium battery data. Furthermore, based on the detailed synthetic data, various battery state indicators can be calculated, offering new perspectives and possibilities for lithium battery performance prediction.

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2403.09062 [pdf]

TBI Image/Text (TBI-IT): Comprehensive Text and Image Datasets for Traumatic Brain Injury Research

Authors: Jie Li, Jiaying Wen, Tongxin Yang, Fenglin Cai, Miao Wei, Zhiwei Zhang, Li Jiang

Abstract: In this paper, we introduce a new dataset in the medical field of Traumatic Brain Injury (TBI), called TBI-IT, which includes both electronic medical records (EMRs) and head CT images. This dataset is designed to enhance the accuracy of artificial intelligence in the diagnosis and treatment of TBI. This dataset, built upon the foundation of standard text and image data, incorporates specific annotations within the EMRs, extracting key content from the text information, and categorizes the annotation content of imaging data into five types: brain midline, hematoma, left cerebral ventricle, right cerebral ventricle and fracture. TBI-IT aims to be a foundational dataset for feature learning in image segmentation tasks and named entity recognition.

Submitted 13 March, 2024; originally announced March 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2401.15934

arXiv:2402.18070 [pdf, other]

A Hierarchical Dataflow-Driven Heterogeneous Architecture for Wireless Baseband Processing

Authors: Limin Jiang, Yi Shi, Haiqin Hu, Qingyu Deng, Siyi Xu, Yintao Liu, Feng Yuan, Si Wang, Yihao Shen, Fangfang Ye, Shan Cao, Zhiyuan Jiang

Abstract: Wireless baseband processing (WBP) is a key element of wireless communications, with a series of signal processing modules to improve data throughput and counter channel fading. Conventional hardware solutions, such as digital signal processors (DSPs) and more recently, graphic processing units (GPUs), provide various degrees of parallelism, yet they both fail to take into account the cyclical and consecutive character of WBP. Furthermore, the large amount of data in WBPs cannot be processed quickly in symmetric multiprocessors (SMPs) due to the unpredictability of memory latency. To address this issue, we propose a hierarchical dataflow-driven architecture to accelerate WBP. A pack-and-ship approach is presented under a non-uniform memory access (NUMA) architecture to allow the subordinate tiles to operate in a bundled access and execute manner. We also propose a multi-level dataflow model and the related scheduling scheme to manage and allocate the heterogeneous hardware resources. Experiment results demonstrate that our prototype achieves $2\times$ and $2.3\times$ speedup in terms of normalized throughput and single-tile clock cycles compared with GPU and DSP counterparts in several critical WBP benchmarks. Additionally, a link-level throughput of $288$ Mbps can be achieved with a $45$-core configuration.

Submitted 28 February, 2024; originally announced February 2024.

Comments: 7 pages, 7 figures, conference

arXiv:2311.11969 [pdf, other]

SA-Med2D-20M Dataset: Segment Anything in 2D Medical Imaging with 20 Million masks

Authors: ** Ye, Junlong Cheng, Jianpin Chen, Zhongying Deng, Tianbin Li, Haoyu Wang, Yanzhou Su, Ziyan Huang, Jilong Chen, Lei Jiang, Hui Sun, Min Zhu, Shaoting Zhang, Junjun He, Yu Qiao

Abstract: Segment Anything Model (SAM) has achieved impressive results for natural image segmentation with input prompts such as points and bounding boxes. Its success largely owes to massive labeled training data. However, directly applying SAM to medical image segmentation cannot perform well because SAM lacks medical knowledge -- it does not use medical images for training. To incorporate medical knowledge into SAM, we introduce SA-Med2D-20M, a large-scale segmentation dataset of 2D medical images built upon numerous public and private datasets. It consists of 4.6 million 2D medical images and 19.7 million corresponding masks, covering almost the whole body and showing significant diversity. This paper describes all the datasets collected in SA-Med2D-20M and details how to process these datasets. Furthermore, comprehensive statistics of SA-Med2D-20M are presented to facilitate the better use of our dataset, which can help the researchers build medical vision foundation models or apply their models to downstream medical applications. We hope that the large scale and diversity of SA-Med2D-20M can be leveraged to develop medical artificial intelligence for enhancing diagnosis, medical image analysis, knowledge sharing, and education. The data with the redistribution license is publicly available at https://github.com/OpenGVLab/SAM-Med2D.

Submitted 20 November, 2023; originally announced November 2023.

arXiv:2311.04049 [pdf, other]

3D EAGAN: 3D edge-aware attention generative adversarial network for prostate segmentation in transrectal ultrasound images

Authors: Mengqing Liu, Xiao Shao, Li** Jiang, Kaizhi Wu

Abstract: Automatic prostate segmentation in TRUS images has always been a challenging problem, since prostates in TRUS images have ambiguous boundaries and inhomogeneous intensity distribution. Although many prostate segmentation methods have been proposed, they still need to be improved due to their lack of sensitivity to edge information. Consequently, the objective of this study is to devise a highly effective prostate segmentation method that overcomes these limitations and achieves accurate segmentation of prostates in TRUS images. A 3D edge-aware attention generative adversarial network (3D EAGAN)-based prostate segmentation method is proposed in this paper, which consists of an edge-aware segmentation network (EASNet) that performs the prostate segmentation and a discriminator network that distinguishes predicted prostates from real prostates. The proposed EASNet is composed of an encoder-decoder-based U-Net backbone network, a detail compensation module, four 3D spatial and channel attention modules, an edge enhance module, and a global feature extractor. The detail compensation module is proposed to compensate for the loss of detailed information caused by the down-sampling process of the encoder. The features of the detail compensation module are selectively enhanced by the 3D spatial and channel attention module. Furthermore, an edge enhance module is proposed to guide shallow layers in the EASNet to focus on contour and edge information in prostates. Finally, features from shallow layers and hierarchical features from the decoder module are fused through the global feature extractor to predict the segmented prostates.

Submitted 7 November, 2023; originally announced November 2023.

arXiv:2310.13541 [pdf, ps, other]

Distributed Adaptive Time-Varying Convex Optimization for Multi-agent Systems

Authors: Liangze Jiang, Zhengguang Wu, Lei Wang

Abstract: This paper focuses on time-varying convex optimization problems with uncertain parameters. A new class of adaptive algorithms is proposed to solve time-varying convex optimization problems. Under a mild assumption on the Hessian and the partial derivative of the gradient with respect to time, the dependence on them is reduced through appropriate adaptive law design. By integrating the new adaptive optimization algorithm and a modified consensus algorithm, the time-varying optimization problems can be solved in a distributed manner. Then, they are extended from agents with single-integrator dynamics to double-integrator dynamics, which describe more practical systems. As an example, the source seeking problem is used to verify the proposed design.

Submitted 20 October, 2023; originally announced October 2023.

Comments: 12 pages, 2 figures

arXiv:2310.00854 [pdf, other]

Regulating CPU Temperature With Thermal-Aware Scheduling Using a Reduced Order Learning Thermal Model

Authors: Anthony Dowling, Lin Jiang, Ming-Cheng Cheng, Yu Liu

Abstract: Modern real-time systems utilize considerable amounts of power while executing computation-intensive tasks. The execution of these tasks leads to significant power dissipation and heating of the device. It therefore results in severe thermal issues like temperature escalation, high thermal gradients, and excessive hot spot formation, which may result in degrading chip performance, accelerating device aging, and premature failure. Thermal-Aware Scheduling (TAS) enables optimization of thermal dissipation to maintain a safe thermal state. In this work, we implement a new TAS algorithm, POD-TAS, which manages the thermal behavior of a multi-core CPU based on a defined set of states and their transitions. We compare the performances of a dynamic RC thermal circuit simulator (HotSpot) and a reduced order Proper Orthogonal Decomposition (POD)-based thermal model and we select the latter for use in our POD-TAS algorithm. We implement a novel simulation-based evaluation methodology to compare TAS algorithms. This methodology is used to evaluate the performance of the proposed POD-TAS algorithm. Additionally, we compare the performance of a state-of-the-art TAS algorithm, RT-TAS, to our proposed POD-TAS algorithm. Furthermore, we utilize the COMBS benchmark suite to provide CPU workloads for task scheduling. Our experimental results on a multi-core processor using a set of 4 benchmarks demonstrate that the proposed POD-TAS method can improve thermal performance by decreasing the peak thermal variance by 53.0% and the peak chip temperature by 29.01%. Using a set of 8 benchmarks, the comparison of the two algorithms shows a decrease of 29.57% in the peak spatial variance of the chip temperature and 26.26% in the peak chip temperature. We also identify several potential future research directions.

Submitted 6 February, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

Comments: This version includes revisions to the previous version to improve the clarity and presentation of the work

arXiv:2308.06891 [pdf]

Viia-hand: a Reach-and-grasp Restoration System Integrating Voice interaction, Computer vision and Auditory feedback for Blind Amputees

Authors: Chunhao Peng, Dapeng Yang, Ming Cheng, **ghui Dai, Deyu Zhao, Li Jiang

Abstract: Visual feedback plays a crucial role in the process of amputation patients completing grasping in the field of prosthesis control. However, for blind and visually impaired (BVI) amputees, the loss of both visual and grasping abilities makes the "easy" reach-and-grasp task a feasible challenge. In this paper, we propose a novel multi-sensory prosthesis system helping BVI amputees with sensing, navigation and grasp operations. It combines modules of voice interaction, environmental perception, grasp guidance, collaborative control, and auditory/tactile feedback. In particular, the voice interaction module receives user instructions and invokes other functional modules according to the instructions. The environmental perception and grasp guidance module obtains environmental information through computer vision, and feeds the information back to the user through auditory feedback modules (voice prompts and spatial sound sources) and tactile feedback modules (vibration stimulation). The prosthesis collaborative control module obtains the context information of the grasp guidance process and completes the collaborative control of grasp gestures and wrist angles of the prosthesis in conjunction with the user's control intention in order to achieve a stable grasp of various objects. This paper details a prototyping design (named viia-hand) and presents its preliminary experimental verification on healthy subjects completing specific reach-and-grasp tasks. Our results showed that, with the help of our new design, the subjects were able to achieve a precise reach and reliable grasp of the target objects in a relatively cluttered environment. Additionally, the system is extremely user-friendly, as users can quickly adapt to it with minimal training.

Submitted 13 August, 2023; originally announced August 2023.

arXiv:2306.17717 [pdf, other]

Content-Preserving Diffusion Model for Unsupervised AS-OCT image Despeckling

Authors: Li Sanqian, Higashita Risa, Fu Huazhu, Li Heng, Niu **gxuan, Liu Jiang

Abstract: Anterior segment optical coherence tomography (AS-OCT) is a non-invasive imaging technique that is highly valuable for ophthalmic diagnosis. However, speckles in AS-OCT images can often degrade the image quality and affect clinical analysis. As a result, removing speckles in AS-OCT images can greatly benefit automatic ophthalmology analysis. Unfortunately, challenges still exist in deploying effective AS-OCT image denoising algorithms, including collecting sufficient paired training data and the requirement to preserve consistent content in medical images. To address these practical issues, we propose an unsupervised AS-OCT despeckling algorithm via Content Preserving Diffusion Model (CPDM) with statistical knowledge. At the training stage, a Markov chain transforms clean images to white Gaussian noise by repeatedly adding random noise and removes the predicted noise in a reverse procedure. At the inference stage, we first analyze the statistical distribution of speckles and convert it into a Gaussian distribution, aiming to match the fast truncated reverse diffusion process. We then explore the posterior distribution of observed images as a fidelity term to ensure content consistency in the iterative procedure. Our experimental results show that CPDM significantly improves image quality compared to competitive methods. Furthermore, we validate the benefits of CPDM for subsequent clinical analysis, including ciliary muscle (CM) segmentation and scleral spur (SS) localization.

Submitted 30 June, 2023; originally announced June 2023.

arXiv:2303.14081 [pdf, other]

CoLa-Diff: Conditional Latent Diffusion Model for Multi-Modal MRI Synthesis

Authors: Lan Jiang, Ye Mao, Xi Chen, Xiangfeng Wang, Chao Li

Abstract: MRI synthesis promises to mitigate the challenge of missing MRI modality in clinical practice. Diffusion model has emerged as an effective technique for image synthesis by modelling complex and variable data distributions. However, most diffusion-based MRI synthesis models are using a single modality. As they operate in the original image domain, they are memory-intensive and less feasible for multi-modal synthesis. Moreover, they often fail to preserve the anatomical structure in MRI. Further, balancing the multiple conditions from multi-modal MRI inputs is crucial for multi-modal synthesis. Here, we propose the first diffusion-based multi-modality MRI synthesis model, namely Conditioned Latent Diffusion Model (CoLa-Diff). To reduce memory consumption, we design CoLa-Diff to operate in the latent space. We propose a novel network architecture, e.g., similar cooperative filtering, to solve the possible compression and noise in latent space. To better maintain the anatomical structure, brain region masks are introduced as the priors of density distributions to guide diffusion process. We further present auto-weight adaptation to employ multi-modal information effectively. Our experiments demonstrate that CoLa-Diff outperforms other state-of-the-art MRI synthesis methods, promising to serve as an effective tool for multi-modal MRI synthesis.

Submitted 24 March, 2023; originally announced March 2023.

Comments: 8 pages

ACM Class: I.3.3; I.4.10

arXiv:2303.13933 [pdf, other]

DisC-Diff: Disentangled Conditional Diffusion Model for Multi-Contrast MRI Super-Resolution

Authors: Ye Mao, Lan Jiang, Xi Chen, Chao Li

Abstract: Multi-contrast magnetic resonance imaging (MRI) is the most common management tool used to characterize neurological disorders based on brain tissue contrasts. However, acquiring high-resolution MRI scans is time-consuming and infeasible under specific conditions. Hence, multi-contrast super-resolution methods have been developed to improve the quality of low-resolution contrasts by leveraging complementary information from multi-contrast MRI. Current deep learning-based super-resolution methods have limitations in estimating restoration uncertainty and avoiding mode collapse. Although the diffusion model has emerged as a promising approach for image enhancement, capturing complex interactions between multiple conditions introduced by multi-contrast MRI super-resolution remains a challenge for clinical applications. In this paper, we propose a disentangled conditional diffusion model, DisC-Diff, for multi-contrast brain MRI super-resolution. It utilizes the sampling-based generation and simple objective function of diffusion models to estimate uncertainty in restorations effectively and ensure a stable optimization process. Moreover, DisC-Diff leverages a disentangled multi-stream network to fully exploit complementary information from multi-contrast MRI, improving model interpretation under multiple conditions of multi-contrast inputs. We validated the effectiveness of DisC-Diff on two datasets: the IXI dataset, which contains 578 normal brains, and a clinical dataset with 316 pathological brains. Our experimental results demonstrate that DisC-Diff outperforms other state-of-the-art methods both quantitatively and visually.

Submitted 6 June, 2023; v1 submitted 24 March, 2023; originally announced March 2023.

Comments: Early Accepted by MICCAI 2023

arXiv:2303.12249 [pdf, other]

State-of-the-art optical-based physical adversarial attacks for deep learning computer vision systems

Authors: Junbin Fang, You Jiang, Canjian Jiang, Zoe L. Jiang, Siu-Ming Yiu, Chuanyi Liu

Abstract: Adversarial attacks can mislead deep learning models to make false predictions by implanting small perturbations to the original input that are imperceptible to the human eye, which poses a huge security threat to the computer vision systems based on deep learning. Physical adversarial attacks are more realistic than digital adversarial attacks, as the perturbation is introduced to the input before it is captured and converted to a binary image inside the vision system. In this paper, we focus on physical adversarial attacks and further classify them into invasive and non-invasive. Optical-based physical adversarial attack techniques (e.g. using light irradiation) belong to the non-invasive category. The perturbations can be easily ignored by humans because they are very similar to the effects generated by a natural environment in the real world. They are highly invisible and executable and can pose significant or even lethal threats to real systems. This paper focuses on optical-based physical adversarial attack techniques for computer vision systems, with emphasis on the introduction and discussion of optical-based physical adversarial attack techniques.

Submitted 21 March, 2023; originally announced March 2023.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2301.04889 [pdf]

Artificial intelligence for diagnosing and predicting survival of patients with renal cell carcinoma: Retrospective multi-center study

Authors: Siteng Chen, Xiyue Wang, Jun Zhang, Liren Jiang, Ning Zhang, Feng Gao, Wei Yang, Jinxi Xiang, Sen Yang, Junhua Zheng, Xiao Han

Abstract: Background: Clear cell renal cell carcinoma (ccRCC) is the most common renal-related tumor with high heterogeneity. There is still an urgent need for novel diagnostic and prognostic biomarkers for ccRCC. Methods: We proposed a weakly-supervised deep learning strategy using conventional histology of 1752 whole slide images from multiple centers. Our study was demonstrated through internal cross-validation and external validations for the deep learning-based models. Results: Automatic diagnosis for ccRCC through intelligent subtyping of renal cell carcinoma was demonstrated in this study. Our graderisk achieved an area under the curve (AUC) of 0.840 (95% confidence interval: 0.805-0.871) in the TCGA cohort, 0.840 (0.805-0.871) in the General cohort, and 0.840 (0.805-0.871) in the CPTAC cohort for the recognition of high-grade tumor. The OSrisk for the prediction of 5-year survival status achieved an AUC of 0.784 (0.746-0.819) in the TCGA cohort, which was further verified in the independent General cohort and the CPTAC cohort, with AUCs of 0.774 (0.723-0.820) and 0.702 (0.632-0.765), respectively. Cox regression analysis indicated that graderisk, OSrisk, tumor grade, and tumor stage were independent prognostic factors, which were further incorporated into the competing-risk nomogram (CRN). Kaplan-Meier survival analyses further illustrated that our CRN could significantly distinguish patients with high survival risk, with a hazard ratio of 5.664 (3.893-8.239, p < 0.0001) in the TCGA cohort, 35.740 (5.889-216.900, p < 0.0001) in the General cohort and 6.107 (1.815-20.540, p < 0.0001) in the CPTAC cohort. Comparison analyses confirmed that our CRN outperformed current prognosis indicators in the prediction of survival status, with a higher concordance index for clinical prognosis.

Submitted 12 January, 2023; originally announced January 2023.

arXiv:2210.13761 [pdf, other]

Streaming Parrotron for on-device speech-to-speech conversion

Authors: Oleg Rybakov, Fadi Biadsy, Xia Zhang, Liyang Jiang, Phoenix Meadowlark, Shivani Agrawal

Abstract: We present a fully on-device streaming Speech2Speech conversion model that normalizes a given input speech directly to synthesized output speech. Deploying such a model on mobile devices poses significant challenges in terms of memory footprint and computation requirements. We present a streaming-based approach to produce an acceptable delay, with minimal loss in speech conversion quality, when compared to a reference state-of-the-art non-streaming approach. Our method consists of first streaming the encoder in real time while the speaker is speaking. Then, as soon as the speaker stops speaking, we run the spectrogram decoder in streaming mode alongside a streaming vocoder to generate output speech. To achieve an acceptable delay-quality trade-off, we propose a novel hybrid approach for look-ahead in the encoder which combines a look-ahead feature stacker with a look-ahead self-attention. We show that our streaming approach is almost 2x faster than real time on the Pixel4 CPU.

Submitted 24 May, 2023; v1 submitted 25 October, 2022; originally announced October 2022.

arXiv:2209.10675 [pdf, other]

A Validation Approach to Over-parameterized Matrix and Image Recovery

Authors: Lijun Ding, Zhen Qin, Liwei Jiang, Jinxin Zhou, Zhihui Zhu

Abstract: In this paper, we study the problem of recovering a low-rank matrix from a number of noisy random linear measurements. We consider the setting where the rank of the ground-truth matrix is unknown a priori and use an overspecified factored representation of the matrix variable, where the global optimal solutions overfit and do not correspond to the underlying ground-truth. We then solve the associated nonconvex problem using gradient descent with small random initialization. We show that as long as the measurement operators satisfy the restricted isometry property (RIP) with its rank parameter scaling with the rank of the ground-truth matrix rather than with the overspecified matrix variable, gradient descent iterations are on a particular trajectory towards the ground-truth matrix and achieve nearly information-theoretically optimal recovery when stopped appropriately. We then propose an efficient early stopping strategy based on the common hold-out method and show that it provably detects a nearly optimal estimator. Moreover, experiments show that the proposed validation approach can also be efficiently used for image restoration with deep image prior, which over-parameterizes an image with a deep network.

Submitted 21 September, 2022; originally announced September 2022.

Comments: 29 pages and 9 figures

arXiv:2208.05122 [pdf, other]

Improving Hypernasality Estimation with Automatic Speech Recognition in Cleft Palate Speech

Authors: Kaitao Song, Teng Wan, Bixia Wang, Huiqiang Jiang, Luna Qiu, Jiahang Xu, Liping Jiang, Qun Lou, Yuqing Yang, Dongsheng Li, Xudong Wang, Lili Qiu

Abstract: Hypernasality is an abnormal resonance in human speech production, especially in patients with craniofacial anomalies such as cleft palate. In clinical application, hypernasality estimation is crucial in cleft palate diagnosis, as its results determine the subsequent surgery and additional speech therapy. Therefore, designing an automatic hypernasality assessment method will help speech-language pathologists make precise diagnoses. Existing methods for hypernasality estimation only conduct acoustic analysis based on low-resource cleft palate datasets, using statistical or neural network-based features. In this paper, we propose a novel approach that uses an automatic speech recognition model to improve hypernasality estimation. Specifically, we first pre-train an encoder-decoder framework with an automatic speech recognition (ASR) objective using a speech-to-text dataset, and then fine-tune the ASR encoder on the cleft palate dataset for hypernasality estimation. Benefiting from this design, our model for hypernasality estimation enjoys the advantages of the ASR model: 1) compared with the low-resource cleft palate dataset, the ASR task usually includes large-scale speech data in the general domain, which enables better model generalization; 2) the text annotations in the ASR dataset guide the model to extract better acoustic features. Experimental results on two cleft palate datasets demonstrate that our method achieves superior performance compared with previous approaches.

Submitted 9 August, 2022; originally announced August 2022.

Comments: Accepted by InterSpeech 2022

arXiv:2207.13882 [pdf, other]

SuperVessel: Segmenting High-resolution Vessel from Low-resolution Retinal Image

Authors: Yan Hu, Zhongxi Qiu, Dan Zeng, Li Jiang, Chen Lin, Jiang Liu

Abstract: Vascular segmentation extracts blood vessels from images and serves as the basis for diagnosing various diseases, such as ophthalmic diseases. Ophthalmologists often require high-resolution segmentation results for analysis, which leads to a heavy computational load for most existing methods. If based on low-resolution input, they easily ignore tiny vessels or cause discontinuity of segmented vessels. To solve these problems, the paper proposes an algorithm named SuperVessel, which produces high-resolution and accurate vessel segmentation using low-resolution images as input. We first take super-resolution as our auxiliary branch to provide potential high-resolution detail features, which can be deleted in the test phase. Secondly, we propose two modules to enhance the features of the segmentation region of interest, including an upsampling with feature decomposition (UFD) module and a feature interaction module (FIM) with a constraining loss to focus on the features of interest. Extensive experiments on three publicly available datasets demonstrate that our proposed SuperVessel can segment more tiny vessels and improves segmentation accuracy (IoU) by over 6% compared with other state-of-the-art algorithms. Besides, the stability of SuperVessel is also stronger than that of other algorithms. We will release the code after the paper is published.

Submitted 28 July, 2022; originally announced July 2022.

Comments: Accepted by PRCV2022

arXiv:2206.04948 [pdf, other]

A Holistic Robust Motion Controller Framework for Autonomous Platooning

Authors: Hong Wang, Li-Ming Peng, Zi-Chun Wei, Kai Yang, Xian-Xu Bai, Luo Jiang, Ehsan Hashemi

Abstract: Safety is the foremost concern for autonomous platooning. The vehicle-to-vehicle (V2V) communication delay and the sudden appearance of obstacles will trigger safety of the intended functionality (SOTIF) issues for autonomous platooning. This research proposes a holistic robust motion controller framework (MCF) for an intelligent and connected vehicle platoon system. The MCF utilizes a hierarchical structure to resolve the longitudinal string stability and the lateral control problem under the complex driving environment and time-varying communication delay. Firstly, the H-infinity feedback controller is developed to ensure the robustness of the platoon under time-varying communication delay in the upper-level coordination layer (UCL). The output from the UCL is delivered to the lower-level motion-planning layer (LML) as reference signals. Secondly, the model predictive control (MPC) algorithm is implemented in the LML to achieve multi-objective control, which comprehensively considers the reference signals, the artificial potential field, and multiple vehicle dynamics constraints. Furthermore, three critical scenarios are co-simulated for case studies, including platooning under time-varying communication delay, merging, and obstacle avoidance scenarios. The simulation results indicate that, compared with single-structure MPC, the proposed MCF offers better suppression of position error propagation and improves the maximum position error in the three scenarios by 19.2%, 59.8%, and 15.3%, respectively. Last, the practicability and effectiveness of the proposed MCF are verified via a hardware-in-the-loop experiment. The average execution time of the proposed method on a Speedgoat real-time target machine is 1.1 milliseconds, which meets the real-time requirements.

Submitted 10 June, 2022; originally announced June 2022.

Comments: 13 pages, 20 figures

arXiv:2205.14523 [pdf, other]

Risk of Stochastic Systems for Temporal Logic Specifications

Authors: Lars Lindemann, Lejun Jiang, Nikolai Matni, George J. Pappas

Abstract: The wide availability of data coupled with the computational advances in artificial intelligence and machine learning promise to enable many future technologies such as autonomous driving. While there has been a variety of successful demonstrations of these technologies, critical system failures have repeatedly been reported. Even if rare, such system failures pose a serious barrier to adoption without a rigorous risk assessment. This paper presents a framework for the systematic and rigorous risk verification of systems. We consider a wide range of system specifications formulated in signal temporal logic (STL) and model the system as a stochastic process, permitting discrete-time and continuous-time stochastic processes. We then define the STL robustness risk as the risk of lacking robustness against failure. This definition is motivated as system failures are often caused by missing robustness to modeling errors, system disturbances, and distribution shifts in the underlying data generating process. Within the definition, we permit general classes of risk measures and focus on tail risk measures such as the value-at-risk and the conditional value-at-risk. While the STL robustness risk is in general hard to compute, we propose the approximate STL robustness risk as a more tractable notion that upper bounds the STL robustness risk. We show how the approximate STL robustness risk can accurately be estimated from system trajectory data. For discrete-time stochastic processes, we show under which conditions the approximate STL robustness risk can even be computed exactly. We illustrate our verification algorithm in the autonomous driving simulator CARLA and show how a least risky controller can be selected among four neural network lane keeping controllers for five meaningful system specifications.

Submitted 8 October, 2022; v1 submitted 28 May, 2022; originally announced May 2022.
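
Note on the tail risk measures named in the abstract above: the following is a minimal sketch of how value-at-risk (VaR) and conditional value-at-risk (CVaR) might be estimated from a finite set of samples. It is not the paper's estimator. The function names, the `losses` data, and the interpretation of each sample as a per-trajectory loss are assumptions made for illustration only.

```python
import math


def value_at_risk(samples, beta):
    """Empirical VaR at level beta: the smallest sample value v such that
    at least a fraction beta of the samples are <= v."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must be in (0, 1)")
    ordered = sorted(samples)
    index = math.ceil(beta * len(ordered)) - 1
    return ordered[max(index, 0)]


def conditional_value_at_risk(samples, beta):
    """Empirical CVaR using the Rockafellar-Uryasev form:
    VaR + mean(max(x - VaR, 0)) / (1 - beta)."""
    var = value_at_risk(samples, beta)
    excess = sum(max(x - var, 0.0) for x in samples) / len(samples)
    return var + excess / (1.0 - beta)


# Hypothetical per-trajectory losses (larger means worse); illustrative only.
losses = [0.1, 0.4, 0.2, 0.9, 0.3, 0.5, 0.7, 0.6, 0.8, 1.0]
var_50 = value_at_risk(losses, 0.5)
cvar_50 = conditional_value_at_risk(losses, 0.5)
```

With ten equally weighted samples and beta = 0.5, the CVaR computed this way equals the mean of the five largest losses, which is why it is read as a tail average rather than a single quantile.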

arXiv:2205.01550 [pdf, other]

Point Cloud Semantic Segmentation using Multi Scale Sparse Convolution Neural Network

Authors: Yunzheng Su, Lei Jiang, Jie Cao

Abstract: In recent years, with the development of computing resources and LiDAR, point cloud semantic segmentation has attracted many researchers. Although sparse convolution already handles the sparsity of point clouds, multi-scale features are not considered. In this letter, we propose a feature extraction module based on multi-scale sparse convolution and a feature selection module based on channel attention, and build a point cloud segmentation network framework on top of them. By introducing multi-scale sparse convolution, the network can capture richer feature information from convolution kernels of different sizes, improving the result of point cloud segmentation. Experimental results on the Stanford Large-Scale 3D Indoor Spaces (S3DIS) dataset and an outdoor dataset (SemanticKITTI) demonstrate the effectiveness and superiority of the proposed method.

Submitted 29 June, 2022; v1 submitted 3 May, 2022; originally announced May 2022.

arXiv:2203.00756 [pdf, other]

Real time spectrogram inversion on mobile phone

Authors: Oleg Rybakov, Marco Tagliasacchi, Yunpeng Li, Liyang Jiang, Xia Zhang, Fadi Biadsy

Abstract: We present two methods of real time magnitude spectrogram inversion: streaming Griffin-Lim (GL) and streaming MelGAN. We demonstrate the impact of looking ahead on the perceptual quality of MelGAN. As little as one hop size (12.5 ms) of lookahead is able to significantly improve perceptual quality in comparison to its causal version. We compare streaming GL with streaming MelGAN and show different trade-offs in terms of perceptual quality, on-device latency, algorithmic delay, memory footprint and noise sensitivity. For fair quality assessment of the GL approach, we use the input log magnitude spectrogram without mel transformation. We evaluate the presented real time spectrogram inversion approaches on clean, noisy and atypical speech. We specify the conditions under which streaming GL has comparable quality to MelGAN: noisy audio and no mel transformation. Streaming GL is 2.4x faster than real time on the ARM CPU of a Pixel4 and uses 4.5x less memory than MelGAN.

Submitted 24 May, 2023; v1 submitted 1 March, 2022; originally announced March 2022.

arXiv:2111.09971 [pdf, other]

Learning Robust Output Control Barrier Functions from Safe Expert Demonstrations

Authors: Lars Lindemann, Alexander Robey, Lejun Jiang, Satyajeet Das, Stephen Tu, Nikolai Matni

Abstract: This paper addresses learning safe output feedback control laws from partial observations of expert demonstrations. We assume that a model of the system dynamics and a state estimator are available along with corresponding error bounds, e.g., estimated from data in practice. We first propose robust output control barrier functions (ROCBFs) as a means to guarantee safety, as defined through controlled forward invariance of a safe set. We then formulate an optimization problem to learn ROCBFs from expert demonstrations that exhibit safe system behavior, e.g., data collected from a human operator or an expert controller. When the parametrization of the ROCBF is linear, then we show that, under mild assumptions, the optimization problem is convex. Along with the optimization problem, we provide verifiable conditions in terms of the density of the data, smoothness of the system model and state estimator, and the size of the error bounds that guarantee validity of the obtained ROCBF. Towards obtaining a practical control algorithm, we propose an algorithmic implementation of our theoretical framework that accounts for assumptions made in our framework in practice. We validate our algorithm in the autonomous driving simulator CARLA and demonstrate how to learn safe control laws from simulated RGB camera images.

Submitted 2 April, 2024; v1 submitted 18 November, 2021; originally announced November 2021.

Comments: Journal paper

arXiv:2110.12857 [pdf]

doi 10.1364/AO.450247

Photonics-assisted microwave pulse detection and frequency measurement based on pulse replication and frequency-to-time mapping

Authors: Pengcheng Zuo, Dong Ma, Qingbo Liu, Lizhong Jiang, Yang Chen

Abstract: A photonics-assisted microwave pulse detection and frequency measurement scheme is proposed. The unknown microwave pulse is converted to the optical domain and then injected into a fiber loop for pulse replication, which makes it easier to identify the microwave pulse with large pulse repetition interval (PRI), whereas stimulated Brillouin scattering-based frequency-to-time mapping (FTTM) is utilized to measure the carrier frequency of the microwave pulse. A sweep optical carrier is generated and modulated by the unknown microwave pulse and a continuous-wave single-frequency reference, generating two different frequency sweep optical signals, which are combined and used as the probe wave to detect a fixed Brillouin gain spectrum. When the optical signal is detected in a photodetector, FTTM is realized and the frequency of the microwave pulse can be determined. An experiment is performed. For a fiber loop containing a 210-m fiber, pulse replication and FTTM of the pulses with a PRI of 20 μs and pulse width of 1.20, 1.00, 0.85, and 0.65 μs are realized. Under a certain sweep frequency chirp rate of 0.978 THz/s, the measurement errors are below ±12 and ±5 MHz by using one pair of pulses and multiple pairs of pulses, respectively. The influence of the sweep frequency chirp rate and pulse width on the measurement error has also been studied. To a certain extent, the faster the frequency sweep, the greater the frequency measurement error. For a specific sweep frequency chirp rate, the measurement error is almost unaffected by the pulse width to be measured.

Submitted 25 September, 2021; originally announced October 2021.

Comments: 13 pages, 8 figures

arXiv:2109.13322 [pdf, other]

doi 10.1073/pnas.2012982118

Induced transparency: interference or polarization?

Authors: Changqing Wang, Xuefeng Jiang, William R. Sweeney, Chia Wei Hsu, Yiming Liu, Guangming Zhao, Bo Peng, Mengzhen Zhang, Liang Jiang, A. Douglas Stone, Lan Yang

Abstract: The polarization of optical fields is a crucial degree of freedom in the all-optical analogue of electromagnetically induced transparency (EIT). However, the physical origins of EIT and polarization induced phenomena have not been well distinguished, which can lead to confusion in associated applications such as slow light and optical/quantum storage. Here we study the polarization effects in various optical EIT systems. We find that a polarization mismatch between whispering gallery modes in two indirectly coupled resonators can induce a narrow transparency window in the transmission spectrum resembling the EIT lineshape. However, such polarization induced transparency (PIT) is distinct from EIT: it originates from strong polarization rotation effects and shows unidirectional feature. The coexistence of PIT and EIT provides new routes for the manipulation of light flow in optical resonator systems.

Submitted 27 September, 2021; originally announced September 2021.

Comments: 8 pages, 4 figures, 57 references. The published version can be found via URL: https://www.pnas.org/content/118/3/e2012982118

Journal ref: Proceedings of the National Academy of Sciences Vol. 118 No. 3 e2012982118 (19 Jan 2021)

arXiv:2107.04589 [pdf, other]

ViTGAN: Training GANs with Vision Transformers

Authors: Kwonjoon Lee, Huiwen Chang, Lu Jiang, Han Zhang, Zhuowen Tu, Ce Liu

Abstract: Recently, Vision Transformers (ViTs) have shown competitive performance on image recognition while requiring less vision-specific inductive biases. In this paper, we investigate if such performance can be extended to image generation. To this end, we integrate the ViT architecture into generative adversarial networks (GANs). For ViT discriminators, we observe that existing regularization methods for GANs interact poorly with self-attention, causing serious instability during training. To resolve this issue, we introduce several novel regularization techniques for training GANs with ViTs. For ViT generators, we examine architectural choices for latent and pixel mapping layers to facilitate convergence. Empirically, our approach, named ViTGAN, achieves comparable performance to the leading CNN-based GAN models on three datasets: CIFAR-10, CelebA, and LSUN bedroom.

Submitted 29 May, 2024; v1 submitted 9 July, 2021; originally announced July 2021.

Comments: Accepted to ICLR 2022 (Spotlight)

arXiv:2106.11172 [pdf]

doi 10.1109/JLT.2021.3101312

Multi-functional microwave photonic radar system for simultaneous distance and velocity measurement and high-resolution microwave imaging

Authors: Dingding Liang, Lizhong Jiang, Yang Chen

Abstract: A photonic-assisted multi-functional radar system for simultaneous distance and velocity measurement and high-resolution microwave imaging is proposed and experimentally demonstrated by using a composite transmitted microwave signal of a single-chirped linearly frequency-modulated (LFM) signal and a single-tone microwave signal. In the system, the transmitted signal is generated via photonic frequency up-conversion based on a single integrated dual-polarization dual-parallel Mach-Zehnder modulator (DPol-DPMZM), whereas the echo signals scattered from the target are de-chirped to two low-frequency signals using a microwave photonic frequency mixer. By using the two low-frequency de-chirped signals, the real-time distance and radial velocity of the moving target can be measured accurately according to the round-trip time of the echo signal and its Doppler frequency shift. Compared with previously reported distance and velocity measurement methods, where two LFM signals with opposite chirps are used, these parameters can be obtained using only a single-chirped LFM signal and a single-tone microwave signal. Meanwhile, high-resolution inverse synthetic aperture radar (ISAR) imaging can also be realized using ISAR imaging algorithms. An experiment is performed to verify the proposed multi-functional microwave photonic radar system. An up-chirped LFM signal from 8.5 to 12.5 GHz and an 8.0 GHz single-tone microwave signal are used as the transmitted signal. The results show that the absolute measurement errors of distance and radial velocity are less than 5.9 cm and 2.8 cm/s, respectively. ISAR imaging results are also demonstrated, which prove the high-resolution and real-time ISAR imaging ability of the proposed system.

Submitted 27 May, 2021; originally announced June 2021.

Comments: 16 pages, 9 figures
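
Note on the measurement principle mentioned in the abstract above: the sketch below applies the textbook monostatic radar relations, distance = c * tau / 2 for a round-trip time tau and radial velocity = c * f_d / (2 * f_0) for a Doppler shift f_d on a carrier f_0. These standard relations are assumed here for illustration and are not taken from the paper's signal processing chain. The function names and the sample round-trip time and Doppler shift are hypothetical; only the 8.0 GHz single-tone frequency comes from the abstract.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def distance_from_round_trip(tau_s):
    """Target distance in metres from the echo round-trip time in seconds."""
    return SPEED_OF_LIGHT * tau_s / 2.0


def radial_velocity_from_doppler(doppler_hz, carrier_hz):
    """Radial velocity in m/s from the Doppler shift of a single-tone carrier
    (standard monostatic relation f_d = 2 * v * f_0 / c)."""
    return SPEED_OF_LIGHT * doppler_hz / (2.0 * carrier_hz)


# Hypothetical measurements: 20 ns round trip, 10 Hz Doppler shift at 8.0 GHz.
distance_m = distance_from_round_trip(20e-9)
velocity_m_s = radial_velocity_from_doppler(10.0, 8.0e9)
```

Under these assumptions, a 20 ns round trip corresponds to roughly 3 m, and a 10 Hz shift on the 8.0 GHz tone to a radial velocity of roughly 19 cm/s.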

arXiv:2105.05701 [pdf, other]

Deep Multi-agent Reinforcement Learning for Highway On-Ramp Merging in Mixed Traffic

Authors: Dong Chen, Mohammad Hajidavalloo, Zhaojian Li, Kaian Chen, Yongqiang Wang, Longsheng Jiang, Yue Wang

Abstract: On-ramp merging is a challenging task for autonomous vehicles (AVs), especially in mixed traffic where AVs coexist with human-driven vehicles (HDVs). In this paper, we formulate the mixed-traffic highway on-ramp merging problem as a multi-agent reinforcement learning (MARL) problem, where the AVs (on both merge lane and through lane) collaboratively learn a policy to adapt to HDVs to maximize the traffic throughput. We develop an efficient and scalable MARL framework that can be used in dynamic traffic where the communication topology could be time-varying. Parameter sharing and local rewards are exploited to foster inter-agent cooperation while achieving great scalability. An action masking scheme is employed to improve learning efficiency by filtering out invalid/unsafe actions at each step. In addition, a novel priority-based safety supervisor is developed to significantly reduce collision rate and greatly expedite the training process. A gym-like simulation environment is developed and open-sourced with three different levels of traffic densities. We exploit curriculum learning to efficiently learn harder tasks from trained models under simpler settings. Comprehensive experimental results show the proposed MARL framework consistently outperforms several state-of-the-art benchmarks.

Submitted 5 November, 2022; v1 submitted 12 May, 2021; originally announced May 2021.

Comments: 15 figures
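
Note on the action masking scheme mentioned in the abstract above: the abstract does not say how the mask is applied, so the following is only one common way to do it, a softmax in which invalid actions receive zero probability. The function name, the four-action layout, and the logit values are assumptions for illustration.

```python
import math


def masked_softmax(logits, valid):
    """Softmax over the policy logits with invalid actions forced to zero
    probability, so they can never be sampled."""
    if not any(valid):
        raise ValueError("at least one action must be valid")
    # Subtract the largest valid logit for numerical stability.
    peak = max(l for l, ok in zip(logits, valid) if ok)
    weights = [math.exp(l - peak) if ok else 0.0 for l, ok in zip(logits, valid)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical step: four actions, the last two flagged invalid or unsafe.
probs = masked_softmax([2.0, 1.0, 0.5, 3.0], [True, True, False, False])
```

The action with the highest raw logit (3.0) is masked out here, so all probability mass is redistributed over the two valid actions.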

arXiv:2104.10781 [pdf, other]

NTIRE 2021 Challenge on Quality Enhancement of Compressed Video: Methods and Results

Authors: Ren Yang, Radu Timofte, Jing Liu, Yi Xu, Xinjian Zhang, Minyi Zhao, Shuigeng Zhou, Kelvin C. K. Chan, Shangchen Zhou, Xiangyu Xu, Chen Change Loy, Xin Li, Fanglong Liu, He Zheng, Lielin Jiang, Qi Zhang, Dongliang He, Fu Li, Qingqing Dang, Yibin Huang, Matteo Maggioni, Zhongqian Fu, Shuai Xiao, Cheng li, Thomas Tanay, et al. (47 additional authors not shown)

Abstract: This paper reviews the first NTIRE challenge on quality enhancement of compressed video, with a focus on the proposed methods and results. In this challenge, the new Large-scale Diverse Video (LDV) dataset is employed. The challenge has three tracks. Tracks 1 and 2 aim at enhancing the videos compressed by HEVC at a fixed QP, while Track 3 is designed for enhancing the videos compressed by x265 at a fixed bit-rate. Besides, the quality enhancement of Tracks 1 and 3 targets improving the fidelity (PSNR), and Track 2 targets enhancing the perceptual quality. The three tracks attracted 482 registrations in total. In the test phase, 12 teams, 8 teams and 11 teams submitted the final results of Tracks 1, 2 and 3, respectively. The proposed methods and solutions gauge the state-of-the-art of video quality enhancement. The homepage of the challenge: https://github.com/RenYang-home/NTIRE21_VEnh

Submitted 31 August, 2022; v1 submitted 21 April, 2021; originally announced April 2021.

Comments: Corrected the MOS values in Table 2, and corrected some minor typos

arXiv:2103.14236 [pdf, other]

Subspace-based compressive sensing algorithm for raypath separation in a shallow-water waveguide

Authors: Longyu Jiang, Zhe Zhang, Rui **, Xiao Zhou, Philippe Roux

Abstract: Compressive sensing (CS) has been applied to estimate the direction of arrival (DOA) in underwater acoustics. However, the key problem needed to be resolved in a multipath propagation environment is to suppress the interferences between the raypaths. Thus, in this paper, a subspace-based compressive sensing algorithm that formulates the statistic information of the signal subspace in a CS framework is proposed. The experiment results show that (1) the proposed algorithm enables the separation of raypaths that arrive closely at the receiver array and (2) the existing algorithms fail, especially in a low signal-to-noise ratio (SNR) environment.

Submitted 25 March, 2021; originally announced March 2021.

arXiv:2012.12821 [pdf, other]

Focal Frequency Loss for Image Reconstruction and Synthesis

Authors: Liming Jiang, Bo Dai, Wayne Wu, Chen Change Loy

Abstract: Image reconstruction and synthesis have witnessed remarkable progress thanks to the development of generative models. Nonetheless, gaps could still exist between the real and generated images, especially in the frequency domain. In this study, we show that narrowing gaps in the frequency domain can ameliorate image reconstruction and synthesis quality further. We propose a novel focal frequency loss, which allows a model to adaptively focus on frequency components that are hard to synthesize by down-weighting the easy ones. This objective function is complementary to existing spatial losses, offering great impedance against the loss of important frequency information due to the inherent bias of neural networks. We demonstrate the versatility and effectiveness of focal frequency loss to improve popular models, such as VAE, pix2pix, and SPADE, in both perceptual quality and quantitative performance. We further show its potential on StyleGAN2.

Submitted 23 August, 2021; v1 submitted 23 December, 2020; originally announced December 2020.

Comments: ICCV 2021. GitHub: https://github.com/EndlessSora/focal-frequency-loss Project page: https://www.mmlab-ntu.com/project/ffl/index.html

arXiv:2012.08698 [pdf, other]

Edge Entropy as an Indicator of the Effectiveness of GNNs over CNNs for Node Classification

Authors: Lavender Yao Jiang, John Shi, Mark Cheung, Oren Wright, José M. F. Moura

Abstract: Graph neural networks (GNNs) extend convolutional neural networks (CNNs) to graph-based data. A question that arises is how much performance improvement does the underlying graph structure in the GNN provide over the CNN (that ignores this graph structure). To address this question, we introduce edge entropy and evaluate how good an indicator it is for possible performance improvement of GNNs over CNNs. Our results on node classification with synthetic and real datasets show that lower values of edge entropy predict larger expected performance gains of GNNs over CNNs, and, conversely, higher edge entropy leads to expected smaller improvement gains.

Submitted 15 December, 2020; originally announced December 2020.

arXiv:2012.06091 [pdf, other]

Single-pixel Tracking and Imaging under Weak Illumination

Authors: Shuai Sun, Hong-Kang Hu, Yao-Kun Xu, Hui-Zu Lin, Er-Feng Zhang, Liang Jiang, Wei-Tao Liu

Abstract: Under weak illumination, tracking and imaging a moving object turns out to be hard. By spatially collecting the signal, single pixel imaging schemes promise the capability of image reconstruction from low photon flux. However, due to the requirement for a large number of samplings, how to clearly image moving objects is an essential problem for such schemes. Here we present a single pixel tracking and imaging method. The velocity vector of the object is obtained from temporal correlation of the bucket signals in a typical computational ghost imaging system. Then the illumination beam is steered accordingly. Taking the velocity into account, both the trajectory and a clear image of the object are achieved during its evolution. Since tracking is achieved with bucket signals independently, this scheme is valid for capturing moving object as fast as its displacement within the interval of every sampling keeps larger than the resolution of the optical system. Experimentally, our method works well with the average number of detected photons down to 1.88 photons/speckle.

Submitted 10 December, 2020; originally announced December 2020.

arXiv:2010.12041 [pdf]

doi 10.1016/j.pacs.2021.100266

Deep image prior for undersampling high-speed photoacoustic microscopy

Authors: Tri Vu, Anthony DiSpirito III, Daiwei Li, Zixuan Zhang, Xiaoyi Zhu, Maomao Chen, Laiming Jiang, Dong Zhang, Jianwen Luo, Yu Shrike Zhang, Qifa Zhou, Roarke Horstmeyer, Junjie Yao

Abstract: Photoacoustic microscopy (PAM) is an emerging imaging method combining light and sound. However, limited by the laser's repetition rate, state-of-the-art high-speed PAM technology often sacrifices spatial sampling density (i.e., undersampling) for increased imaging speed over a large field-of-view. Deep learning (DL) methods have recently been used to improve sparsely sampled PAM images; however, these methods often require time-consuming pre-training and large training dataset with ground truth. Here, we propose the use of deep image prior (DIP) to improve the image quality of undersampled PAM images. Unlike other DL approaches, DIP requires neither pre-training nor fully-sampled ground truth, enabling its flexible and fast implementation on various imaging targets. Our results have demonstrated substantial improvement in PAM images with as few as 1.4% of the fully sampled pixels on high-speed PAM. Our approach outperforms interpolation, is competitive with pre-trained supervised DL method, and is readily translated to other high-speed, undersampling imaging modalities.

Submitted 7 April, 2021; v1 submitted 15 October, 2020; originally announced October 2020.

arXiv:2008.01247 [pdf, other]

doi 10.1109/MSP.2020.3014594

Graph Signal Processing and Deep Learning: Convolution, Pooling, and Topology

Authors: Mark Cheung, John Shi, Oren Wright, Lavender Y. Jiang, Xujin Liu, José M. F. Moura

Abstract: Deep learning, particularly convolutional neural networks (CNNs), has yielded rapid, significant improvements in computer vision and related domains. But conventional deep learning architectures perform poorly when data have an underlying graph structure, as in social, biological, and many other domains. This paper explores 1) how graph signal processing (GSP) can be used to extend CNN components to graphs in order to improve model performance; and 2) how to design the graph CNN architecture based on the topology or structure of the data graph.

Submitted 3 August, 2020; originally announced August 2020.

Comments: To be published on IEEE Signal Processing Magazine

arXiv:2007.12072 [pdf, other]

TSIT: A Simple and Versatile Framework for Image-to-Image Translation

Authors: Liming Jiang, Changxu Zhang, Mingyang Huang, Chunxiao Liu, Jianping Shi, Chen Change Loy

Abstract: We introduce a simple and versatile framework for image-to-image translation. We unearth the importance of normalization layers, and provide a carefully designed two-stream generative model with newly proposed feature transformations in a coarse-to-fine fashion. This allows multi-scale semantic structure information and style representation to be effectively captured and fused by the network, permitting our method to scale to various tasks in both unsupervised and supervised settings. No additional constraints (e.g., cycle consistency) are needed, contributing to a very clean and simple method. Multi-modal image synthesis with arbitrary style control is made possible. A systematic study compares the proposed method with several state-of-the-art task-specific baselines, verifying its effectiveness in both perceptual quality and quantitative evaluations.

Submitted 25 July, 2020; v1 submitted 23 July, 2020; originally announced July 2020.

Comments: ECCV 2020 (Spotlight). Table 2 is updated. GitHub: https://github.com/EndlessSora/TSIT

arXiv:2007.00322 [pdf, other]

Kernel Learning for High-Resolution Time-Frequency Distribution

Authors: Lei Jiang, Haijian Zhang, Lei Yu, Guang Hua

Abstract: The design of high-resolution and cross-term (CT) free time-frequency distributions (TFDs) has been an open problem. Classical kernel based methods are limited by the trade-off between TFD resolution and CT suppression, even under optimally derived parameters. To break the current limitation, we propose a data-driven kernel learning model directly based on Wigner-Ville distribution (WVD). The proposed kernel learning based TFD (KL-TFD) model includes several stacked multi-channel learning convolutional kernels. Specifically, a skipping operator is utilized to maintain correct information transmission, and a weighted block is employed to exploit spatial and channel dependencies. These two designs simultaneously achieve high TFD resolution and CT elimination. Numerical experiments on both synthetic and real-world data confirm the superiority of the proposed KL-TFD over traditional kernel function methods.

Submitted 16 July, 2021; v1 submitted 1 July, 2020; originally announced July 2020.

arXiv:2005.08245 [pdf]

Dampen the Stop-and-Go Traffic with Connected and Automated Vehicles -- A Deep Reinforcement Learning Approach

Authors: Liming Jiang, Yuanchang Xie, Danjue Chen, Tienan Li, Nicholas G. Evans

Abstract: Stop-and-go traffic poses many challenges to the transportation system, but its formation and mechanism are still under exploration. However, it has been shown that by introducing Connected Automated Vehicles (CAVs) with carefully designed controllers one could dampen the stop-and-go waves in the vehicle fleet. Instead of using an analytical model, this study adopts reinforcement learning to control the behavior of a CAV and puts a single CAV at the 2nd position of a vehicle fleet with the purpose of dampening the speed oscillation from the fleet leader and helping following human drivers adopt smoother driving behavior. The results show that our controller could decrease the speed oscillation of the CAV by 54% and by 8%-28% for the following human-driven vehicles. Significant fuel consumption savings are also observed. Additionally, the results suggest that CAVs may act as a traffic stabilizer if they choose to behave slightly altruistically.

Submitted 17 May, 2020; originally announced May 2020.

arXiv:2004.14820 [pdf, other]

Robust Time-Frequency Reconstruction by Learning Structured Sparsity

Authors: Lei Jiang, Haijian Zhang, Lei Yu

Abstract: Time-frequency distributions (TFDs) play a vital role in providing descriptive analysis of non-stationary signals involved in realistic scenarios. It is well known that low time-frequency (TF) resolution and the emergence of cross-terms (CTs) are two main issues, which make it difficult to analyze and interpret practical signals using TFDs. In order to address these issues, we propose the U-Net aided iterative shrinkage-thresholding algorithm (U-ISTA) for reconstructing a near-ideal TFD by exploiting structured sparsity in signal TF domain. Specifically, the signal ambiguity function is firstly compressed, followed by unfolding the ISTA as a recurrent neural network. To consider continuously distributed characteristics of signals, a structured sparsity constraint is incorporated into the unfolded ISTA by regarding the U-Net as an adaptive threshold block, in which structure-aware thresholds are learned from enormous training data to exploit the underlying dependencies among neighboring TF coefficients. The proposed U-ISTA model is trained by both non-overlapped and overlapped synthetic signals including closely and far located non-stationary components. Experimental results demonstrate that the robust U-ISTA achieves superior performance compared with state-of-the-art algorithms, and gains a high TF resolution with CTs greatly eliminated even in low signal-to-noise ratio (SNR) environments.

Submitted 30 April, 2020; originally announced April 2020.

arXiv:2004.03519 [pdf, other]

Pooling in Graph Convolutional Neural Networks

Authors: Mark Cheung, John Shi, Lavender Yao Jiang, Oren Wright, José M. F. Moura

Abstract: Graph convolutional neural networks (GCNNs) are a powerful extension of deep learning techniques to graph-structured data problems. We empirically evaluate several pooling methods for GCNNs, and combinations of those graph pooling methods with three different architectures: GCN, TAGCN, and GraphSAGE. We confirm that graph pooling, especially DiffPool, improves classification accuracy on popular graph classification datasets and find that, on average, TAGCN achieves comparable or better accuracy than GCN and GraphSAGE, particularly for datasets with larger and sparser graph structures.

Submitted 7 April, 2020; originally announced April 2020.

Comments: 5 pages, 2 figures, 2019 Asilomar Conference paper

arXiv:2002.08587 [pdf, other]

Cross-stained Segmentation from Renal Biopsy Images Using Multi-level Adversarial Learning

Authors: Ke Mei, Chuang Zhu, Lei Jiang, Jun Liu, Yuanyuan Qiao

Abstract: Segmentation from renal pathological images is a key step in automatically analyzing the renal histological characteristics. However, the performance of models varies significantly in different types of stained datasets due to the appearance variations. In this paper, we design a robust and flexible model for cross-stained segmentation. It is a novel multi-level deep adversarial network architecture that consists of three sub-networks: (i) a segmentation network; (ii) a pair of multi-level mirrored discriminators for guiding the segmentation network to extract domain-invariant features; (iii) a shape discriminator that is utilized to further identify the output of the segmentation network and the ground truth. Experimental results on glomeruli segmentation from renal biopsy images indicate that our network is able to improve segmentation performance on target type of stained images and use unlabeled data to achieve similar accuracy to labeled data. In addition, this method can be easily applied to other tasks.

Submitted 20 February, 2020; originally announced February 2020.

Comments: Accepted by ICASSP2020

arXiv:2002.00179 [pdf]

AdvJND: Generating Adversarial Examples with Just Noticeable Difference

Authors: Zifei Zhang, Kai Qiao, Lingyun Jiang, Linyuan Wang, Bin Yan

Abstract: Compared with traditional machine learning models, deep neural networks perform better, especially in image classification tasks. However, they are vulnerable to adversarial examples. Adding small perturbations on examples causes a good-performance model to misclassify the crafted examples, without category differences in the human eyes, and fools deep models successfully. There are two requirements for generating adversarial examples: the attack success rate and image fidelity metrics. Generally, perturbations are increased to ensure the adversarial examples' high attack success rate; however, the adversarial examples obtained have poor concealment. To alleviate the tradeoff between the attack success rate and image fidelity, we propose a method named AdvJND, adding visual model coefficients, just noticeable difference coefficients, in the constraint of a distortion function when generating adversarial examples. In fact, the visual subjective feeling of the human eyes is added as a priori information, which decides the distribution of perturbations, to improve the image quality of adversarial examples. We tested our method on the FashionMNIST, CIFAR10, and MiniImageNet datasets. Adversarial examples generated by our AdvJND algorithm yield gradient distributions that are similar to those of the original inputs. Hence, the crafted noise can be hidden in the original inputs, thus improving the attack concealment significantly.

Submitted 23 June, 2020; v1 submitted 1 February, 2020; originally announced February 2020.

arXiv:2001.11954 [pdf, other]

MindReading: An Ultra-Low-Power Photonic Accelerator for EEG-based Human Intention Recognition

Authors: Qian Lou, Wenyang Liu, Weichen Liu, Feng Guo, Lei Jiang

Abstract: A scalp-recording electroencephalography (EEG)-based brain-computer interface (BCI) system can greatly improve the quality of life for people who suffer from motor disabilities. Deep neural networks consisting of multiple convolutional, LSTM and fully-connected layers are created to decode EEG signals to maximize the human intention recognition accuracy. However, prior FPGA, ASIC, ReRAM and photonic accelerators cannot maintain sufficient battery lifetime when processing real-time intention recognition. In this paper, we propose an ultra-low-power photonic accelerator, MindReading, for human intention recognition by only low bit-width addition and shift operations. Compared to prior neural network accelerators, to maintain the real-time processing throughput, MindReading reduces the power consumption by 62.7% and improves the throughput per Watt by 168%.

Submitted 30 January, 2020; originally announced January 2020.

Comments: 6 pages, 8 figures

arXiv:2001.08581 [pdf]

Cooperative Highway Work Zone Merge Control based on Reinforcement Learning in A Connected and Automated Environment

Authors: Tianzhu Ren, Yuanchang Xie, Liming Jiang

Abstract: Given the aging infrastructure and the anticipated growing number of highway work zones in the United States, it is important to investigate work zone merge control, which is critical for improving work zone safety and capacity. This paper proposes and evaluates a novel highway work zone merge control strategy based on cooperative driving behavior enabled by artificial intelligence. The proposed method assumes that all vehicles are fully automated, connected and cooperative. It inserts two metering zones in the open lane to make space for merging vehicles in the closed lane. In addition, each vehicle in the closed lane learns how to optimally adjust its longitudinal position to find a safe gap in the open lane using an off-policy soft actor critic (SAC) reinforcement learning (RL) algorithm, considering the traffic conditions in its surrounding. The learning results are captured in convolutional neural networks and used to control individual vehicles in the testing phase. By adding the metering zones and taking the locations, speeds, and accelerations of surrounding vehicles into account, cooperation among vehicles is implicitly considered. This RL-based model is trained and evaluated using a microscopic traffic simulator. The results show that this cooperative RL-based merge control significantly outperforms popular strategies such as late merge and early merge in terms of both mobility and safety measures.

Submitted 21 January, 2020; originally announced January 2020.

Comments: 17 pages, 6 figures, TRB 2020

arXiv:2001.03257 [pdf]

A Deep Neural Networks Approach for Pixel-Level Runway Pavement Crack Segmentation Using Drone-Captured Images

Authors: Liming Jiang, Yuanchang Xie, Tianzhu Ren

Abstract: Pavement conditions are a critical aspect of asset management and directly affect safety. This study introduces a deep neural network method called U-Net for pavement crack segmentation based on drone-captured images to reduce the cost and time needed for airport runway inspection. The proposed approach can also be used for highway pavement conditions assessment during off-peak periods when there are few vehicles on the road. In this study, runway pavement images are collected using a drone at various heights from the Fitchburg Municipal Airport (FMA) in Massachusetts to evaluate their quality and applicability for crack segmentation, from which an optimal height is determined. Drone images captured at the optimal height are then used to evaluate the crack segmentation performance of the U-Net model. Deep learning methods typically require a huge set of annotated training datasets for model development, which can be a major obstacle for their applications. An online annotated pavement image dataset is used together with the FMA data to train the U-Net model. The results show that U-Net performs well on the FMA testing data even with limited FMA training images, suggesting that it has good generalization ability and great potential to be used for both airport runways and highway pavements.

Submitted 9 January, 2020; originally announced January 2020.

Comments: 13 pages, 5 figures

arXiv:1912.13192 [pdf, other]

PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection

Authors: Shaoshuai Shi, Chaoxu Guo, Li Jiang, Zhe Wang, Jianping Shi, Xiaogang Wang, Hongsheng Li

Abstract: We present a novel and high-performance 3D object detection framework, named PointVoxel-RCNN (PV-RCNN), for accurate 3D object detection from point clouds. Our proposed method deeply integrates both 3D voxel Convolutional Neural Network (CNN) and PointNet-based set abstraction to learn more discriminative point cloud features. It takes advantage of the efficient learning and high-quality proposals of the 3D voxel CNN and the flexible receptive fields of the PointNet-based networks. Specifically, the proposed framework summarizes the 3D scene with a 3D voxel CNN into a small set of keypoints via a novel voxel set abstraction module to save follow-up computations and also to encode representative scene features. Given the high-quality 3D proposals generated by the voxel CNN, the RoI-grid pooling is proposed to abstract proposal-specific features from the keypoints to the RoI-grid points via keypoint set abstraction with multiple receptive fields. Compared with conventional pooling operations, the RoI-grid feature points encode much richer context information for accurately estimating object confidences and locations. Extensive experiments on both the KITTI dataset and the Waymo Open dataset show that our proposed PV-RCNN surpasses state-of-the-art 3D detection methods with remarkable margins by using only point clouds. Code is available at https://github.com/open-mmlab/OpenPCDet.

Submitted 9 April, 2021; v1 submitted 31 December, 2019; originally announced December 2019.

Comments: Accepted by CVPR 2020. arXiv admin note: substantial text overlap with arXiv:2102.00463

arXiv:1912.09859 [pdf, ps, other]

Lightweight and Unobtrusive Data Obfuscation at IoT Edge for Remote Inference

Authors: Dixing Xu, Mengyao Zheng, Linshan Jiang, Chaojie Gu, Rui Tan, Peng Cheng

Abstract: Executing deep neural networks for inference on the server-class or cloud backend based on data generated at the edge of Internet of Things is desirable due primarily to the limited compute power of edge devices and the need to protect the confidentiality of the inference neural networks. However, such a remote inference scheme incurs concerns regarding the privacy of the inference data transmitted by the edge devices to the curious backend. This paper presents a lightweight and unobtrusive approach to obfuscate the inference data at the edge devices. It is lightweight in that the edge device only needs to execute a small-scale neural network; it is unobtrusive in that the edge device does not need to indicate whether obfuscation is applied. Extensive evaluation by three case studies of free spoken digit recognition, handwritten digit recognition, and American sign language recognition shows that our approach effectively protects the confidentiality of the raw forms of the inference data while effectively preserving the backend's inference accuracy.

Submitted 25 March, 2020; v1 submitted 20 December, 2019; originally announced December 2019.

Comments: This paper has been accepted by IEEE Internet of Things Journal, Special Issue on Artificial Intelligence Powered Edge Computing for Internet of Things

arXiv:1912.04979 [pdf, other]

Advances in Online Audio-Visual Meeting Transcription

Authors: Takuya Yoshioka, Igor Abramovski, Cem Aksoylar, Zhuo Chen, Moshe David, Dimitrios Dimitriadis, Yifan Gong, Ilya Gurvich, Xuedong Huang, Yan Huang, Aviv Hurvitz, Li Jiang, Sharon Koubi, Eyal Krupka, Ido Leichter, Changliang Liu, Partha Parthasarathy, Alon Vinnikov, Lingfeng Wu, Xiong Xiao, Wayne Xiong, Huaming Wang, Zhenghao Wang, Jun Zhang, Yong Zhao, et al. (1 additional author not shown)

Abstract: This paper describes a system that generates speaker-annotated transcripts of meetings by using a microphone array and a 360-degree camera. The hallmark of the system is its ability to handle overlapped speech, which has been an unsolved problem in realistic settings for over a decade. We show that this problem can be addressed by using a continuous speech separation approach. In addition, we describe an online audio-visual speaker diarization method that leverages face tracking and identification, sound source localization, speaker identification, and, if available, prior speaker information for robustness to various real world challenges. All components are integrated in a meeting transcription framework called SRD, which stands for "separate, recognize, and diarize". Experimental results using recordings of natural meetings involving up to 11 attendees are reported. The continuous speech separation improves a word error rate (WER) by 16.1% compared with a highly tuned beamformer. When a complete list of meeting attendees is available, the discrepancy between WER and speaker-attributed WER is only 1.0%, indicating accurate word-to-speaker association. This increases marginally to 1.6% when 50% of the attendees are unknown to the system.

Submitted 10 December, 2019; originally announced December 2019.

Comments: To appear in Proc. IEEE ASRU Workshop 2019

Showing 1–50 of 58 results for author: Jiang, L