Search | arXiv e-print repository

Lightening Anything in Medical Images

Authors: Ben Fei, Yixuan Li, Weidong Yang, Hengjun Gao, **gyi Xu, Lipeng Ma, Yatian Yang, **hong Zhou

Abstract: The development of medical imaging techniques has made a significant contribution to clinical decision-making. However, suboptimal imaging quality, such as irregular illumination or imbalanced intensity, presents significant obstacles to automating disease screening, analysis, and diagnosis. Existing approaches for natural image enhancement are mostly trained with numerous paired images, which makes data collection and training costly, and they lack the ability to generalize effectively. Here, we introduce a pioneering training-free Diffusion Model for Universal Medical Image Enhancement, named UniMIE. UniMIE demonstrates unsupervised enhancement capabilities across various medical image modalities without any fine-tuning. It accomplishes this by relying solely on a single model pre-trained on ImageNet. We conduct a comprehensive evaluation on 13 imaging modalities and over 15 medical types, demonstrating better quality, robustness, and accuracy than other modality-specific and data-inefficient models. By delivering high-quality enhancement and accurate downstream results across a wide range of tasks, UniMIE exhibits considerable potential to accelerate the advancement of diagnostic tools and customized treatment plans.

Submitted 1 June, 2024; originally announced June 2024.

Comments: 23 pages, 6 figures

arXiv:2402.19020 [pdf, other]

Unsupervised Learning of High-resolution Light Field Imaging via Beam Splitter-based Hybrid Lenses

Authors: Jianxin Lei, Chengcai Xu, Langqing Shi, Junhui Hou, ** Zhou

Abstract: In this paper, we design a beam splitter-based hybrid light field imaging prototype that records a 4D light field image and a high-resolution 2D image simultaneously, and we build a hybrid light field dataset. The 2D image can be considered the high-resolution ground truth corresponding to the low-resolution central sub-aperture image of the 4D light field image. Subsequently, we propose an unsupervised learning-based super-resolution framework with the hybrid light field dataset, which adaptively addresses the light field spatial super-resolution problem with a complex degradation model. Specifically, we design two loss functions based on pre-trained models that enable the super-resolution network to learn detailed features and the light field parallax structure with only one ground truth. Extensive experiments demonstrate that our approach achieves performance comparable to supervised learning-based state-of-the-art methods. To our knowledge, it is the first end-to-end unsupervised learning-based spatial super-resolution approach in light field imaging research, whose input is available from our beam splitter-based hybrid light field system. The hardware and software together may greatly help promote the application of light field super-resolution.

Submitted 29 February, 2024; originally announced February 2024.

arXiv:2401.06788 [pdf, other]

The NPU-ASLP-LiAuto System Description for Visual Speech Recognition in CNVSRC 2023

Authors: He Wang, Pengcheng Guo, Wei Chen, Pan Zhou, Lei Xie

Abstract: This paper describes the visual speech recognition (VSR) system introduced by NPU-ASLP-LiAuto (Team 237) in the first Chinese Continuous Visual Speech Recognition Challenge (CNVSRC) 2023, participating in the fixed and open tracks of the Single-Speaker VSR Task and the open track of the Multi-Speaker VSR Task. In terms of data processing, we leverage the lip motion extractor from baseline1 to produce multi-scale video data. In addition, various augmentation techniques are applied during training, including speed perturbation, random rotation, horizontal flipping, and color transformation. The VSR model adopts an end-to-end architecture with joint CTC/attention loss, comprising a ResNet3D visual frontend, an E-Branchformer encoder, and a Transformer decoder. Experiments show that our system achieves 34.76% CER for the Single-Speaker Task and 41.06% CER for the Multi-Speaker Task after multi-system fusion, ranking first in all three tracks in which we participated.

Submitted 29 February, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

Comments: Included in CNVSRC Workshop 2023, NCMMSC 2023

arXiv:2401.03473 [pdf, ps, other]

ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge

Authors: He Wang, Pengcheng Guo, Yue Li, Ao Zhang, Jiayao Sun, Lei Xie, Wei Chen, Pan Zhou, Hui Bu, Xin Xu, Binbin Zhang, Zhuo Chen, Jian Wu, Longbiao Wang, Eng Siong Chng, Sun Li

Abstract: To promote speech processing and recognition research in driving scenarios, we build on the success of the Intelligent Cockpit Speech Recognition Challenge (ICSRC) held at ISCSLP 2022 and launch the ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge. This challenge collects over 100 hours of multi-channel speech data recorded inside a new energy vehicle and 40 hours of noise for data augmentation. Two tracks, automatic speech recognition (ASR) and automatic speech diarization and recognition (ASDR), are set up, using character error rate (CER) and concatenated minimum permutation character error rate (cpCER) as evaluation metrics, respectively. Overall, the ICMC-ASR Challenge attracts 98 participating teams and receives 53 valid results across both tracks. In the end, the first-place team, USTCiflytek, achieves a CER of 13.16% in the ASR track and a cpCER of 21.48% in the ASDR track, an absolute improvement of 13.08% and 51.4%, respectively, over our challenge baseline.

Submitted 20 February, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

Comments: Accepted at ICASSP 2024
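Both challenge tracks are scored with character error rates. As a minimal sketch (not the challenge's official scoring script), CER can be computed as the character-level Levenshtein distance (substitutions + deletions + insertions) divided by the reference length, and cpCER as the same quantity after concatenating each speaker's text and taking the minimum total error over all speaker mappings. The function names below are hypothetical.

```python
from itertools import permutations


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h from the hypothesis
                prev[j - 1] + (r != h),  # substitution (0 cost on a match)
            )
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalised by reference length."""
    return edit_distance(ref, hyp) / len(ref)


def cp_cer(refs: list[str], hyps: list[str]) -> float:
    """Concatenated minimum-permutation CER (sketch).

    refs[k] / hyps[k] hold all text of speaker k, already concatenated in time
    order. Assumes the same number of reference and hypothesis speakers.
    """
    total_ref = sum(len(r) for r in refs)
    best = min(
        sum(edit_distance(r, hyps[p]) for r, p in zip(refs, perm))
        for perm in permutations(range(len(hyps)))
    )
    return best / total_ref
```

Because speaker labels from diarization are arbitrary, the permutation search is what prevents a correct transcript with swapped speaker IDs from being counted as errors; its cost grows factorially with speaker count, which is acceptable for in-car scenarios with few speakers.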

arXiv:2401.03424 [pdf, other]

doi 10.1109/ICASSP48485.2024.10446769

MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition

Authors: He Wang, Pengcheng Guo, Pan Zhou, Lei Xie

Abstract: While automatic speech recognition (ASR) systems degrade significantly in noisy environments, audio-visual speech recognition (AVSR) systems aim to complement the audio stream with noise-invariant visual cues and improve the system's robustness. However, current studies mainly focus on fusing well-learned modality features, like the output of modality-specific encoders, without considering the contextual relationship during modality feature learning. In this study, we propose a multi-layer cross-attention fusion based AVSR (MLCA-AVSR) approach that promotes representation learning of each modality by fusing them at different levels of the audio/visual encoders. Experimental results on the MISP2022-AVSR Challenge dataset show the efficacy of our proposed system, which achieves a concatenated minimum permutation character error rate (cpCER) of 30.57% on the Eval set and yields up to a 3.17% relative improvement over our previous system, which ranked second in the challenge. After fusing multiple systems, our proposed approach surpasses the first-place system, establishing a new SOTA cpCER of 29.13% on this dataset.

Submitted 8 April, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

Comments: 5 pages, 3 figures. Accepted at ICASSP 2024
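The building block of cross-attention fusion is scaled dot-product attention in which one modality supplies the queries and the other supplies the keys and values. The pure-Python sketch below shows a single audio-to-visual fusion step with a residual connection. It is an illustration under stated assumptions: the learned query/key/value projections, multiple heads, and the choice of which encoder layers MLCA-AVSR fuses are all omitted, and `fuse_audio_with_visual` is a hypothetical name.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)  # subtract the max before exponentiating
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query row attends over all key rows.

    queries: T_q x d, keys: T_k x d, values: T_k x d_v (lists of lists).
    T_q and T_k may differ, e.g. because audio and video frame rates differ.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out


def fuse_audio_with_visual(audio, visual):
    """Hypothetical fusion step: audio frames query the visual frames and the
    attended visual context is added back to the audio features (residual)."""
    context = cross_attention(audio, visual, visual)
    return [[a + c for a, c in zip(a_row, c_row)]
            for a_row, c_row in zip(audio, context)]


audio = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 3 audio frames, d = 2
visual = [[2.0, 3.0], [1.0, -1.0]]             # 2 visual frames, d = 2
fused = fuse_audio_with_visual(audio, visual)  # same shape as audio: 3 x 2
```

Applying such a step inside the encoders, rather than only to their final outputs, is what lets each modality's intermediate representations be shaped by the other, which is the motivation the abstract gives for multi-layer fusion.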

arXiv:2312.09760 [pdf, other]

U2-KWS: Unified Two-pass Open-vocabulary Keyword Spotting with Keyword Bias

Authors: Ao Zhang, Pan Zhou, Kaixun Huang, Yong Zou, Ming Liu, Lei Xie

Abstract: Open-vocabulary keyword spotting (KWS), which allows users to customize keywords, has attracted increasing interest. However, existing methods based on acoustic models and post-processing train the acoustic model with ASR training criteria to model all phonemes, leaving the acoustic model under-optimized for the KWS task. To solve this problem, we propose a novel unified two-pass open-vocabulary KWS (U2-KWS) framework inspired by the two-pass ASR model U2. Specifically, we employ the CTC branch as the first-stage model to detect potential keyword candidates and the decoder branch as the second-stage model to validate the candidates. To enhance arbitrary customized keywords, we redesign the U2 training procedure for U2-KWS and add keyword information to both branches via audio-text cross-attention. We perform experiments on our internal dataset and Aishell-1. The results show that U2-KWS achieves a significant relative wake-up rate improvement of 41% compared with traditional customized KWS systems when the false alarm rate is fixed at 0.5 times per hour.

Submitted 15 December, 2023; originally announced December 2023.

Comments: Accepted by ASRU2023

arXiv:2312.09746 [pdf, other]

Automatic channel selection and spatial feature integration for multi-channel speech recognition across various array topologies

Authors: Bingshen Mu, Pengcheng Guo, Dake Guo, Pan Zhou, Wei Chen, Lei Xie

Abstract: Automatic Speech Recognition (ASR) has shown remarkable progress, yet it still faces challenges in real-world distant scenarios across various array topologies, each with multiple recording devices. The focal point of the CHiME-7 Distant ASR task is to devise a unified system capable of generalizing across various array topologies that have multiple recording devices and of offering reliable recognition performance in real-world environments. Addressing this task, we introduce an ASR system that demonstrates exceptional performance across various array topologies. First, we propose two attention-based automatic channel selection modules that select the most advantageous subset of multi-channel signals from multiple recording devices for each utterance. Furthermore, we introduce inter-channel spatial features to augment the effectiveness of multi-frame cross-channel attention, improving its awareness of spatial information. Finally, we propose a multi-layer convolution fusion module, inspired by the U-Net architecture, to integrate the multi-channel output into a single-channel output. Experimental results on the CHiME-7 corpus with oracle segmentation demonstrate that the improvements introduced in our proposed ASR system lead to a relative reduction of 40.1% in the Macro Diarization Attributed Word Error Rate (DA-WER) compared with the baseline ASR system on the Eval sets.

Submitted 15 December, 2023; originally announced December 2023.

Comments: Accepted by ICASSP 2024

arXiv:2309.07185 [pdf]

A Health Monitoring System Based on Flexible Triboelectric Sensors for Intelligence Medical Internet of Things and its Applications in Virtual Reality

Authors: Junqi Mao, Puen Zhou, Xiaoyao Wang, Hongbo Yao, Liuyang Liang, Yiqiao Zhao, Jiawei Zhang, Dayan Ban, Haiwu Zheng

Abstract: The Internet of Medical Things (IoMT) is a platform that combines Internet of Things (IoT) technology with medical applications, enabling precision medicine, intelligent healthcare, and telemedicine in the era of digitalization and intelligence. However, the IoMT faces various challenges, including sustainable power supply, the adaptability of sensors to the human body, and the intelligence of sensors. In this study, we designed a robust and intelligent IoMT system through the synergistic integration of flexible wearable triboelectric sensors and deep learning-assisted data analytics. We embedded four triboelectric sensors into a wristband to detect and analyze limb movements in patients suffering from Parkinson's Disease (PD). By further integrating deep learning-assisted data analytics, we realized an intelligent healthcare monitoring system for the surveillance of and interaction with PD patients, which includes location/trajectory tracking, heart monitoring, and identity recognition. This innovative approach enabled us to accurately capture and scrutinize the subtle and fine motor movements of PD patients, thus providing insightful feedback and a comprehensive assessment of the patients' conditions. This monitoring system is cost-effective, easily fabricated, highly sensitive, and intelligent, which underscores the immense potential of human body sensing technology in a Health 4.0 society.

Submitted 12 September, 2023; originally announced September 2023.

arXiv:2305.01790 [pdf]

Cascaded Logic Gates Based on High-Performance Ambipolar Dual-Gate WSe2 Thin Film Transistors

Authors: Xintong Li, Peng Zhou, Xuan Hu, Ethan Rivers, Kenji Watanabe, Takashi Taniguchi, Deji Akinwande, Joseph S. Friedman, Jean Anne C. Incorvia

Abstract: Ambipolar dual-gate transistors based on two-dimensional (2D) materials, such as graphene, carbon nanotubes, black phosphorus, and certain transition metal dichalcogenides (TMDs), enable reconfigurable logic circuits with suppressed off-state current. These circuits achieve the same logical output as CMOS with fewer transistors and offer greater design flexibility. The primary challenge lies in the cascadability and power consumption of these logic gates with static CMOS-like connections. In this article, high-performance ambipolar dual-gate transistors based on tungsten diselenide (WSe2) are fabricated. High on-off ratios of 10^8 and 10^6, a low off-state current of 100 to 300 fA, negligible hysteresis, and ideal subthreshold swings of 62 and 63 mV/dec are measured in the p- and n-type transport, respectively. For the first time, we demonstrate cascadable and cascaded logic gates using ambipolar TMD transistors with minimal static power consumption, including inverters, XOR, NAND, NOR, and buffers made from cascaded inverters. A thorough study of both the control-gate and polarity-gate behavior, which has previously been lacking, is conducted. The noise margin of the logic gates is measured and analyzed. The large noise margin enables the implementation of VT-drop circuits, a type of logic with a reduced transistor count and simplified circuit design. Finally, the speed performance of the VT-drop and other circuits built with dual-gate devices is qualitatively analyzed. This work lays the foundation for future developments in the field of ambipolar dual-gate TMD transistors, showing their potential for low-power, high-speed, and more flexible logic circuits.

Submitted 2 May, 2023; originally announced May 2023.
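For reference when reading the reported 62 and 63 mV/dec: the subthreshold swing (SS) is the gate-voltage change needed to change the drain current by one decade, and for conventional transistors relying on thermionic emission it is bounded below at room temperature by the standard textbook limit below (k_B is Boltzmann's constant, T the absolute temperature, q the elementary charge). The measured values therefore lie within a few mV/dec of this limit.

$$
\mathrm{SS} = \frac{\partial V_{GS}}{\partial \log_{10} I_{D}} \;\geq\; \frac{k_B T}{q}\,\ln 10 \approx 0.02585\ \mathrm{V} \times 2.3026 \approx 59.5\ \mathrm{mV/dec} \quad (T = 300\ \mathrm{K})
$$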

arXiv:2301.01933 [pdf]

Online Decomposition of Surface Electromyogram into Individual Motor Unit Activities Using Progressive FastICA Peel-off

Authors: Haowen Zhao, Xu Zhang, Maoqi Chen, ** Zhou

Abstract: Surface electromyogram (SEMG) decomposition provides a promising tool for decoding and understanding neural drive information non-invasively. In contrast to previous SEMG decomposition methods, which were mainly developed for offline conditions, there are few studies on online SEMG decomposition. A novel method for online decomposition of SEMG data is presented using the progressive FastICA peel-off (PFP) algorithm. The online method consists of an offline prework stage and an online decomposition stage. More specifically, a series of separation vectors is first initialized by the original offline version of the PFP algorithm from SEMG data recorded in advance. These vectors are then applied to online SEMG data to extract motor unit spike trains precisely. The performance of the proposed online SEMG decomposition method was evaluated by both simulation and experimental approaches. It achieved an online decomposition accuracy of 98.53% when processing simulated SEMG data. For decomposing experimental SEMG data, the proposed online method was able to extract an average of 12.00 ± 3.46 motor units (MUs) per trial, with a matching rate of 90.38% compared with the results of expert-guided offline decomposition. Our study provides a valuable approach to online decomposition of SEMG data, with advanced applications in movement control and health.

Submitted 5 January, 2023; originally announced January 2023.

Comments: 11 pages, 8 figures, 56 references. Submitted to IEEE Transactions on Biomedical Engineering
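The two-stage structure (separation vectors estimated offline, then applied online) can be illustrated with a deliberately simplified sketch. It assumes the separation vectors already exist and that incoming samples have been preprocessed exactly as in the offline stage (for example extended and whitened); each vector is projected onto the incoming multichannel data and spikes are taken as local maxima above a fixed threshold. The function name, the fixed-threshold rule, and the omission of the peel-off and spike-classification steps are simplifications for illustration, not the PFP algorithm itself.

```python
def online_spike_trains(samples, separation_vectors, threshold):
    """Apply offline-estimated separation vectors to a block of online data.

    samples: list of multichannel samples, each a list of channel values
             (assumed already preprocessed as in the offline stage).
    separation_vectors: one weight vector per motor unit, same length as a sample.
    threshold: minimum source amplitude for a peak to count as a spike.
    Returns, for each motor unit, the sample indices detected as spikes.
    """
    trains = []
    for w in separation_vectors:
        # Estimated source for this motor unit: s[t] = w . x[t]
        source = [sum(wi * xi for wi, xi in zip(w, x)) for x in samples]
        # Keep local maxima of the source that exceed the threshold.
        spikes = [
            t for t in range(1, len(source) - 1)
            if source[t] > threshold
            and source[t] >= source[t - 1]
            and source[t] > source[t + 1]
        ]
        trains.append(spikes)
    return trains
```

The appeal of this split is that the expensive iterative estimation happens once, offline, while the online stage reduces to one dot product per motor unit per sample plus a cheap peak test, which is what makes real-time operation plausible.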

arXiv:2207.01287 [pdf, other]

FFCNet: Fourier Transform-Based Frequency Learning and Complex Convolutional Network for Colon Disease Classification

Authors: Kai-Ni Wang, Yuting He, Shuaishuai Zhuang, Juzheng Miao, Xiaopu He, ** Zhou, Guanyu Yang, Guang-Quan Zhou, Shuo Li

Abstract: Reliable automatic classification of colonoscopy images is of great significance in assessing the stage of colonic lesions and formulating appropriate treatment plans. However, it is challenging due to uneven brightness, location variability, inter-class similarity, and intra-class dissimilarity, all of which affect classification accuracy. To address these issues, we propose a Fourier-based Frequency Complex Network (FFCNet) for colon disease classification. Specifically, FFCNet is a novel complex network that combines complex convolutional networks with frequency learning to overcome the loss of phase information caused by real convolution operations. In addition, our Fourier transform transfers the average brightness of an image to a single point in the spectrum (the DC component), alleviating the effects of uneven brightness by decoupling image content and brightness. Moreover, the image patch scrambling module in FFCNet generates random local spectral blocks, enabling the network to learn long-range and local disease-specific features and improving its discriminative ability on hard samples. We evaluated the proposed FFCNet on an in-house dataset of 2568 colonoscopy images; our method outperforms previous state-of-the-art methods with an accuracy of 86.35%, 4.46% higher than that of the backbone. The project page with code is available at https://github.com/soleilssss/FFCNet.

Submitted 4 July, 2022; originally announced July 2022.

Comments: Accepted for publication at the 25th International Conference on Medical Image Computing and Computer Assisted Intervention - MICCAI 2022
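The claim that the Fourier transform moves the average brightness into a single spectral point can be checked numerically: for an M x N image, the DC bin F[0][0] equals the pixel sum (M*N times the mean), and adding a uniform brightness offset c changes only that bin, by c*M*N, because every other complex exponential sums to zero over the image. The naive pure-Python 2D DFT below is only a check of this property under that uniform-offset assumption; it is not part of FFCNet.

```python
import cmath


def dft2(image):
    """Naive 2D discrete Fourier transform of an M x N image (list of lists).

    F[u][v] = sum_{x,y} f[x][y] * exp(-2*pi*i*(u*x/M + v*y/N))
    """
    m, n = len(image), len(image[0])
    return [
        [
            sum(
                image[x][y] * cmath.exp(-2j * cmath.pi * (u * x / m + v * y / n))
                for x in range(m)
                for y in range(n)
            )
            for v in range(n)
        ]
        for u in range(m)
    ]


image = [[10, 20, 30, 40], [50, 60, 70, 80], [15, 25, 35, 45]]
offset = 7  # uniform brightness shift added to every pixel
brighter = [[p + offset for p in row] for row in image]

spec, spec_bright = dft2(image), dft2(brighter)
m, n = len(image), len(image[0])

# The DC bin is the pixel sum, i.e. M*N times the mean brightness.
print(spec[0][0].real, sum(map(sum, image)))
# Only the DC bin moves, and it moves by offset * M * N.
diffs = [abs(spec_bright[u][v] - spec[u][v]) for u in range(m) for v in range(n)]
print(diffs[0], offset * m * n)
print(max(diffs[1:]) < 1e-9)
```

Real colonoscopy illumination is spatially uneven rather than a uniform offset, so the low-frequency bins around DC also carry brightness information; the check above only illustrates the global-average part of the claim.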

arXiv:2206.12759 [pdf, other]

Low-resource Accent Classification in Geographically-proximate Settings: A Forensic and Sociophonetics Perspective

Authors: Qingcheng Zeng, Dading Chong, Peilin Zhou, Jie Yang

Abstract: Accented speech recognition and accent classification are relatively under-explored research areas in speech technology. Recently, deep learning-based methods and Transformer-based pretrained models have achieved superb performance in both areas. However, most accent classification tasks have focused on classifying different kinds of English accents, and little attention has been paid to geographically-proximate accent classification, especially under the low-resource settings that forensic speech science tasks usually encounter. In this paper, we explored three main accent modelling methods combined with two different classifiers based on 105 speaker recordings retrieved from five urban varieties in Northern England. Although speech representations generated from pretrained models generally perform better in downstream classification, traditional methods like Mel Frequency Cepstral Coefficients (MFCCs) and formant measurements have specific strengths. These results suggest that in forensic phonetics scenarios where data are relatively scarce, a simple modelling method and classifier could be competitive with state-of-the-art pretrained speech models used as feature extractors, which could speed up the estimation of accent information in practice. In addition, our findings cross-validate a new methodology for quantifying sociophonetic changes.

Submitted 28 June, 2022; v1 submitted 25 June, 2022; originally announced June 2022.

Comments: INTERSPEECH 2022

arXiv:2205.11008 [pdf, other]

Calibrate and Refine! A Novel and Agile Framework for ASR-error Robust Intent Detection

Authors: Peilin Zhou, Dading Chong, Helin Wang, Qingcheng Zeng

Abstract: The past ten years have witnessed the rapid development of text-based intent detection, whose benchmark performance has already been taken to a remarkable level by deep learning techniques. However, automatic speech recognition (ASR) errors are inevitable in real-world applications due to environmental noise, unique speech patterns, etc., leading to sharp performance drops in state-of-the-art text-based intent detection models. Essentially, this phenomenon is caused by the semantic drift brought by ASR errors, and most existing works tend to focus on designing new model structures to reduce its impact, at the expense of versatility and flexibility. Unlike previous one-piece models, in this paper we propose a novel and agile framework called CR-ID for ASR-error robust intent detection with two plug-and-play modules, namely a semantic drift calibration module (SDCM) and a phonemic refinement module (PRM), which are both model-agnostic and can thus be easily integrated into any existing intent detection model without modifying its structure. Experimental results on the SNIPS dataset show that our proposed CR-ID framework achieves competitive performance and outperforms all the baseline methods on ASR outputs, which verifies that CR-ID can effectively alleviate the semantic drift caused by ASR errors.

Submitted 22 May, 2022; originally announced May 2022.

Comments: Submitted to INTERSPEECH 2022

arXiv:2205.09987 [pdf, other]

Model Predictive Manipulation of Compliant Objects with Multi-Objective Optimizer and Adversarial Network for Occlusion Compensation

Authors: Jiaming Qi, Dongyu Li, Yufeng Gao, Peng Zhou, David Navarro-Alarcon

Abstract: The robotic manipulation of compliant objects is currently one of the most active problems in robotics due to its potential to automate many important applications. Despite the progress achieved by the robotics community in recent years, the 3D shaping of these types of materials remains an open research problem. In this paper, we propose a new vision-based controller to automatically regulate the shape of compliant objects with robotic arms. Our method uses an efficient online surface/curve fitting algorithm that quantifies the object's geometry with a compact vector of features; this feedback-like vector makes it possible to establish an explicit shape servo-loop. To coordinate the motion of the robot with the computed shape features, we propose a receding-time estimator that approximates the system's sensorimotor model while satisfying various performance criteria. A deep adversarial network is developed to robustly compensate for visual occlusions in the camera's field of view, which makes it possible to guide the shaping task even with partial observations of the object. Model predictive control is used to compute the robot's shaping motions subject to workspace and saturation constraints. A detailed experimental study is presented to validate the effectiveness of the proposed control framework.

Submitted 20 May, 2022; originally announced May 2022.

arXiv:2204.12768 [pdf, other]

Masked Spectrogram Prediction For Self-Supervised Audio Pre-Training

Authors: Dading Chong, Helin Wang, Peilin Zhou, Qingcheng Zeng

Abstract: Transformer-based models attain excellent results and generalize well when trained on sufficient amounts of data. However, constrained by the limited data available in the audio domain, most transformer-based models for audio tasks are fine-tuned from models pre-trained in other domains (e.g. images), which differ notably from the audio domain. Other methods explore self-supervised learning approaches directly in the audio domain but currently do not perform well in downstream tasks. In this paper, we present a novel self-supervised learning method for transformer-based audio models, called masked spectrogram prediction (MaskSpec), to learn powerful audio representations from unlabeled audio data (AudioSet is used in this paper). Our method masks random patches of the input spectrogram and reconstructs the masked regions with an encoder-decoder architecture. Without using extra model weights or supervision, experimental results on multiple downstream datasets demonstrate that MaskSpec achieves a significant performance gain over supervised methods and outperforms previous pre-trained models. In particular, our best model reaches 0.471 mAP on AudioSet, 0.854 mAP on OpenMIC2018, 0.982 accuracy on ESC-50, 0.976 accuracy on SCV2, and 0.823 accuracy on DCASE2019 Task1A.

Submitted 27 April, 2022; originally announced April 2022.

Comments: Submitted to INTERSPEECH 2022
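The masking step can be sketched as splitting a time-frequency spectrogram into non-overlapping patches and hiding a random subset of them; in this sketch the reconstruction target is simply the untouched original at the hidden positions. The patch size, the masking ratio, zero-filling the hidden patches, and the function name `mask_spectrogram` are illustrative assumptions, not MaskSpec's published settings.

```python
import random


def mask_spectrogram(spec, patch_t, patch_f, mask_ratio, seed=0):
    """Randomly mask non-overlapping patches of a spectrogram.

    spec: T x F list of lists (time frames x frequency bins); T and F are
          assumed divisible by patch_t and patch_f.
    Returns (masked copy with hidden patches zeroed, list of masked patch ids).
    """
    n_t, n_f = len(spec) // patch_t, len(spec[0]) // patch_f
    patch_ids = [(i, j) for i in range(n_t) for j in range(n_f)]
    n_mask = int(round(mask_ratio * len(patch_ids)))
    masked = random.Random(seed).sample(patch_ids, n_mask)

    out = [row[:] for row in spec]  # copy so the reconstruction target stays intact
    for i, j in masked:
        for t in range(i * patch_t, (i + 1) * patch_t):
            for f in range(j * patch_f, (j + 1) * patch_f):
                out[t][f] = 0.0
    return out, masked


# Toy 8 x 8 "spectrogram" with strictly positive values, 2 x 2 patches.
spec = [[float(t * 8 + f + 1) for f in range(8)] for t in range(8)]
masked_spec, masked_ids = mask_spectrogram(spec, patch_t=2, patch_f=2, mask_ratio=0.75)
```

A training loop built on this would feed `masked_spec` to the encoder-decoder and compute the reconstruction loss against `spec` only over the patches listed in `masked_ids`, so the model cannot succeed by copying visible input.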

arXiv:2202.09003 [pdf, ps, other]

End-to-end contextual ASR based on posterior distribution adaptation for hybrid CTC/attention system

Authors: Zhengyi Zhang, Pan Zhou

Abstract: End-to-end (E2E) speech recognition architectures assemble all components of a traditional speech recognition system into a single model. Although this simplifies the ASR system, it introduces a contextual ASR drawback: the E2E model performs worse on utterances containing infrequent proper nouns. In this work, we propose adding a contextual bias attention (CBA) module to the attention-based encoder-decoder (AED) model to improve its ability to recognize contextual phrases. Specifically, CBA uses the context vector of the source attention in the decoder to attend to a specific bias embedding. Jointly learned with the basic AED parameters, CBA can tell the model when and where to bias its output probability distribution. At the inference stage, a list of bias phrases is preloaded, and we adapt the posterior distributions of both the CTC and attention decoders according to the bias phrase attended to by CBA. We evaluate the proposed method on GigaSpeech and achieve a consistent relative improvement in the recall rate of bias phrases ranging from 15% to 28% compared to the baseline model. Meanwhile, our method shows a strong anti-bias ability, as the performance on general tests degrades by only 1.7% even when 2,000 bias phrases are present.

Submitted 17 February, 2022; originally announced February 2022.

Comments: 5 pages, 5 tables, 1 figure

arXiv:2201.09163

Pulmonary Fissure Segmentation in CT Images Based on ODoS Filter and Shape Features

Authors: Yuanyuan Peng, Pengpeng Luan, Hongbin Tu, Xiong Li, ** Zhou

Abstract: Prior knowledge of pulmonary anatomy plays a vital role in the diagnosis of lung diseases. In CT images, pulmonary fissure segmentation is a formidable task due to various factors. To address this challenge, a useful approach based on an ODoS filter and shape features is presented for pulmonary fissure segmentation. Here, we adopt an ODoS filter that merges orientation information and magnitude information to highlight structural features for fissure enhancement, which can effectively distinguish pulmonary fissures from clutter. Motivated by the fact that, in the orientation field, pulmonary fissures appear as linear structures in 2D space and planar structures in 3D space, an orientation curvature criterion and an orientation partition scheme are fused to separate fissure patches from other structures in different orientation partitions, which suppresses part of the clutter. Considering the shape difference between pulmonary fissures and tubular structures in the magnitude field, a shape measure approach and a 3D skeletonization model are combined to segment pulmonary fissures and remove clutter. When our scheme is applied to 55 chest CT scans acquired from the publicly available LOLA11 dataset, the median F1-score, False Discovery Rate (FDR), and False Negative Rate (FNR) are 0.896, 0.109, and 0.100, respectively, which indicates that the presented method achieves satisfactory pulmonary fissure segmentation performance.

Submitted 22 January, 2022; originally announced January 2022.
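
The F1-score, FDR, and FNR reported above are all functions of the true-positive, false-positive, and false-negative counts. A small helper (hypothetical, not from the paper) makes the definitions explicit. Because precision = 1 − FDR and recall = 1 − FNR, F1 = 2(1 − FDR)(1 − FNR)/(2 − FDR − FNR); plugging in the reported medians gives about 0.895, close to the reported median F1 of 0.896 (an exact match is not expected, since each value is a median over scans).

```python
from typing import Dict


def segmentation_scores(tp: int, fp: int, fn: int) -> Dict[str, float]:
    """Binary segmentation scores from voxel-level confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "F1": 2 * tp / (2 * tp + fp + fn),  # equivalent to 2PR / (P + R)
        "FDR": fp / (tp + fp),              # 1 - precision
        "FNR": fn / (tp + fn),              # 1 - recall
        "precision": precision,
        "recall": recall,
    }


def f1_from_error_rates(fdr: float, fnr: float) -> float:
    """F1 recovered from FDR and FNR via precision = 1 - FDR, recall = 1 - FNR."""
    p, r = 1.0 - fdr, 1.0 - fnr
    return 2 * p * r / (p + r)
```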

arXiv:2201.05344

AWSnet: An Auto-weighted Supervision Attention Network for Myocardial Scar and Edema Segmentation in Multi-sequence Cardiac Magnetic Resonance Images

Authors: Kai-Ni Wang, Xin Yang, Juzheng Miao, Lei Li, **g Yao, ** Zhou, Wufeng Xue, Guang-Quan Zhou, Xiahai Zhuang, Dong Ni

Abstract: Multi-sequence cardiac magnetic resonance (CMR) provides essential pathology information (scar and edema) for diagnosing myocardial infarction. However, automatic pathology segmentation can be challenging due to the difficulty of effectively exploring the underlying information in multi-sequence CMR data. This paper tackles scar and edema segmentation from multi-sequence CMR with a novel auto-weighted supervision framework, in which the interactions among different supervised layers are explored under a task-specific objective using reinforcement learning. Furthermore, we design a coarse-to-fine framework to boost the segmentation of small myocardial pathology regions with shape prior knowledge. The coarse segmentation model identifies the left ventricle myocardial structure as a shape prior, while the fine segmentation model integrates a pixel-wise attention strategy with an auto-weighted supervision model to learn and extract salient pathological structures from the multi-sequence CMR data. Extensive experimental results on a publicly available dataset from Myocardial pathology segmentation combining multi-sequence CMR (MyoPS 2020) demonstrate that our method achieves promising performance compared with other state-of-the-art methods. Our method is promising for advancing myocardial pathology assessment on multi-sequence CMR data. To motivate the community, we have made our code publicly available via https://github.com/soleilssss/AWSnet/tree/master.

Submitted 14 January, 2022; originally announced January 2022.

Comments: 19 pages, 10 figures, accepted by Medical Image Analysis
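
The auto-weighted supervision idea in AWSnet combines losses from several supervised layers with adaptive weights. A minimal sketch of the combination step only, assuming one scalar loss per supervised layer and weights supplied as unnormalized logits. In AWSnet the weights are driven by a reinforcement-learning agent, which this sketch does not model; `weight_logits` is a hypothetical stand-in for that agent's output.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_deep_supervision_loss(layer_losses: Sequence[float],
                                   weight_logits: Sequence[float]) -> float:
    """Sum of per-layer losses weighted by softmax-normalized weights."""
    if len(layer_losses) != len(weight_logits):
        raise ValueError("need one weight logit per supervised layer")
    weights = softmax(weight_logits)
    return sum(w * loss for w, loss in zip(weights, layer_losses))
```

Normalizing the weights keeps the total a convex combination of the layer losses, so reweighting changes which layers dominate training without changing the overall loss scale; that normalization is a design choice of this sketch.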

arXiv:2110.09788

CIPS-3D: A 3D-Aware Generator of GANs Based on Conditionally-Independent Pixel Synthesis

Authors: Peng Zhou, Lingxi Xie, Bingbing Ni, Qi Tian

Abstract: The style-based GAN (StyleGAN) architecture achieved state-of-the-art results for generating high-quality images, but it lacks explicit and precise control over camera poses. The recently proposed NeRF-based GANs made great progress towards 3D-aware generators, but they cannot yet generate high-quality images. This paper presents CIPS-3D, a style-based, 3D-aware generator composed of a shallow NeRF network and a deep implicit neural representation (INR) network. The generator synthesizes each pixel value independently without any spatial convolution or upsampling operation. In addition, we diagnose the problem of mirror symmetry, which implies a suboptimal solution, and solve it by introducing an auxiliary discriminator. Trained on raw, single-view images, CIPS-3D sets new records for 3D-aware image synthesis with an impressive FID of 6.97 for images at $256\times256$ resolution on FFHQ. We also demonstrate several interesting directions for CIPS-3D, such as transfer learning and 3D-aware face stylization. The synthesis results are best viewed as videos, so we recommend that readers check our GitHub project at https://github.com/PeterouZh/CIPS-3D

Submitted 19 October, 2021; originally announced October 2021.

Comments: 3D-aware GANs based on NeRF, https://github.com/PeterouZh/CIPS-3D

arXiv:2109.09161

Wav-BERT: Cooperative Acoustic and Linguistic Representation Learning for Low-Resource Speech Recognition

Authors: Guolin Zheng, Yubei Xiao, Ke Gong, Pan Zhou, Xiaodan Liang, Liang Lin

Abstract: Unifying acoustic and linguistic representation learning has become increasingly crucial for transferring knowledge learned from abundant high-resource language data to low-resource speech recognition. Existing approaches simply cascade pre-trained acoustic and language models to learn the transfer from speech to text. However, how to resolve the representation discrepancy between speech and text remains unexplored, which hinders the utilization of acoustic and linguistic information. Moreover, previous works simply replace the embedding layer of the pre-trained language model with the acoustic features, which may cause catastrophic forgetting. In this work, we introduce Wav-BERT, a cooperative acoustic and linguistic representation learning method that fuses and utilizes the contextual information of speech and text. Specifically, we unify a pre-trained acoustic model (wav2vec 2.0) and a language model (BERT) into an end-to-end trainable framework. A Representation Aggregation Module is designed to aggregate acoustic and linguistic representations, and an Embedding Attention Module is introduced to incorporate acoustic information into BERT, which effectively facilitates the cooperation of the two pre-trained models and thus boosts representation learning. Extensive experiments show that Wav-BERT significantly outperforms existing approaches and achieves state-of-the-art performance on low-resource speech recognition.

Submitted 9 October, 2021; v1 submitted 19 September, 2021; originally announced September 2021.

arXiv:2108.11763

doi 10.1109/PESGM46819.2021.9637992

Attention-based Neural Load Forecasting: A Dynamic Feature Selection Approach

Authors: **g Xiong, Pengyang Zhou, Alan Chen, Yu Zhang

Abstract: Encoder-decoder-based recurrent neural networks (RNNs) have made significant progress in sequence-to-sequence learning tasks such as machine translation and conversational models. Recent works have shown the advantage of this type of network in various time series forecasting tasks. The present paper focuses on multi-horizon short-term load forecasting, which plays a key role in power system planning and operation. Leveraging the encoder-decoder RNN, we develop an attention model that adaptively selects relevant features and similar temporal information. First, input features are assigned different weights by a feature selection attention layer, and the updated historical features are encoded by a bi-directional long short-term memory (BiLSTM) layer. Then, a decoder with hierarchical temporal attention enables similar-day selection, re-evaluating the importance of historical information at each time step. Numerical results on the dataset of the Global Energy Forecasting Competition 2014 show that our proposed model significantly outperforms several existing forecasting schemes.

Submitted 24 August, 2021; originally announced August 2021.
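
The feature selection attention layer assigns a weight to each input feature before the BiLSTM encoder. A minimal sketch of that reweighting for one time step, assuming the relevance scores have already been produced by some learned scoring function (the abstract does not specify it, so the scores here are plain inputs) and that the weights are softmax-normalized across features. The function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def feature_selection_attention(features: Sequence[float],
                                scores: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return (attention weights, reweighted features) for one time step."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]  # non-negative, sum to 1
    return alphas, [a * x for a, x in zip(alphas, features)]
```

A feature whose score is much lower than the others receives a weight near zero and is effectively switched off for that step, which is the "selection" in the layer's name.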

arXiv:2107.05190

doi 10.1117/1.JEI.30.5.053014

Deep-learning-based Hyperspectral imaging through an RGB camera

Authors: Xinyu Gao, Tianlang Wang, **g Yang, **chao Tao, Yanqing Qiu, Yanlong Meng, Banging Mao, Pengwei Zhou, Yi Li

Abstract: A hyperspectral image (HSI) contains both spatial and spectral information and has been widely used in food safety, remote sensing, and medical detection. However, acquiring hyperspectral images is usually costly because of the complicated apparatus needed to acquire the optical spectrum. Recently, it has been reported that an HSI can be reconstructed from a single RGB image using convolutional neural network (CNN) algorithms. Compared with traditional hyperspectral cameras, the CNN-based method is simple, portable, and low-cost. In this study, we focus on the influence of the RGB camera spectral sensitivity (CSS) on the reconstructed HSI. A xenon lamp combined with a monochromator was used as the standard light source to calibrate the CSS. The experimental results show that the CSS plays a significant role in the reconstruction accuracy of an HSI. In addition, we propose a new HSI reconstruction network in which the dimensional structure of the original hyperspectral datacube is modified by a 3D matrix transpose to improve reconstruction accuracy.

Submitted 12 July, 2021; originally announced July 2021.
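
The role of the camera spectral sensitivity (CSS) follows from the usual image-formation model: each RGB value is the CSS-weighted sum of the scene spectrum over bands, so a network that inverts this map implicitly depends on the CSS of the camera used. A minimal sketch of that forward model for a discretized cube, assuming an H × W × L nested-list layout and a 3 × L CSS table, plus one example of an axis permutation that a "3D matrix transpose" could mean. The function names are hypothetical, and the paper's exact transpose is not specified in the abstract.

```python
from typing import List, Sequence

Cube = List[List[List[float]]]


def hsi_to_rgb(cube: Sequence[Sequence[Sequence[float]]],
               css: Sequence[Sequence[float]]) -> Cube:
    """Project an H x W x L hyperspectral cube to H x W x 3.

    css[c][l] is the sensitivity of RGB channel c at spectral band l, so
    rgb[h][w][c] = sum_l css[c][l] * cube[h][w][l].
    """
    return [[[sum(s * v for s, v in zip(css_c, pixel)) for css_c in css]
             for pixel in row]
            for row in cube]


def to_band_first(cube: Sequence[Sequence[Sequence[float]]]) -> Cube:
    """Permute an H x W x L cube to L x H x W (one possible 3D transpose)."""
    h_n, w_n, l_n = len(cube), len(cube[0]), len(cube[0][0])
    return [[[cube[h][w][l] for w in range(w_n)] for h in range(h_n)]
            for l in range(l_n)]
```

Because L bands are collapsed into three numbers, many spectra map to the same RGB value; the reconstruction network must resolve this ambiguity from learned priors, which is why a mismatch between the CSS seen in training and the actual camera can hurt accuracy.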

arXiv:2106.06256

An RF-source-free microwave photonic radar with an optically injected semiconductor laser for high-resolution detection and imaging

Authors: Pei Zhou, Rengheng Zhang, Nianqiang Li, Zhidong Jiang, Shilong Pan

Abstract: This paper presents a novel microwave photonic (MWP) radar scheme that can optically generate and process broadband linear frequency-modulated (LFM) microwave signals without using any radio-frequency (RF) sources. In the transmitter, a broadband LFM microwave signal is generated by controlling the period-one (P1) oscillation of an optically injected semiconductor laser. After reflection from the targets, photonic de-chirping is implemented based on a dual-drive Mach-Zehnder modulator (DMZM), followed by a low-speed analog-to-digital converter (ADC) and a digital signal processor (DSP) to reconstruct target information. Free from the limitations of external RF sources, the proposed radar has ultra-flexible tunability, and its main operating parameters are adjustable, including central frequency, bandwidth, frequency band, and temporal period. In the experiment, a fully photonics-based Ku-band radar with a bandwidth of 4 GHz is established for high-resolution detection and inverse synthetic aperture radar (ISAR) imaging. Results show that a range resolution of ~1.88 cm and a two-dimensional (2D) imaging resolution as high as ~1.88 cm × ~2.00 cm are achieved with a sampling rate of 100 MSa/s in the receiver. The flexible tunability of the radar is also investigated experimentally. The proposed radar scheme features low cost, simple structure, and high reconfigurability, making it a promising candidate for future multifunction, adaptive, and miniaturized radars.

Submitted 11 June, 2021; originally announced June 2021.
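
In a de-chirped LFM radar, the echo from a target at range R mixes with the reference chirp to give a beat frequency proportional to R, which is why a low-speed ADC suffices in the receiver. A minimal sketch of that standard relation, f_b = kτ with chirp rate k = B/T and round-trip delay τ = 2R/c. The 4 GHz bandwidth comes from the abstract; the 10 µs chirp period and 3 m target range in the usage note are hypothetical.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def beat_frequency(range_m: float, bandwidth_hz: float, period_s: float) -> float:
    """Beat frequency (Hz) after de-chirping the echo of a point target."""
    chirp_rate = bandwidth_hz / period_s  # k, in Hz per second
    delay = 2.0 * range_m / C             # round-trip delay tau, in seconds
    return chirp_rate * delay


def range_from_beat(f_beat_hz: float, bandwidth_hz: float, period_s: float) -> float:
    """Invert f_b = k * 2R / c to recover the target range (m)."""
    chirp_rate = bandwidth_hz / period_s
    return C * f_beat_hz / (2.0 * chirp_rate)
```

For example, `beat_frequency(3.0, 4e9, 10e-6)` is roughly 8.0 MHz, well within the 50 MHz Nyquist band of a 100 MSa/s ADC, even though the transmitted chirp spans 4 GHz.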

arXiv:2106.06143

Monotonic Neural Network: combining Deep Learning with Domain Knowledge for Chiller Plants Energy Optimization

Authors: Fanhe Ma, Faen Zhang, Shenglan Ben, Shuxin Qin, Pengcheng Zhou, Changsheng Zhou, Fengyi Xu

Abstract: In this paper, we aim to build a domain-knowledge-based deep learning framework to solve chiller plant energy optimization problems. Compared with popular applications of deep learning (e.g. image classification and NLP), it is difficult to collect enormous amounts of data for deep network training in real-world physical systems. Most existing methods reduce complex systems to linear models to facilitate training on small samples. To tackle the small-sample problem, this paper incorporates domain knowledge into the structure and loss design of the deep network to build a nonlinear model with a less redundant function space. Specifically, the energy consumption estimation of most chillers can be physically viewed as an input-output monotonic problem. Thus, we can design a neural network with monotonic constraints to mimic the physical behavior of the system. We verify the proposed method on the cooling system of a data center; experimental results show the superiority of our framework in energy optimization compared with existing ones.

Submitted 10 June, 2021; originally announced June 2021.
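
One common way to build a network that is monotonic in its inputs, as the abstract describes for chiller energy estimation, is to keep every effective weight positive and use increasing activations, so the output can only increase when any input increases. A minimal sketch of that construction, assuming softplus-reparameterized weights and a single tanh hidden layer. This is a generic technique; the paper's actual architecture and loss design may differ, and the parameter names are hypothetical.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x)); always positive."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def monotonic_mlp(x: Sequence[float],
                  raw_w1: Sequence[Sequence[float]], b1: Sequence[float],
                  raw_w2: Sequence[float], b2: float) -> float:
    """One-hidden-layer MLP that is increasing in every input.

    Raw weights are mapped through softplus, so every effective weight is
    positive; tanh is increasing, so the composition is increasing too.
    Biases are unconstrained because they do not affect monotonicity.
    """
    hidden = [math.tanh(sum(softplus(w) * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(raw_w1, b1)]
    return sum(softplus(w) * h for w, h in zip(raw_w2, hidden)) + b2
```

An input that should decrease the output can be fed in negated, so the same construction covers mixed increasing and decreasing relationships.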

arXiv:2106.02424

Contour Moments Based Manipulation of Composite Rigid-Deformable Objects with Finite Time Model Estimation and Shape/Position Control

Authors: Jiaming Qi, Guangfu Ma, Jihong Zhu, Peng Zhou, Yueyong Lyu, Haibo Zhang, David Navarro-Alarcon

Abstract: The robotic manipulation of composite rigid-deformable objects (i.e. those with mixed non-homogeneous stiffness properties) is a challenging problem with clear practical applications that, despite recent progress in the field, has not been sufficiently studied in the literature. To deal with this issue, in this paper we propose a new visual servoing method that can manipulate this broad class of objects (ranging from soft to rigid) with the same adaptive strategy. To quantify the object's infinite-dimensional configuration, our approach computes a compact feedback vector of 2D contour moment features. A sliding mode control scheme is then designed to simultaneously ensure the finite-time convergence of both the feedback shape error and the model estimation error. The stability of the proposed framework (including the boundedness of all signals) is rigorously proved with Lyapunov theory. Detailed simulations and experiments are presented to validate the effectiveness of the proposed approach. To the best of the authors' knowledge, this is the first time that contour moments along with finite-time control have been used to solve this difficult manipulation problem.

Submitted 4 June, 2021; originally announced June 2021.
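
Contour moments compress a sampled object outline into a few numbers: the centroid, the second-order central moments, and an orientation angle derived from them. A minimal sketch computing such a feature set from a list of contour points, assuming equally weighted samples. Which moments (and what normalization) form the paper's feedback vector is not stated in the abstract, so this selection and the function name are illustrative.

```python
import math
from typing import Dict, Sequence, Tuple


def contour_moments(points: Sequence[Tuple[float, float]]) -> Dict[str, float]:
    """Centroid, second-order central moments, and orientation of contour samples."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mu20 = sum((x - cx) ** 2 for x, _ in points) / n
    mu02 = sum((y - cy) ** 2 for _, y in points) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in points) / n
    # Principal-axis angle of the point distribution, in radians.
    theta = 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)
    return {"cx": cx, "cy": cy, "mu20": mu20, "mu02": mu02, "mu11": mu11, "theta": theta}
```

Stacking these values gives a fixed-length feedback vector regardless of how many contour points are sampled, which is what makes an infinite-dimensional shape usable in a finite-dimensional control loop.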

arXiv:2104.03587

WNARS: WFST based Non-autoregressive Streaming End-to-End Speech Recognition

Authors: Zhichao Wang, Wenwen Yang, Pan Zhou, Wei Chen

Abstract: Recently, attention-based encoder-decoder (AED) end-to-end (E2E) models have drawn increasing attention in the field of automatic speech recognition (ASR). AED models, however, still have drawbacks when deployed in commercial applications. Autoregressive beam search decoding makes them inefficient for high-concurrency applications. It is also inconvenient to integrate external word-level language models. Most importantly, AED models are difficult to use for streaming recognition because of the global attention mechanism. In this paper, we propose a novel framework, namely WNARS, that uses hybrid CTC-attention AED models and weighted finite-state transducers (WFST) to solve these problems together. We switch from autoregressive beam search to CTC branch decoding, which performs first-pass decoding with WFST in a chunk-wise streaming manner. The decoder branch then performs second-pass rescoring on the generated hypotheses non-autoregressively. On the AISHELL-1 task, WNARS achieves a character error rate of 5.22% with 640 ms latency, which, to the best of our knowledge, is state-of-the-art performance for online ASR. Further experiments on our 10,000-hour Mandarin task show that the proposed method achieves more than 20% improvement with 50% latency compared to a strong TDNN-BLSTM lattice-free MMI baseline.

Submitted 20 April, 2021; v1 submitted 8 April, 2021; originally announced April 2021.
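
WNARS runs its first pass on the CTC branch, which emits one distribution per frame and needs no autoregressive loop. The paper decodes with a WFST; the simplest CTC decoder, greedy best-path decoding, is sketched here only to illustrate CTC's collapse rule (merge repeated labels, then drop blanks). It is not the WFST search the paper uses, and it assumes label 0 is the blank.

```python
from typing import List, Optional, Sequence


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], blank: int = 0) -> List[int]:
    """Best-path CTC decoding: argmax per frame, merge repeats, then drop blanks."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    out: List[int] = []
    prev: Optional[int] = None
    for tok in best:
        if tok != prev and tok != blank:
            out.append(tok)
        prev = tok  # a blank between two equal labels lets the label repeat
    return out
```

Because the argmax for each frame does not depend on later frames, the same rule could be applied chunk by chunk as audio arrives, provided the last label of the previous chunk is carried across the boundary (this sketch does not do that).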

arXiv:2104.02868

Darts-Conformer: Towards Efficient Gradient-Based Neural Architecture Search For End-to-End ASR

Authors: Xian Shi, Pan Zhou, Wei Chen, Lei Xie

Abstract: Neural architecture search (NAS) has been successfully applied to tasks like image classification and language modeling to find efficient, high-performance network architectures. In the ASR field, especially end-to-end ASR, related research is still in its infancy. In this work, we focus on applying NAS to the most popular manually designed model, the Conformer, and propose an efficient ASR model search method that benefits from the natural advantage of differentiable architecture search (Darts) in reducing computational overhead. We fuse the Darts mutator and Conformer blocks to form a complete search space, within which a modified architecture called the Darts-Conformer cell is found automatically. The entire search process on the AISHELL-1 dataset costs only 0.7 GPU days. By replacing the Conformer encoder with a stack of searched cells, we obtain an end-to-end ASR model (named Darts-Conformer) that outperforms the Conformer baseline by 4.7% on the open-source AISHELL-1 dataset. Besides, we verify the transferability of the architecture searched on a small dataset to a larger 2k-hour dataset. To the best of our knowledge, this is the first successful attempt to apply gradient-based architecture search to the attention-based encoder-decoder ASR model.

Submitted 10 August, 2021; v1 submitted 6 April, 2021; originally announced April 2021.

Comments: Submitted to ASRU 2021
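
Differentiable architecture search makes the choice of operation continuous: each edge of a cell computes a softmax-weighted mixture of candidate operations, the mixture weights (architecture parameters α) are trained by gradient descent, and the final cell keeps the operation with the largest α. A minimal scalar sketch of that relaxation and the discretization step; the three toy operations are hypothetical stand-ins for the Conformer building blocks the paper searches over.

```python
import math
from typing import Callable, Dict, Sequence

Op = Callable[[float], float]


def mixed_op(x: float, ops: Sequence[Op], alphas: Sequence[float]) -> float:
    """DARTS-style edge: sum_i softmax(alpha)_i * op_i(x)."""
    m = max(alphas)
    exps = [math.exp(a - m) for a in alphas]
    z = sum(exps)
    return sum((e / z) * op(x) for e, op in zip(exps, ops))


def derive_op(alphas: Dict[str, float]) -> str:
    """Discretize an edge after search by keeping the highest-weighted operation."""
    return max(alphas, key=alphas.__getitem__)


# Hypothetical toy candidate set (identity, zero, scaling).
CANDIDATES: Dict[str, Op] = {
    "identity": lambda x: x,
    "zero": lambda x: 0.0,
    "double": lambda x: 2.0 * x,
}
```

The relaxation is what keeps the search cheap: one over-parameterized network is trained once instead of training many discrete candidate architectures separately.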

arXiv:2012.11896

Adversarial Meta Sampling for Multilingual Low-Resource Speech Recognition

Authors: Yubei Xiao, Ke Gong, Pan Zhou, Guolin Zheng, Xiaodan Liang, Liang Lin

Abstract: Low-resource automatic speech recognition (ASR) is challenging, as the limited target-language data cannot train an ASR model well. To solve this issue, meta-learning formulates ASR for each source language into many small ASR tasks and meta-learns a model initialization on all tasks from different source languages to achieve fast adaptation to unseen target languages. However, the quantity and difficulty of tasks vary greatly across source languages because of their different data scales and diverse phonological systems, which leads to task-quantity and task-difficulty imbalance issues and thus to the failure of multilingual meta-learning ASR (MML-ASR). In this work, we solve this problem by developing a novel adversarial meta sampling (AMS) approach to improve MML-ASR. When sampling tasks in MML-ASR, AMS adaptively determines the task sampling probability for each source language. Specifically, for each source language, a large query loss means that its tasks are not well sampled, in terms of quantity and difficulty, to train the ASR model, and thus they should be sampled more frequently for extra learning. Inspired by this fact, we feed the historical task query losses of all source language domains into a network to learn a task sampling policy that adversarially increases the current query loss of MML-ASR. Thus, the learnt task sampling policy can track the learning situation of each language and predict good task sampling probabilities for more effective learning. Finally, experimental results on two multilingual datasets show significant performance improvement when applying AMS to MML-ASR, and also demonstrate the applicability of AMS to other low-resource speech tasks and transfer learning ASR approaches.

Submitted 12 April, 2021; v1 submitted 22 December, 2020; originally announced December 2020.

Comments: Accepted at AAAI 2021
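
AMS learns its sampling policy with a network trained adversarially on historical query losses. The hand-crafted heuristic below is not that method; it only illustrates the stated intuition (languages whose tasks currently have larger query loss should be sampled more often) by turning each language's average recent query loss into a softmax sampling probability. The function names, the temperature parameter, and the language keys are hypothetical.

```python
import math
import random
from typing import Dict, Sequence


def sampling_probs(query_losses: Dict[str, Sequence[float]],
                   temperature: float = 1.0) -> Dict[str, float]:
    """Higher average historical query loss -> higher task-sampling probability."""
    avg = {lang: sum(v) / len(v) for lang, v in query_losses.items()}
    m = max(avg.values())
    exps = {lang: math.exp((a - m) / temperature) for lang, a in avg.items()}
    z = sum(exps.values())
    return {lang: e / z for lang, e in exps.items()}


def sample_language(probs: Dict[str, float], rng: random.Random) -> str:
    """Draw the source language for the next meta-learning task."""
    langs = list(probs)
    return rng.choices(langs, weights=[probs[lang] for lang in langs], k=1)[0]
```

Lowering the temperature concentrates sampling more sharply on the language with the highest loss; a learned policy, as in AMS, can instead weigh the whole loss history rather than a simple average.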

arXiv:2009.01502

doi 10.1109/TITS.2020.3035841

DRLE: Decentralized Reinforcement Learning at the Edge for Traffic Light Control in the IoV

Authors: Pengyuan Zhou, Xianfu Chen, Zhi Liu, Tristan Braud, Pan Hui, Jussi Kangasharju

Abstract: The Internet of Vehicles (IoV) enables real-time data exchange among vehicles and roadside units and thus provides a promising solution to alleviate traffic jams in urban areas. Meanwhile, better traffic management via efficient traffic light control can benefit the IoV as well by enabling a better communication environment and decreasing the network load. As such, the IoV and efficient traffic light control can form a virtuous cycle. Edge computing, an emerging technology that provides low-latency computation at the edge of the network, can further improve the performance of this cycle. However, while the collected information is valuable, an efficient solution for better utilization and faster feedback has yet to be developed for the edge-empowered IoV. To this end, we propose Decentralized Reinforcement Learning at the Edge for traffic light control in the IoV (DRLE). DRLE exploits the ubiquity of the IoV to accelerate the collection of traffic data and its interpretation, towards alleviating congestion and providing better traffic light control. DRLE operates within the coverage of the edge servers and uses aggregated data from neighboring edge servers to provide city-scale traffic light control. DRLE decomposes the highly complex problem of large-area control into a decentralized multi-agent problem. We prove its global optimality with concrete mathematical reasoning. The proposed decentralized reinforcement learning algorithm running at each edge node adapts the traffic lights in real time. We conduct extensive evaluations and demonstrate the superiority of this approach over several state-of-the-art algorithms.

Submitted 5 January, 2021; v1 submitted 3 September, 2020; originally announced September 2020.

Comments: Accepted by IEEE Transactions on Intelligent Transportation Systems

arXiv:2008.06896

Adaptive Shape Servoing of Elastic Rods using Parameterized Regression Features and Auto-Tuning Motion Controls

Authors: Jiaming Qi, Guangtao Ran, Bohui Wang, Jian Liu, Wanyu Ma, Peng Zhou, David Navarro-Alarcon

Abstract: The robotic manipulation of deformable linear objects has shown great potential in a wide range of real-world applications. However, it presents many challenges due to the objects' complex nonlinearity and high-dimensional configuration. In this paper, we propose a new shape servoing framework to automatically manipulate elastic rods through visual feedback. Our new method uses parameterized regression features to compute a compact (low-dimensional) feature vector that quantifies the object's shape, thus enabling an explicit shape servo-loop. To automatically deform the rod into a desired shape, the proposed adaptive controller iteratively estimates the differential transformation between the robot's motion and the relative shape changes; this capability allows objects with unknown mechanical models to be manipulated effectively. An auto-tuning algorithm is introduced to adjust the robot's shaping motions in real time based on optimal performance criteria. To validate the proposed framework, a detailed experimental study with vision-guided robotic manipulators is presented.

Submitted 9 September, 2023; v1 submitted 16 August, 2020; originally announced August 2020.

Comments: 8 pages, 12 figures
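
The adaptive controller estimates, online, how a small robot motion changes the shape features. A classic model-free way to do this in uncalibrated visual servoing is Broyden's rank-one update of an estimated Jacobian J (with Δs ≈ JΔq). The sketch shows that update as an illustration only, since the abstract does not give the paper's estimation law; the function name and the `gain` parameter are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def broyden_update(J: Sequence[Sequence[float]], dq: Sequence[float],
                   ds: Sequence[float], gain: float = 1.0) -> Matrix:
    """Rank-one update so that the new J better explains the observed ds for motion dq.

    J_new = J + gain * (ds - J dq) dq^T / (dq^T dq). With gain = 1 the secant
    condition J_new dq = ds holds exactly.
    """
    denom = sum(v * v for v in dq)
    if denom == 0.0:  # no motion, nothing to learn from
        return [list(row) for row in J]
    pred = [sum(row[k] * dq[k] for k in range(len(dq))) for row in J]
    err = [ds[i] - pred[i] for i in range(len(J))]
    return [[J[i][k] + gain * err[i] * dq[k] / denom for k in range(len(dq))]
            for i in range(len(J))]
```

A gain below 1 smooths the estimate against noisy shape measurements at the cost of slower adaptation.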

arXiv:2004.00799

Cost-efficient and Skew-aware Data Scheduling for Incremental Learning in 5G Network

Authors: Lingjun Pu, Xin**g Yuan, Xiaohang Xu, Xu Chen, Pan Zhou, **gdong Xu

Abstract: To facilitate emerging applications in 5G networks, mobile network operators will provide many network functions for control and prediction. Recently, they have recognized the power of machine learning (ML) and started to explore its potential to facilitate those network functions. Nevertheless, current ML models for network functions are often derived offline, which is inefficient because of the excessive overhead of transmitting huge volumes of data to remote ML training clouds, and which fails to provide the incremental learning capability needed for continuous model updating. As an alternative solution, we propose Cocktail, an incremental learning framework within a reference 5G network architecture. To achieve cost efficiency while increasing trained model accuracy, an efficient online data scheduling policy is essential. To this end, we formulate an online data scheduling problem to optimize the framework cost while alleviating, from a long-term perspective, the data skew issue caused by the capacity heterogeneity of training workers. We exploit stochastic gradient descent to devise an online asymptotically optimal algorithm, including two optimal policies based on novel graph constructions for skew-aware data collection and data training. Small-scale testbed and large-scale simulations validate the superior performance of our proposed framework.

Submitted 12 September, 2021; v1 submitted 1 April, 2020; originally announced April 2020.

arXiv:2001.00149

Simulation of Skin Stretching around the Forehead Wrinkles in Rhytidectomy

Authors: ** Zhou, Shuo Huang, Qiang Chen, Siyuan He, Guochao Cai

Abstract: Objective: Skin stretching around the forehead wrinkles is an important method in rhytidectomy. Proper parameters are required to evaluate the surgical effect. In this paper, a simulation method is proposed to obtain these parameters. Methods: Three-dimensional point cloud data with a resolution of 50 μm were employed. First, a smooth supporting contour under the wrinkled forehead was generated via B-spline interpolation and extrapolation to constrain the deformation of the wrinkled zone. Then, based on the vector form intrinsic finite element (VFIFE) algorithm, the deformation of the wrinkled forehead skin during stretching was simulated in Matlab. Finally, the stress distribution and the residual wrinkles of the forehead skin were used to evaluate the surgical effect. Results: Although the residual wrinkles are similar when forehead wrinkles are finitely stretched, their stress distribution changes greatly. This indicates that the stress distribution in the skin is an effective measure of the surgical effect, and that forehead wrinkles are easily overstretched, which may lead to potential skin injuries. Conclusion: The simulation method can predict the stress distribution and residual wrinkles after forehead wrinkle stretching surgery, and could potentially be used to control the surgical process and further reduce the risk of skin injury.

Submitted 1 January, 2020; originally announced January 2020.

arXiv:1911.09275

A Machine Learning-enhanced Robust P-Phase Picker for Real-time Seismic Monitoring

Authors: Dazhong Shen, Qi Zhang, Tong Xu, Hengshu Zhu, Wenjia Zhao, Zikai Yin, Peilun Zhou, Lihua Fang, Enhong Chen, Hui Xiong

Abstract: Identifying the arrival times of seismic P-phases plays a significant role in real-time seismic monitoring, which provides critical guidance for emergency response activities. While considerable research has been conducted on this topic, efficiently capturing the arrival times of seismic P-phases hidden within densely distributed and noisy seismic waves, such as those generated by the aftershocks of destructive earthquakes, remains a real challenge, since most existing methods in seismology rely on laborious expert supervision. To this end, in this paper we present EL-Picker, a machine learning-enhanced framework based on an ensemble learning strategy, for the automatic identification of seismic P-phase arrivals in continuous and massive waveforms. More specifically, EL-Picker consists of three modules, namely Trigger, Classifier, and Refiner, and an ensemble learning strategy is exploited to integrate several machine learning classifiers. An evaluation on the aftershocks following the MS 8.0 Wenchuan earthquake demonstrates that EL-Picker not only achieves the best identification performance but also identifies 120% more seismic P-phase arrivals as complementary data. Meanwhile, the experimental results also reveal both the applicability of different machine learning models to waveforms collected from different seismic stations and the regularities of seismic P-phase arrivals that might be neglected during manual inspection. These findings validate the effectiveness, efficiency, flexibility and stability of EL-Picker.

Submitted 20 August, 2020; v1 submitted 20 November, 2019; originally announced November 2019.

Comments: Note that this paper is the English version of our work published in SCIENTIA SINICA Informationis (http://engine.scichina.com/doi/10.1360/SSI-2020-0214), which is suggested to be cited if needed
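The abstract states that EL-Picker integrates several machine learning classifiers through an ensemble strategy but does not say how their outputs are combined. A minimal sketch, assuming simple hard majority voting over binary per-window decisions (1 = P-phase arrival present), looks like this; the three classifier functions are hypothetical stand-ins, not the models used in the paper.

```python
from collections import Counter
from typing import Callable, Sequence

# A classifier maps one waveform window to 1 (P-phase arrival present) or 0.
Classifier = Callable[[Sequence[float]], int]


def majority_vote(classifiers: Sequence[Classifier], window: Sequence[float]) -> int:
    """Hard majority voting over binary decisions; a tie resolves to 0 (no pick)."""
    votes = Counter(clf(window) for clf in classifiers)
    return 1 if votes[1] > votes[0] else 0


# Hypothetical stand-in classifiers, for illustration only.
def peak_amplitude_clf(window: Sequence[float]) -> int:
    return int(max(abs(v) for v in window) > 0.5)


def energy_clf(window: Sequence[float]) -> int:
    return int(sum(v * v for v in window) / len(window) > 0.05)


def always_no_clf(window: Sequence[float]) -> int:
    return 0


if __name__ == "__main__":
    quiet = [0.01, -0.02, 0.01, 0.0]
    onset = [0.01, 0.9, -0.8, 0.6]
    ensemble = [peak_amplitude_clf, energy_clf, always_no_clf]
    print(majority_vote(ensemble, quiet), majority_vote(ensemble, onset))
```

Breaking ties toward "no pick" is a conservative choice for a picker; soft voting over classifier probabilities would be an equally plausible alternative.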

arXiv:1911.00203

Improving Generalization of Transformer for Speech Recognition with Parallel Schedule Sampling and Relative Positional Embedding

Authors: Pan Zhou, Ruchao Fan, Wei Chen, Jia Jia

Abstract: The Transformer has recently shown promising results in many sequence-to-sequence transformation tasks. It uses a stack of feed-forward self-attention layers to replace the recurrent neural networks (RNNs) in the attention-based encoder-decoder (AED) architecture. The self-attention layer learns temporal dependencies by incorporating sinusoidal positional embeddings of the tokens in a sequence, which allows parallel computation and therefore faster training iterations than the sequential operation of an RNN. Deeper transformer stacks also allow it to outperform RNN-based AED. However, this parallelization ability is lost when scheduled sampling training is applied. In addition, self-attention with sinusoidal positional embedding may cause performance degradation for longer sequences that contain similar acoustic or semantic information at different positions. To address these problems, we propose parallel scheduled sampling (PSS) and relative positional embedding (RPE) to help the transformer generalize to unseen data. Our proposed methods achieve a 7% relative improvement for short utterances and a 70% relative gain for long utterances on a 10,000-hour Mandarin ASR task.

Submitted 30 November, 2020; v1 submitted 1 November, 2019; originally announced November 2019.
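For reference, the absolute sinusoidal positional embedding that the abstract identifies as a source of degradation on long sequences follows the formula of the original Transformer. The sketch computes that table only; it does not implement the paper's relative positional embedding or PSS, and the function name is our own.

```python
import math


def sinusoidal_positional_embedding(num_positions: int, d_model: int) -> list[list[float]]:
    """Absolute sinusoidal embeddings of the original Transformer:
    PE[pos][2i] = sin(pos / 10000**(2i / d_model)), PE[pos][2i + 1] = cos(same angle)."""
    table = []
    for pos in range(num_positions):
        row = []
        for j in range(d_model):
            i = j // 2  # dimensions 2i and 2i+1 share one frequency
            angle = pos / 10000 ** (2 * i / d_model)
            row.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


if __name__ == "__main__":
    pe = sinusoidal_positional_embedding(num_positions=3, d_model=4)
    for row in pe:
        print([round(v, 4) for v in row])
```

Because each position gets a fixed vector, two distant positions holding similar content are distinguished only through these absolute codes, which is the motivation the abstract gives for switching to relative embeddings.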

arXiv:1908.10959

Short-and-Sparse Deconvolution -- A Geometric Approach

Authors: Yenson Lau, Qing Qu, Han-Wen Kuo, Pengcheng Zhou, Yuqian Zhang, John Wright

Abstract: Short-and-sparse deconvolution (SaSD) is the problem of extracting localized, recurring motifs in signals with spatial or temporal structure. Variants of this problem arise in applications such as image deblurring, microscopy, neural spike sorting, and more. The problem is challenging in both theory and practice, as natural optimization formulations are nonconvex. Moreover, practical deconvolution problems involve smooth motifs (kernels) whose spectra decay rapidly, resulting in poor conditioning and numerical challenges. This paper is motivated by recent theoretical advances that characterize the optimization landscape of a particular nonconvex formulation of SaSD. This characterization has been used to derive a provable algorithm that exactly solves certain non-practical instances of the SaSD problem. We leverage the key ideas from this theory (sphere constraints, data-driven initialization) to develop a practical algorithm, which performs well on data arising from a range of application areas. We highlight key additional challenges posed by the ill-conditioning of real SaSD problems, and suggest heuristics (acceleration, continuation, reweighting) to mitigate them. Experiments demonstrate both the performance and generality of the proposed method.

Submitted 1 October, 2019; v1 submitted 28 August, 2019; originally announced August 2019.

Comments: *YL and QQ contributed equally to this work; 30 figures, 45 pages; This version: added an experiment comparing with other methods, corrected typos and added references
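To make the problem statement concrete, the usual SaSD observation model writes the signal as y = a0 ⊛ x0, a short motif a0 convolved with a sparse activation map x0. The sketch below only generates such data, assuming circular convolution and a Bernoulli-Gaussian sparse map (both our assumptions for illustration); it does not implement the paper's recovery algorithm.

```python
import random


def circular_convolve(kernel: list[float], x: list[float]) -> list[float]:
    """y[k] = sum_j kernel[j] * x[(k - j) mod n]; the short kernel is implicitly zero-padded to n."""
    n = len(x)
    return [sum(kernel[j] * x[(k - j) % n] for j in range(len(kernel))) for k in range(n)]


def sasd_observation(kernel: list[float], n: int, theta: float, seed: int = 0):
    """Draw a Bernoulli(theta)-Gaussian sparse map x0 of length n; return (y, x0) with y = kernel ⊛ x0."""
    rng = random.Random(seed)
    x0 = [rng.gauss(0.0, 1.0) if rng.random() < theta else 0.0 for _ in range(n)]
    return circular_convolve(kernel, x0), x0


if __name__ == "__main__":
    motif = [1.0, 0.6, 0.2]  # short kernel: length p0 = 3, much shorter than n
    y, x0 = sasd_observation(motif, n=20, theta=0.1, seed=1)
    print("nonzeros in x0:", sum(v != 0.0 for v in x0))
    print("observation y:", [round(v, 3) for v in y])
```

The deconvolution task is the inverse: given only y, recover both the motif and the sparse map, which is where the nonconvexity described in the abstract arises.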

arXiv:1907.08363

Joint Coverage and Power Control in Highly Dynamic and Massive UAV Networks: An Aggregative Game-theoretic Learning Approach

Authors: Zhuoying Li, Pan Zhou, Yanru Zhang, Lin Gao

Abstract: An unmanned aerial vehicle (UAV) ad-hoc network is an important contingency plan for communication after natural disasters such as typhoons and earthquakes. To achieve efficient and rapid network deployment, we employ noncooperative game theory and an amended binary log-linear algorithm (BLLA) to seek the Nash equilibrium that achieves the optimal network performance. We take into account not only channel overlap and power control but also coverage and the complexity of interference. However, many game-theoretic UAV models show limitations in post-disaster scenarios, which require large-scale UAV network deployments. Besides, highly dynamic post-disaster scenarios impose constraints on strategy updating and cause strategy-deciding errors in UAV ad-hoc networks. To handle these problems, we employ an aggregative game, which can capture these characteristics. Moreover, we propose a novel synchronous payoff-based binary log-linear learning algorithm (SPBLLA) to reduce information exchange and time consumption. Finally, the experiments indicate that, under the same strategy-deciding error rate, SPBLLA learns markedly faster than the revised BLLA. Hence, the new model and algorithm are more suitable and promising for large-scale, highly dynamic scenarios.

Submitted 13 April, 2024; v1 submitted 18 July, 2019; originally announced July 2019.
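The core of binary log-linear learning is a two-way choice: a player compares its current action with one trial action and adopts the trial with probability e^{βU(trial)} / (e^{βU(current)} + e^{βU(trial)}), where larger β makes the choice greedier. The sketch below shows only this generic rule; it is not the paper's amended BLLA or SPBLLA, and the payoff table and power levels in the demo are hypothetical.

```python
import math
import random


def switch_probability(u_current: float, u_trial: float, beta: float) -> float:
    """P(adopt trial) = e^{beta*u_trial} / (e^{beta*u_current} + e^{beta*u_trial}),
    evaluated in an overflow-safe logistic form."""
    z = beta * (u_trial - u_current)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bll_step(current, trial, utility, beta, rng):
    """One binary log-linear learning update for a single player.
    `utility` is a hypothetical payoff function of the player's own action."""
    p = switch_probability(utility(current), utility(trial), beta)
    return trial if rng.random() < p else current


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy payoff over 4 hypothetical power levels; level 2 is best.
    payoff = {0: 0.1, 1: 0.5, 2: 1.0, 3: 0.4}.get
    action = 0
    for _ in range(200):
        trial = rng.randrange(4)
        action = bll_step(action, trial, payoff, beta=10.0, rng=rng)
    print("final action:", action)
```

Rewriting the ratio as a logistic function of the utility difference avoids overflow when β times the utility is large.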

arXiv:1907.06081

Preliminary study on the modal decomposition of Hermite Gaussian beams via deep learning

Authors: Yi An, Tianyue Hou, Jun Li, Liang** Huang, **yong Leng, Lijia Yang, Pu Zhou

Abstract: The Hermite-Gaussian (HG) modes form a complete and orthonormal basis, which has been extensively used to describe optical fields. Here, we demonstrate, for the first time to our knowledge, deep learning-based modal decomposition (MD) of HG beams. This method offers a fast, economical and robust way to acquire both the power content and phase information from a single-shot beam intensity image, which will benefit beam shaping, beam quality assessment, studies of resonator perturbations, and further research on HG beams.

Submitted 16 July, 2019; v1 submitted 13 July, 2019; originally announced July 2019.

Comments: 6 figures
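As background for the basis mentioned in the abstract, the HG_mn field at the beam waist is proportional to H_m(√2x/w) H_n(√2y/w) exp(-(x²+y²)/w²), with H_n the physicists' Hermite polynomials. The sketch evaluates single-mode intensities only, under the assumption of unnormalized modes at the waist; the network that inverts a measured intensity image into modal weights and phases is not shown.

```python
import math


def hermite(n: int, x: float) -> float:
    """Physicists' Hermite polynomial H_n(x), via H_{k+1} = 2x H_k - 2k H_{k-1}."""
    if n == 0:
        return 1.0
    h_prev, h = 1.0, 2.0 * x
    for k in range(1, n):
        h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev
    return h


def hg_intensity(m: int, n: int, x: float, y: float, w: float = 1.0) -> float:
    """Unnormalized intensity |E_mn|^2 of the HG_mn mode at the beam waist (radius w)."""
    s = math.sqrt(2.0) / w
    field = hermite(m, s * x) * hermite(n, s * y) * math.exp(-(x * x + y * y) / (w * w))
    return field * field


if __name__ == "__main__":
    # Coarse 5 x 5 intensity map of HG_10: two lobes with a dark line at x = 0.
    grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
    for y in grid:
        print(" ".join(f"{hg_intensity(1, 0, x, y):.3f}" for x in grid))
```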

arXiv:1906.06972

EnlightenGAN: Deep Light Enhancement without Paired Supervision

Authors: Yifan Jiang, Xinyu Gong, Ding Liu, Yu Cheng, Chen Fang, Xiaohui Shen, Jianchao Yang, Pan Zhou, Zhangyang Wang

Abstract: Deep learning-based methods have achieved remarkable success in image restoration and enhancement, but are they still competitive when there is a lack of paired training data? As one such example, this paper explores the low-light image enhancement problem, where in practice it is extremely challenging to simultaneously take a low-light and a normal-light photo of the same visual scene. We propose a highly effective unsupervised generative adversarial network, dubbed EnlightenGAN, that can be trained without low/normal-light image pairs, yet generalizes very well on various real-world test images. Instead of supervising the learning with ground truth data, we propose to regularize the unpaired training using information extracted from the input itself, and benchmark a series of innovations for the low-light image enhancement problem, including a global-local discriminator structure, a self-regularized perceptual loss fusion, and an attention mechanism. In extensive experiments, our proposed approach outperforms recent methods under a variety of metrics in terms of visual quality and in a subjective user study. Thanks to the great flexibility brought by unpaired training, EnlightenGAN is demonstrated to be easily adaptable to enhancing real-world images from various domains. The code is available at https://github.com/yueruchen/EnlightenGAN

Submitted 24 January, 2021; v1 submitted 17 June, 2019; originally announced June 2019.
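The abstract mentions an attention mechanism regularized by information from the input itself. As we read the EnlightenGAN paper, the attention map is 1 - I, where I is the normalized illumination of the input, so darker regions receive more weight. The sketch assumes that I can be approximated by Rec. 601 luma on RGB values in [0, 1]; the official code may compute it differently, and the function name is hypothetical.

```python
def self_regularized_attention(rgb: list[list[tuple[float, float, float]]]) -> list[list[float]]:
    """Attention map 1 - I for an H x W image with RGB values in [0, 1].
    Assumption: illumination I is approximated by Rec. 601 luma."""
    return [[1.0 - (0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row] for row in rgb]


if __name__ == "__main__":
    image = [[(0.05, 0.05, 0.05), (0.9, 0.9, 0.9)],
             [(0.2, 0.1, 0.0), (0.5, 0.5, 0.5)]]
    for row in self_regularized_attention(image):
        print([round(a, 3) for a in row])
```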

arXiv:1906.01895

AI-Skin : Skin Disease Recognition based on Self-learning and Wide Data Collection through a Closed Loop Framework

Authors: Min Chen, ** Zhou, Di Wu, Long Hu, Mohammad Mehedi Hassan, Atif Alamri

Abstract: Changes in human skin conditions can conceal many hidden dangers. Sunburn caused by long-term exposure to ultraviolet radiation, for example, not only has an aesthetic impact that can cause psychological depression and a lack of self-confidence, but may even be life-threatening due to skin canceration. Current skin disease research adopts auto-classification systems to improve the accuracy of skin disease classification. However, excessive dependence on an image sample database makes it impossible to provide individualized diagnosis for different population groups. To overcome this problem, this paper puts forward a medical AI framework based on data width evolution and self-learning, providing a skin disease medical service that meets the requirements of real-time operation, extensibility and individualization. First, the wide collection of data in the closed-loop information flow between users and a remote medical data center is discussed. Next, a data set filter algorithm based on information entropy is given, which lightens the load of the edge node and meanwhile improves the learning ability of the remote cloud analysis model. In addition, the framework provides an external algorithm load module, which can be adapted to the application requirements according to the selected model. Three deep learning models, LeNet-5, AlexNet and VGG16, are loaded and compared, verifying the universality of the algorithm load module. An experimental platform for the proposed real-time, individualized and extensible skin disease recognition system is built, and the system's computation and communication delays in the interaction scenario between a tester and the remote data center are analyzed. The results demonstrate that the proposed system is reliable and effective.

Submitted 5 June, 2019; originally announced June 2019.
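The abstract names a data set filter based on information entropy but gives no details. One plausible reading, sketched here purely as a hypothetical, is to compute the Shannon entropy of each sample's (quantized) pixel values and forward only samples above a threshold, treating nearly flat images as uninformative. The threshold, the function names, and the filtering rule itself are assumptions, not the paper's algorithm.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def shannon_entropy(values: Sequence[Hashable]) -> float:
    """Entropy in bits of the empirical distribution of `values` (e.g., quantized pixel intensities)."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_filter(samples, threshold_bits):
    """Hypothetical rule: keep only samples whose entropy exceeds the threshold,
    treating low-entropy (nearly flat) images as uninformative."""
    return [s for s in samples if shannon_entropy(s) > threshold_bits]


if __name__ == "__main__":
    flat_patch = [128] * 16
    textured_patch = [0, 32, 64, 96, 128, 160, 192, 224] * 2
    kept = entropy_filter([flat_patch, textured_patch], threshold_bits=1.0)
    print(len(kept))
```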

arXiv:1904.11983

doi 10.1364/OE.27.018683

Deep learning enabled superfast and accurate M^2 evaluation for fiber beams

Authors: Yi An, Jun Li, Liang** Huang, **yong Leng, Lijia Yang, Pu Zhou

Abstract: To the best of our knowledge, we introduce for the first time a deep learning technique to predict the beam propagation factor M^2 of laser beams emitted from few-mode fibers. A deep convolutional neural network (CNN) is trained with paired data of simulated near-field beam patterns and their calculated M^2 values, aiming to learn a fast and accurate mapping from the former to the latter. The trained deep CNN can then be used to evaluate the M^2 of fiber beams from single beam patterns. Results on simulated test samples show that our scheme achieves an average prediction error smaller than 2% even when up to 10 eigenmodes are involved in the fiber. The error becomes slightly larger when heavy noise is added to the input beam patterns but remains smaller than 2.5%, which further demonstrates the accuracy and robustness of our method. Furthermore, the M^2 estimation takes only about 5 ms for a prepared beam pattern with one forward pass, so it can be adopted for real-time M^2 determination with only one supporting charge-coupled device (CCD). The experimental results further confirm the feasibility of our scheme. Moreover, the proposed method can be confidently extended to other kinds of beams provided that adequate training samples are available. Deep learning paves the way to superfast and accurate M^2 evaluation with very low experimental effort.

Submitted 13 July, 2019; v1 submitted 26 April, 2019; originally announced April 2019.

Comments: 12 pages, 10 figures

Journal ref: Optics Express, 27, 18683-18694 (2019)

arXiv:1811.05250

Modality Attention for End-to-End Audio-visual Speech Recognition

Authors: Pan Zhou, Wenwen Yang, Wei Chen, Yanfeng Wang, Jia Jia

Abstract: An audio-visual speech recognition (AVSR) system is considered one of the most promising solutions for robust speech recognition, especially in noisy environments. In this paper, we propose a novel multimodal attention-based method for audio-visual speech recognition, which automatically learns a fused representation of both modalities weighted by their importance. Our method is realized using state-of-the-art sequence-to-sequence (Seq2seq) architectures. Experimental results show relative improvements of 2% up to 36% over the auditory modality alone, depending on the signal-to-noise ratio (SNR). Compared with traditional feature concatenation methods, our proposed approach achieves better recognition performance under both clean and noisy conditions. We believe the modality attention-based end-to-end method can be easily generalized to other multimodal tasks with correlated information.

Submitted 23 April, 2019; v1 submitted 13 November, 2018; originally announced November 2018.

Comments: accepted by ICASSP2019
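The idea of fusing modalities "based on their importance" can be illustrated with a softmax over per-modality relevance scores followed by a weighted sum of the modality features. In the paper the scores come from the trained network; in this sketch they are passed in directly, and the feature vectors and scores in the demo are made-up values chosen only to show the mechanics.

```python
import math


def modality_attention_fuse(features: list[list[float]], scores: list[float]):
    """Softmax the per-modality scores into weights and return (weights, weighted-sum feature).
    In a trained model the scores would come from a learned scoring layer; here they are inputs."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(features[0])
    fused = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, fused


if __name__ == "__main__":
    audio = [0.2, 0.8, 0.1]
    video = [0.6, 0.1, 0.3]
    # Hypothetical scores: at low SNR a scorer might rate the video stream higher.
    w, fused = modality_attention_fuse([audio, video], scores=[-0.5, 1.0])
    print([round(v, 3) for v in w], [round(v, 3) for v in fused])
```

Unlike plain feature concatenation, the weights here change per input, which is what lets such a model lean on the visual stream when the audio is noisy.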

arXiv:1811.05247

An Online Attention-based Model for Speech Recognition

Authors: Ruchao Fan, Pan Zhou, Wei Chen, Jia Jia, Gang Liu

Abstract: Attention-based end-to-end models such as Listen, Attend and Spell (LAS) simplify the whole pipeline of traditional automatic speech recognition (ASR) systems and have become popular in the field of speech recognition. Previous work has shown that such architectures can achieve results comparable to state-of-the-art ASR systems, especially when using a bidirectional encoder and a global soft attention (GSA) mechanism. However, the bidirectional encoder and GSA are two obstacles to real-time speech recognition. In this work, we aim to make the LAS baseline streamable by removing these two obstacles. On the encoder side, we use a latency-controlled (LC) bidirectional structure to reduce the delay of forward computation. Meanwhile, an adaptive monotonic chunk-wise attention (AMoChA) mechanism is proposed to replace GSA for computing the attention weight distribution. Furthermore, we propose two methods to alleviate the large performance degradation observed when combining LC and AMoChA. Finally, we obtain an online LAS model, LC-AMoChA, which shows only a 3.5% relative performance reduction compared with the LAS baseline on our internal Mandarin corpus.

Submitted 25 April, 2019; v1 submitted 13 November, 2018; originally announced November 2018.

arXiv:1811.05097

Exploring RNN-Transducer for Chinese Speech Recognition

Authors: Senmao Wang, Pan Zhou, Wei Chen, Jia Jia, Lei Xie

Abstract: End-to-end approaches have recently drawn much attention for significantly simplifying the construction of automatic speech recognition (ASR) systems. The RNN transducer (RNN-T) is one of the popular end-to-end methods. Previous studies have shown that RNN-T is difficult to train and that a very complex training process is needed for reasonable performance. In this paper, we explore RNN-T for a Chinese large vocabulary continuous speech recognition (LVCSR) task and aim to simplify the training process while maintaining performance. First, a new learning rate decay strategy is proposed to accelerate model convergence. Second, we find that adding convolutional layers at the beginning of the network and using ordered data make it possible to discard the pre-training of the encoder without loss of performance. Besides, we design experiments to find a balance among GPU memory usage, training cycle and model performance. Finally, we achieve a 16.9% character error rate (CER) on our test set, a 2% absolute improvement over a strong BLSTM CE system with a language model trained on the same text corpus.

Submitted 22 April, 2019; v1 submitted 12 November, 2018; originally announced November 2018.
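Character error rate, the metric reported above, is conventionally the character-level Levenshtein distance between hypothesis and reference divided by the reference length. A minimal sketch of that standard definition (not the authors' scoring script) follows; a "2% absolute improvement" then means the CER itself drops by two percentage points.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution (free if equal)
        prev = cur
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate; ref must be non-empty."""
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    print(cer("今天天气很好", "今天天汽好"))
```

Iterating over a Python str yields individual characters, so Mandarin text needs no extra tokenization for CER.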

arXiv:1811.00882

doi 10.1364/OE.27.010127

Learning to decompose the modes in few-mode fibers with deep convolutional neural network

Authors: Yi An, Liang** Huang, Jun Li, **yong Leng, Lijia Yang, Pu Zhou

Abstract: We introduce, for the first time, a deep learning technique to perform complete mode decomposition for few-mode optical fibers. Our goal is to learn a fast and accurate mapping from near-field beam profiles to the complete mode coefficients, including both modal amplitudes and phases. We train the convolutional neural network with simulated beam patterns and evaluate it on both simulated and real beam data. In testing on simulated beam data, the correlation between the reconstructed and the ideal beam profiles reaches 0.9993 and 0.995 for the 3-mode and 5-mode cases, respectively. In testing on real 3-mode beam data, the average correlation is 0.9912, and the mode decomposition can potentially be performed at 33 Hz on a graphics processing unit (GPU), indicating real-time processing ability. The quantitative evaluations demonstrate the superiority of our deep learning-based approach.

Submitted 18 April, 2019; v1 submitted 31 October, 2018; originally announced November 2018.

Journal ref: Optics Express Vol. 27, Issue 7, pp. 10127-10137 (2019)
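The correlation figures quoted above compare a reconstructed beam profile with the measured or ideal one. A form commonly used in the mode-decomposition literature, assumed here rather than taken from the paper, is the absolute Pearson correlation of the two intensity images: C = |Σ ΔI_m ΔI_r| / sqrt(Σ ΔI_m² · Σ ΔI_r²), with ΔI the intensity minus its mean.

```python
import math


def beam_correlation(measured: list[list[float]], reconstructed: list[list[float]]) -> float:
    """|sum dI_m * dI_r| / sqrt(sum dI_m^2 * sum dI_r^2), where dI is intensity minus its mean."""
    xs = [v for row in measured for v in row]
    ys = [v for row in reconstructed for v in row]
    if len(xs) != len(ys):
        raise ValueError("profiles must have the same number of pixels")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return abs(cov) / math.sqrt(vx * vy)


if __name__ == "__main__":
    measured = [[0.0, 0.2, 0.0], [0.3, 1.0, 0.3], [0.0, 0.2, 0.0]]
    reconstructed = [[0.0, 0.25, 0.0], [0.3, 0.95, 0.3], [0.05, 0.2, 0.0]]
    print(round(beam_correlation(measured, reconstructed), 4))
```

Because the measure is invariant to a global intensity scale, a reconstruction can score 1.0 even if its overall power normalization differs from the measurement.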

arXiv:1710.03190

Estimating Heterogeneous Treatment Effects in Residential Demand Response

Authors: Datong P. Zhou, Maximilian Balandat, Claire J. Tomlin

Abstract: We evaluate the causal effect of hour-ahead price interventions on the reduction in residential electricity consumption using a data set from a large-scale experiment on 7,000 households in California. By estimating user-level counterfactuals using time-series prediction, we estimate an average treatment effect of ~0.10 kWh (11%) per intervention and household. Next, we leverage causal decision trees to detect treatment effect heterogeneity across users by incorporating census data. These decision trees depart from classification and regression trees, as we intend to estimate a causal effect between treated and control units rather than perform outcome regression. We compare the performance of causal decision trees with a simpler but less accurate k-means clustering approach that naively detects heterogeneity in the feature space, confirming the superiority of causal decision trees. Lastly, we comment on how our methods for detecting heterogeneity can be used to target households and improve cost efficiency.

Submitted 25 October, 2018; v1 submitted 6 October, 2017; originally announced October 2017.

Comments: 8 pages, 11 figures, 3 tables
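The average treatment effect described in the abstract reduces, once counterfactuals are available, to simple arithmetic: average the gap between predicted no-intervention consumption and observed consumption, and express it relative to the counterfactual. The sketch shows only that step; the time-series model that produces the counterfactuals is not shown, and the numbers in the demo are made up.

```python
from statistics import mean


def average_treatment_effect(observed: list[float], counterfactual: list[float]) -> tuple[float, float]:
    """Return (mean reduction in kWh, reduction relative to mean counterfactual consumption).
    Each pair is one household-intervention: observed consumption vs. predicted no-intervention consumption."""
    if len(observed) != len(counterfactual) or not observed:
        raise ValueError("need equally long, non-empty inputs")
    reductions = [c - o for o, c in zip(observed, counterfactual)]
    ate = mean(reductions)
    return ate, ate / mean(counterfactual)


if __name__ == "__main__":
    # Made-up toy values (kWh during the intervention hour), not data from the study.
    observed = [0.72, 0.95, 0.60, 1.10]
    predicted = [0.80, 1.05, 0.70, 1.20]
    ate, rel = average_treatment_effect(observed, predicted)
    print(f"ATE = {ate:.3f} kWh ({rel:.1%})")
```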

arXiv:1703.00976

Hedging Strategies for Load-Serving Entities in Wholesale Electricity Markets

Authors: Datong P. Zhou, Munther A. Dahleh, Claire J. Tomlin

Abstract: Load-serving entities that procure electricity from the wholesale electricity market to serve end-users face significant quantity and price risks due to the volatile nature of electricity demand and the quasi-fixed residential tariffs at which electricity is sold. This paper investigates strategies for load-serving entities to hedge against such price risks. Specifically, we compute profit-maximizing portfolios of forward contracts and call options as a function of the uncertain aggregate user demand. We compare the profit to the case of Demand Response, where users are offered monetary incentives to temporarily reduce their consumption during periods of supply shortage. Using smart meter data of residential customers in California, we simulate optimal portfolios and derive conditions under which Demand Response outperforms call options and forward contracts.

Submitted 17 March, 2017; v1 submitted 2 March, 2017; originally announced March 2017.

Comments: 8 pages, 7 figures
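To see why forwards and call options hedge the price risk described above, consider a single-period profit calculation. The sketch assumes cash-settled contracts, all energy bought at the spot price, and a fixed retail rate; it is a simplified illustration of the payoffs only, not the paper's profit-maximizing portfolio model, and all parameter names are our own.

```python
def lse_profit(demand, spot, retail_rate, q_forward=0.0, forward_price=0.0,
               q_call=0.0, strike=0.0, premium=0.0):
    """Single-period profit of a load-serving entity that buys all energy at the spot price,
    sells it at a fixed retail rate, and holds cash-settled forwards and call options."""
    energy_margin = (retail_rate - spot) * demand
    forward_payoff = q_forward * (spot - forward_price)          # gains when spot rises above F
    call_payoff = q_call * (max(spot - strike, 0.0) - premium)   # pays off only above the strike
    return energy_margin + forward_payoff + call_payoff


if __name__ == "__main__":
    # Hypothetical price spike: spot jumps to 1.0 $/kWh against a 0.2 $/kWh tariff.
    print(round(lse_profit(10.0, 1.0, 0.2), 3))
    print(round(lse_profit(10.0, 1.0, 0.2, q_forward=10.0, forward_price=0.15), 3))
    print(round(lse_profit(10.0, 1.0, 0.2, q_call=10.0, strike=0.3, premium=0.02), 3))
```

The forward locks in the price in both directions, while the call costs a premium but leaves the upside of low spot prices intact, which is the trade-off a portfolio optimization has to balance against uncertain demand.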

arXiv:1609.06193

Stability Analysis of Wholesale Electricity Markets under Dynamic Consumption Models and Real-Time Pricing

Authors: Datong P. Zhou, Mardavij Roozbehani, Munther A. Dahleh, Claire J. Tomlin

Abstract: This paper analyzes stability conditions for wholesale electricity markets under real-time retail pricing and realistic consumption models with memory, which explicitly take into account previous electricity prices and consumption levels. By passing on the current retail price of electricity from supplier to consumer and feeding the observed consumption back to the supplier, a closed-loop dynamical system for electricity prices and consumption arises, whose stability we investigate. Under mild assumptions on the generation cost of electricity and consumers' backlog disutility functions, we show that, for consumer models with price memory only, market stability is achieved if the ratio between the consumers' marginal backlog disutility and the suppliers' marginal cost of supply remains below a fixed threshold. Further, consumer models with price and consumption memory can result in greater stability regions and faster convergence to the equilibrium compared with models with price memory alone, if consumption deviations from nominal demand are adequately penalized.

Submitted 18 February, 2017; v1 submitted 20 September, 2016; originally announced September 2016.

Comments: 8 pages, 7 Figures, accepted to the 2017 American Control Conference

Showing 1–47 of 47 results for author: Zhou, P