Showing 1–47 of 47 results for author: Zhou, P

Searching in archive eess.
  1. arXiv:2406.10236  [pdf, other]

    eess.IV cs.AI

    Lightening Anything in Medical Images

    Authors: Ben Fei, Yixuan Li, Weidong Yang, Hengjun Gao, Jingyi Xu, Lipeng Ma, Yatian Yang, Pinghong Zhou

    Abstract: The development of medical imaging techniques has made a significant contribution to clinical decision-making. However, the existence of suboptimal imaging quality, as indicated by irregular illumination or imbalanced intensity, presents significant obstacles in automating disease screening, analysis, and diagnosis. Existing approaches for natural image enhancement are mostly trained with numerous… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 23 pages, 6 figures

  2. arXiv:2402.19020  [pdf, other]

    eess.IV cs.CV

    Unsupervised Learning of High-resolution Light Field Imaging via Beam Splitter-based Hybrid Lenses

    Authors: Jianxin Lei, Chengcai Xu, Langqing Shi, Junhui Hou, Ping Zhou

    Abstract: In this paper, we design a beam splitter-based hybrid light field imaging prototype to record 4D light field image and high-resolution 2D image simultaneously, and make a hybrid light field dataset. The 2D image could be considered as the high-resolution ground truth corresponding to the low-resolution central sub-aperture image of 4D light field image. Subsequently, we propose an unsupervised lea… ▽ More

    Submitted 29 February, 2024; originally announced February 2024.

  3. arXiv:2401.06788  [pdf, other]

    eess.AS cs.AI cs.SD

    The NPU-ASLP-LiAuto System Description for Visual Speech Recognition in CNVSRC 2023

    Authors: He Wang, Pengcheng Guo, Wei Chen, Pan Zhou, Lei Xie

    Abstract: This paper delineates the visual speech recognition (VSR) system introduced by the NPU-ASLP-LiAuto (Team 237) in the first Chinese Continuous Visual Speech Recognition Challenge (CNVSRC) 2023, engaging in the fixed and open tracks of Single-Speaker VSR Task, and the open track of Multi-Speaker VSR Task. In terms of data processing, we leverage the lip motion extractor from the baseline1 to produce… ▽ More

    Submitted 29 February, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

    Comments: Included in CNVSRC Workshop 2023, NCMMSC 2023

  4. arXiv:2401.03473  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge

    Authors: He Wang, Pengcheng Guo, Yue Li, Ao Zhang, Jiayao Sun, Lei Xie, Wei Chen, Pan Zhou, Hui Bu, Xin Xu, Binbin Zhang, Zhuo Chen, Jian Wu, Longbiao Wang, Eng Siong Chng, Sun Li

    Abstract: To promote speech processing and recognition research in driving scenarios, we build on the success of the Intelligent Cockpit Speech Recognition Challenge (ICSRC) held at ISCSLP 2022 and launch the ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge. This challenge collects over 100 hours of multi-channel speech data recorded inside a new energy vehicle and 40 hours… ▽ More

    Submitted 20 February, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

    Comments: Accepted at ICASSP 2024

  5. MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition

    Authors: He Wang, Pengcheng Guo, Pan Zhou, Lei Xie

    Abstract: While automatic speech recognition (ASR) systems degrade significantly in noisy environments, audio-visual speech recognition (AVSR) systems aim to complement the audio stream with noise-invariant visual cues and improve the system's robustness. However, current studies mainly focus on fusing the well-learned modality features, like the output of modality-specific encoders, without considering the… ▽ More

    Submitted 8 April, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

    Comments: 5 pages, 3 figures Accepted at ICASSP 2024

  6. arXiv:2312.09760  [pdf, other]

    eess.AS cs.SD

    U2-KWS: Unified Two-pass Open-vocabulary Keyword Spotting with Keyword Bias

    Authors: Ao Zhang, Pan Zhou, Kaixun Huang, Yong Zou, Ming Liu, Lei Xie

    Abstract: Open-vocabulary keyword spotting (KWS), which allows users to customize keywords, has attracted increasingly more interest. However, existing methods based on acoustic models and post-processing train the acoustic model with ASR training criteria to model all phonemes, making the acoustic model under-optimized for the KWS task. To solve this problem, we propose a novel unified two-pass open-vocabu… ▽ More

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: Accepted by ASRU2023

  7. arXiv:2312.09746  [pdf, other]

    cs.SD eess.AS

    Automatic channel selection and spatial feature integration for multi-channel speech recognition across various array topologies

    Authors: Bingshen Mu, Pengcheng Guo, Dake Guo, Pan Zhou, Wei Chen, Lei Xie

    Abstract: Automatic Speech Recognition (ASR) has shown remarkable progress, yet it still faces challenges in real-world distant scenarios across various array topologies each with multiple recording devices. The focal point of the CHiME-7 Distant ASR task is to devise a unified system capable of generalizing various array topologies that have multiple recording devices and offering reliable recognition perf… ▽ More

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: Accepted by ICASSP 2024

  8. arXiv:2309.07185  [pdf]

    eess.SP cs.AI cs.HC

    A Health Monitoring System Based on Flexible Triboelectric Sensors for Intelligence Medical Internet of Things and its Applications in Virtual Reality

    Authors: Junqi Mao, Puen Zhou, Xiaoyao Wang, Hongbo Yao, Liuyang Liang, Yiqiao Zhao, Jiawei Zhang, Dayan Ban, Haiwu Zheng

    Abstract: The Internet of Medical Things (IoMT) is a platform that combines Internet of Things (IoT) technology with medical applications, enabling the realization of precision medicine, intelligent healthcare, and telemedicine in the era of digitalization and intelligence. However, the IoMT faces various challenges, including sustainable power supply, human adaptability of sensors and the intelligence of s… ▽ More

    Submitted 12 September, 2023; originally announced September 2023.

  9. arXiv:2305.01790  [pdf]

    cond-mat.mtrl-sci cs.ET eess.SP

    Cascaded Logic Gates Based on High-Performance Ambipolar Dual-Gate WSe2 Thin Film Transistors

    Authors: Xintong Li, Peng Zhou, Xuan Hu, Ethan Rivers, Kenji Watanabe, Takashi Taniguchi, Deji Akinwande, Joseph S. Friedman, Jean Anne C. Incorvia

    Abstract: Ambipolar dual-gate transistors based on two-dimensional (2D) materials, such as graphene, carbon nanotubes, black phosphorus, and certain transition metal dichalcogenides (TMDs), enable reconfigurable logic circuits with suppressed off-state current. These circuits achieve the same logical output as CMOS with fewer transistors and offer greater flexibility in design. The primary challenge lies in… ▽ More

    Submitted 2 May, 2023; originally announced May 2023.

  10. arXiv:2301.01933  [pdf]

    eess.SP

    Online Decomposition of Surface Electromyogram into Individual Motor Unit Activities Using Progressive FastICA Peel-off

    Authors: Haowen Zhao, Xu Zhang, Maoqi Chen, Ping Zhou

    Abstract: Surface electromyogram (SEMG) decomposition provides a promising tool for decoding and understanding neural drive information non-invasively. In contrast to previous SEMG decomposition methods mainly developed in offline conditions, there are few studies on online SEMG decomposition. A novel method for online decomposition of SEMG data is presented using the progressive FastICA peel-off (PFP) algo… ▽ More

    Submitted 5 January, 2023; originally announced January 2023.

    Comments: 11 pages, 8 figures, 56 references. Submitted to IEEE Transactions on Biomedical Engineering

  11. arXiv:2207.01287  [pdf, other]

    eess.IV cs.CV

    FFCNet: Fourier Transform-Based Frequency Learning and Complex Convolutional Network for Colon Disease Classification

    Authors: Kai-Ni Wang, Yuting He, Shuaishuai Zhuang, Juzheng Miao, Xiaopu He, Ping Zhou, Guanyu Yang, Guang-Quan Zhou, Shuo Li

    Abstract: Reliable automatic classification of colonoscopy images is of great significance in assessing the stage of colonic lesions and formulating appropriate treatment plans. However, it is challenging due to uneven brightness, location variability, inter-class similarity, and intra-class dissimilarity, affecting the classification accuracy. To address the above issues, we propose a Fourier-based Frequen… ▽ More

    Submitted 4 July, 2022; originally announced July 2022.

    Comments: Accepted for publication at the 25th International Conference on Medical Image Computing and Computer Assisted Intervention - MICCAI 2022

  12. arXiv:2206.12759  [pdf, other]

    cs.CL cs.SD eess.AS

    Low-resource Accent Classification in Geographically-proximate Settings: A Forensic and Sociophonetics Perspective

    Authors: Qingcheng Zeng, Dading Chong, Peilin Zhou, Jie Yang

    Abstract: Accented speech recognition and accent classification are relatively under-explored research areas in speech technology. Recently, deep learning-based methods and Transformer-based pretrained models have achieved superb performances in both areas. However, most accent classification tasks focused on classifying different kinds of English accents and little attention was paid to geographically-prox… ▽ More

    Submitted 28 June, 2022; v1 submitted 25 June, 2022; originally announced June 2022.

    Comments: INTERSPEECH 2022

  13. arXiv:2205.11008  [pdf, other]

    cs.CL cs.SD eess.AS

    Calibrate and Refine! A Novel and Agile Framework for ASR-error Robust Intent Detection

    Authors: Peilin Zhou, Dading Chong, Helin Wang, Qingcheng Zeng

    Abstract: The past ten years have witnessed the rapid development of text-based intent detection, whose benchmark performances have already been taken to a remarkable level by deep learning techniques. However, automatic speech recognition (ASR) errors are inevitable in real-world applications due to the environment noise, unique speech patterns and etc, leading to sharp performance drop in state-of-the-art… ▽ More

    Submitted 22 May, 2022; originally announced May 2022.

    Comments: Submit to INTERSPEECH 2022

  14. arXiv:2205.09987  [pdf, other]

    cs.RO eess.SY

    Model Predictive Manipulation of Compliant Objects with Multi-Objective Optimizer and Adversarial Network for Occlusion Compensation

    Authors: Jiaming Qi, Dongyu Li, Yufeng Gao, Peng Zhou, David Navarro-Alarcon

    Abstract: The robotic manipulation of compliant objects is currently one of the most active problems in robotics due to its potential to automate many important applications. Despite the progress achieved by the robotics community in recent years, the 3D shaping of these types of materials remains an open research problem. In this paper, we propose a new vision-based controller to automatically regulate the… ▽ More

    Submitted 20 May, 2022; originally announced May 2022.

  15. arXiv:2204.12768  [pdf, other]

    cs.SD eess.AS

    Masked Spectrogram Prediction For Self-Supervised Audio Pre-Training

    Authors: Dading Chong, Helin Wang, Peilin Zhou, Qingcheng Zeng

    Abstract: Transformer-based models attain excellent results and generalize well when trained on sufficient amounts of data. However, constrained by the limited data available in the audio domain, most transformer-based models for audio tasks are finetuned from pre-trained models in other domains (e.g. image), which has a notable gap with the audio domain. Other methods explore the self-supervised learning a… ▽ More

    Submitted 27 April, 2022; originally announced April 2022.

    Comments: Submit to INTERSPEECH 2022

  16. arXiv:2202.09003  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    End-to-end contextual asr based on posterior distribution adaptation for hybrid ctc/attention system

    Authors: Zhengyi Zhang, Pan Zhou

    Abstract: End-to-end (E2E) speech recognition architectures assemble all components of traditional speech recognition system into a single model. Although it simplifies ASR system, it introduces contextual ASR drawback: the E2E model has worse performance on utterances containing infrequent proper nouns. In this work, we propose to add a contextual bias attention (CBA) module to attention based encoder deco… ▽ More

    Submitted 17 February, 2022; originally announced February 2022.

    Comments: 5 pages, 5 tables, 1 figure

  17. arXiv:2201.09163  [pdf]

    eess.IV cs.CV

    Pulmonary Fissure Segmentation in CT Images Based on ODoS Filter and Shape Features

    Authors: Yuanyuan Peng, Pengpeng Luan, Hongbin Tu, Xiong Li, Ping Zhou

    Abstract: Priori knowledge of pulmonary anatomy plays a vital role in diagnosis of lung diseases. In CT images, pulmonary fissure segmentation is a formidable mission due to various of factors. To address the challenge, an useful approach based on ODoS filter and shape features is presented for pulmonary fissure segmentation. Here, we adopt an ODoS filter by merging the orientation information and magnitude… ▽ More

    Submitted 22 January, 2022; originally announced January 2022.

  18. arXiv:2201.05344  [pdf, other]

    eess.IV cs.CV cs.LG

    AWSnet: An Auto-weighted Supervision Attention Network for Myocardial Scar and Edema Segmentation in Multi-sequence Cardiac Magnetic Resonance Images

    Authors: Kai-Ni Wang, Xin Yang, Juzheng Miao, Lei Li, Jing Yao, Ping Zhou, Wufeng Xue, Guang-Quan Zhou, Xiahai Zhuang, Dong Ni

    Abstract: Multi-sequence cardiac magnetic resonance (CMR) provides essential pathology information (scar and edema) to diagnose myocardial infarction. However, automatic pathology segmentation can be challenging due to the difficulty of effectively exploring the underlying information from the multi-sequence CMR data. This paper aims to tackle the scar and edema segmentation from multi-sequence CMR with a n… ▽ More

    Submitted 14 January, 2022; originally announced January 2022.

    Comments: 19 pages, 10 figures, accepted by Medical Image Analysis

  19. arXiv:2110.09788  [pdf, other]

    cs.CV eess.IV

    CIPS-3D: A 3D-Aware Generator of GANs Based on Conditionally-Independent Pixel Synthesis

    Authors: Peng Zhou, Lingxi Xie, Bingbing Ni, Qi Tian

    Abstract: The style-based GAN (StyleGAN) architecture achieved state-of-the-art results for generating high-quality images, but it lacks explicit and precise control over camera poses. The recently proposed NeRF-based GANs made great progress towards 3D-aware generators, but they are unable to generate high-quality images yet. This paper presents CIPS-3D, a style-based, 3D-aware generator that is composed o… ▽ More

    Submitted 19 October, 2021; originally announced October 2021.

    Comments: 3D-aware GANs based on NeRF, https://github.com/PeterouZh/CIPS-3D

  20. arXiv:2109.09161  [pdf, other]

    cs.CL eess.AS

    Wav-BERT: Cooperative Acoustic and Linguistic Representation Learning for Low-Resource Speech Recognition

    Authors: Guolin Zheng, Yubei Xiao, Ke Gong, Pan Zhou, Xiaodan Liang, Liang Lin

    Abstract: Unifying acoustic and linguistic representation learning has become increasingly crucial to transfer the knowledge learned on the abundance of high-resource language data for low-resource speech recognition. Existing approaches simply cascade pre-trained acoustic and language models to learn the transfer from speech to text. However, how to solve the representation discrepancy of speech and text i… ▽ More

    Submitted 9 October, 2021; v1 submitted 19 September, 2021; originally announced September 2021.

  21. Attention-based Neural Load Forecasting: A Dynamic Feature Selection Approach

    Authors: Jing Xiong, Pengyang Zhou, Alan Chen, Yu Zhang

    Abstract: Encoder-decoder-based recurrent neural network (RNN) has made significant progress in sequence-to-sequence learning tasks such as machine translation and conversational models. Recent works have shown the advantage of this type of network in dealing with various time series forecasting tasks. The present paper focuses on the problem of multi-horizon short-term load forecasting, which plays a key r… ▽ More

    Submitted 24 August, 2021; originally announced August 2021.

  22. Deep-learning-based Hyperspectral imaging through a RGB camera

    Authors: Xinyu Gao, Tianlang Wang, Jing Yang, Jinchao Tao, Yanqing Qiu, Yanlong Meng, Banging Mao, Pengwei Zhou, Yi Li

    Abstract: Hyperspectral image (HSI) contains both spatial pattern and spectral information which has been widely used in food safety, remote sensing, and medical detection. However, the acquisition of hyperspectral images is usually costly due to the complicated apparatus for the acquisition of optical spectrum. Recently, it has been reported that HSI can be reconstructed from single RGB image using convolu… ▽ More

    Submitted 12 July, 2021; originally announced July 2021.

  23. arXiv:2106.06256  [pdf]

    eess.SP physics.optics

    An RF-source-free microwave photonic radar with an optically injected semiconductor laser for high-resolution detection and imaging

    Authors: Pei Zhou, Rengheng Zhang, Nianqiang Li, Zhidong Jiang, Shilong Pan

    Abstract: This paper presents a novel microwave photonic (MWP) radar scheme that is capable of optically generating and processing broadband linear frequency-modulated (LFM) microwave signals without using any radio-frequency (RF) sources. In the transmitter, a broadband LFM microwave signal is generated by controlling the period-one (P1) oscillation of an optically injected semiconductor laser. After targe… ▽ More

    Submitted 11 June, 2021; originally announced June 2021.

  24. arXiv:2106.06143  [pdf, other]

    eess.SP cs.LG

    Monotonic Neural Network: combining Deep Learning with Domain Knowledge for Chiller Plants Energy Optimization

    Authors: Fanhe Ma, Faen Zhang, Shenglan Ben, Shuxin Qin, Pengcheng Zhou, Changsheng Zhou, Fengyi Xu

    Abstract: In this paper, we are interested in building a domain knowledge based deep learning framework to solve the chiller plants energy optimization problems. Compared to the hotspot applications of deep learning (e.g. image classification and NLP), it is difficult to collect enormous data for deep network training in real-world physical systems. Most existing methods reduce the complex systems into line… ▽ More

    Submitted 10 June, 2021; originally announced June 2021.

  25. arXiv:2106.02424  [pdf, other]

    cs.RO eess.SY

    Contour Moments Based Manipulation of Composite Rigid-Deformable Objects with Finite Time Model Estimation and Shape/Position Control

    Authors: Jiaming Qi, Guangfu Ma, Jihong Zhu, Peng Zhou, Yueyong Lyu, Haibo Zhang, David Navarro-Alarcon

    Abstract: The robotic manipulation of composite rigid-deformable objects (i.e. those with mixed non-homogeneous stiffness properties) is a challenging problem with clear practical applications that, despite the recent progress in the field, it has not been sufficiently studied in the literature. To deal with this issue, in this paper we propose a new visual servoing method that has the capability to manipul… ▽ More

    Submitted 4 June, 2021; originally announced June 2021.

  26. arXiv:2104.03587  [pdf, other]

    cs.SD cs.CL eess.AS

    WNARS: WFST based Non-autoregressive Streaming End-to-End Speech Recognition

    Authors: Zhichao Wang, Wenwen Yang, Pan Zhou, Wei Chen

    Abstract: Recently, attention-based encoder-decoder (AED) end-to-end (E2E) models have drawn more and more attention in the field of automatic speech recognition (ASR). AED models, however, still have drawbacks when deploying in commercial applications. Autoregressive beam search decoding makes it inefficient for high-concurrency applications. It is also inconvenient to integrate external word-level languag… ▽ More

    Submitted 20 April, 2021; v1 submitted 8 April, 2021; originally announced April 2021.

  27. arXiv:2104.02868  [pdf, other]

    cs.SD eess.AS

    Darts-Conformer: Towards Efficient Gradient-Based Neural Architecture Search For End-to-End ASR

    Authors: Xian Shi, Pan Zhou, Wei Chen, Lei Xie

    Abstract: Neural architecture search (NAS) has been successfully applied to tasks like image classification and language modeling for finding efficient high-performance network architectures. In ASR field especially end-to-end ASR, the related research is still in its infancy. In this work, we focus on applying NAS on the most popular manually designed model: Conformer, and then propose an efficient ASR mod… ▽ More

    Submitted 10 August, 2021; v1 submitted 6 April, 2021; originally announced April 2021.

    Comments: Submitted to ASRU 2021

  28. arXiv:2012.11896  [pdf, other]

    cs.CL cs.SD eess.AS

    Adversarial Meta Sampling for Multilingual Low-Resource Speech Recognition

    Authors: Yubei Xiao, Ke Gong, Pan Zhou, Guolin Zheng, Xiaodan Liang, Liang Lin

    Abstract: Low-resource automatic speech recognition (ASR) is challenging, as the low-resource target language data cannot well train an ASR model. To solve this issue, meta-learning formulates ASR for each source language into many small ASR tasks and meta-learns a model initialization on all tasks from different source languages to access fast adaptation on unseen target languages. However, for different s… ▽ More

    Submitted 12 April, 2021; v1 submitted 22 December, 2020; originally announced December 2020.

    Comments: accepted in AAAI2021

  29. arXiv:2009.01502  [pdf, other]

    cs.MA cs.DC cs.LG eess.SY

    DRLE: Decentralized Reinforcement Learning at the Edge for Traffic Light Control in the IoV

    Authors: Pengyuan Zhou, Xianfu Chen, Zhi Liu, Tristan Braud, Pan Hui, Jussi Kangasharju

    Abstract: The Internet of Vehicles (IoV) enables real-time data exchange among vehicles and roadside units and thus provides a promising solution to alleviate traffic jams in the urban area. Meanwhile, better traffic management via efficient traffic light control can benefit the IoV as well by enabling a better communication environment and decreasing the network load. As such, IoV and efficient traffic lig… ▽ More

    Submitted 5 January, 2021; v1 submitted 3 September, 2020; originally announced September 2020.

    Comments: Accepted by IEEE Transactions on Intelligent Transportation Systems

  30. arXiv:2008.06896  [pdf, other]

    cs.RO eess.SY

    Adaptive Shape Servoing of Elastic Rods using Parameterized Regression Features and Auto-Tuning Motion Controls

    Authors: Jiaming Qi, Guangtao Ran, Bohui Wang, Jian Liu, Wanyu Ma, Peng Zhou, David Navarro-Alarcon

    Abstract: The robotic manipulation of deformable linear objects has shown great potential in a wide range of real-world applications. However, it presents many challenges due to the objects' complex nonlinearity and high-dimensional configuration. In this paper, we propose a new shape servoing framework to automatically manipulate elastic rods through visual feedback. Our new method uses parameterized regre… ▽ More

    Submitted 9 September, 2023; v1 submitted 16 August, 2020; originally announced August 2020.

    Comments: 8 pages, 12 figures

  31. arXiv:2004.00799  [pdf, ps, other]

    cs.NI eess.SP

    Cost-efficient and Skew-aware Data Scheduling for Incremental Learning in 5G Network

    Authors: Lingjun Pu, Xinjing Yuan, Xiaohang Xu, Xu Chen, Pan Zhou, Jingdong Xu

    Abstract: To facilitate the emerging applications in 5G networks, mobile network operators will provide many network functions in terms of control and prediction. Recently, they have recognized the power of machine learning (ML) and started to explore its potential to facilitate those network functions. Nevertheless, the current ML models for network functions are often derived in an offline manner, which i… ▽ More

    Submitted 12 September, 2021; v1 submitted 1 April, 2020; originally announced April 2020.

  32. arXiv:2001.00149  [pdf]

    eess.IV cs.CV physics.med-ph

    Simulation of Skin Stretching around the Forehead Wrinkles in Rhytidectomy

    Authors: Ping Zhou, Shuo Huang, Qiang Chen, Siyuan He, Guochao Cai

    Abstract: Objective: Skin stretching around the forehead wrinkles is an important method in rhytidectomy. Proper parameters are required to evaluate the surgical effect. In this paper, a simulation method was proposed to obtain the parameters. Methods: Three-dimensional point cloud data with a resolution of 50 μm were employed. First, a smooth supporting contour under the wrinkled forehead was generated via… ▽ More

    Submitted 1 January, 2020; originally announced January 2020.

  33. arXiv:1911.09275  [pdf, other]

    eess.SP cs.LG

    A Machine Learning-enhanced Robust P-Phase Picker for Real-time Seismic Monitoring

    Authors: Dazhong Shen, Qi Zhang, Tong Xu, Hengshu Zhu, Wenjia Zhao, Zikai Yin, Peilun Zhou, Lihua Fang, Enhong Chen, Hui Xiong

    Abstract: Identifying the arrival times of seismic P-phases plays a significant role in real-time seismic monitoring, which provides critical guidance for emergency response activities. While considerable research has been conducted on this topic, efficiently capturing the arrival times of seismic P-phases hidden within intensively distributed and noisy seismic waves, such as those generated by the aftersho… ▽ More

    Submitted 20 August, 2020; v1 submitted 20 November, 2019; originally announced November 2019.

    Comments: Note that this paper is the English version of our work published in SCIENTIA SINICA Informationis (http://engine.scichina.com/doi/10.1360/SSI-2020-0214), which is suggested to be cited if needed

  34. arXiv:1911.00203  [pdf, other]

    cs.CL eess.AS

    Improving Generalization of Transformer for Speech Recognition with Parallel Schedule Sampling and Relative Positional Embedding

    Authors: Pan Zhou, Ruchao Fan, Wei Chen, Jia Jia

    Abstract: Transformer has shown promising results in many sequence to sequence transformation tasks recently. It utilizes a number of feed-forward self-attention layers to replace the recurrent neural networks (RNN) in attention-based encoder decoder (AED) architecture. Self-attention layer learns temporal dependence by incorporating sinusoidal positional embedding of tokens in a sequence for parallel compu… ▽ More

    Submitted 30 November, 2020; v1 submitted 1 November, 2019; originally announced November 2019.

  35. arXiv:1908.10959  [pdf, other]

    eess.SP cs.LG eess.IV math.OC stat.ML

    Short-and-Sparse Deconvolution -- A Geometric Approach

    Authors: Yenson Lau, Qing Qu, Han-Wen Kuo, Pengcheng Zhou, Yuqian Zhang, John Wright

    Abstract: Short-and-sparse deconvolution (SaSD) is the problem of extracting localized, recurring motifs in signals with spatial or temporal structure. Variants of this problem arise in applications such as image deblurring, microscopy, neural spike sorting, and more. The problem is challenging in both theory and practice, as natural optimization formulations are nonconvex. Moreover, practical deconvolution… ▽ More

    Submitted 1 October, 2019; v1 submitted 28 August, 2019; originally announced August 2019.

    Comments: *YL and QQ contributed equally to this work; 30 figures, 45 pages; This version: added an experiment comparing with other methods, corrected typos and added references

  36. arXiv:1907.08363  [pdf, other]

    eess.SY

    Joint Coverage and Power Control in Highly Dynamic and Massive UAV Networks: An Aggregative Game-theoretic Learning Approach

    Authors: Zhuoying Li, Pan Zhou, Yanru Zhang, Lin Gao

    Abstract: Unmanned aerial vehicles (UAV) ad-hoc network is a significant contingency plan for communication after a natural disaster, such as typhoon and earthquake. To achieve efficient and rapid networks deployment, we employ noncooperative game theory and amended binary log-linear algorithm (BLLA) seeking for the Nash equilibrium which achieves the optimal network performance. We not only take channel ov… ▽ More

    Submitted 13 April, 2024; v1 submitted 18 July, 2019; originally announced July 2019.

  37. arXiv:1907.06081  [pdf]

    physics.optics eess.IV

    Preliminary study on the modal decomposition of Hermite Gaussian beams via deep learning

    Authors: Yi An, Tianyue Hou, Jun Li, Liangjin Huang, Jinyong Leng, Lijia Yang, Pu Zhou

    Abstract: The Hermite-Gaussian (HG) modes make up a complete and orthonormal basis, which have been extensively used to describe optical fields. Here, we demonstrate, for the first time to our knowledge, deep learning-based modal decomposition (MD) of HG beams. This method offers a fast, economical and robust way to acquire both the power content and phase information through a single-shot beam intensity im… ▽ More

    Submitted 16 July, 2019; v1 submitted 13 July, 2019; originally announced July 2019.

    Comments: 6 figures

  38. arXiv:1906.06972  [pdf, other]

    cs.CV eess.IV

    EnlightenGAN: Deep Light Enhancement without Paired Supervision

    Authors: Yifan Jiang, Xinyu Gong, Ding Liu, Yu Cheng, Chen Fang, Xiaohui Shen, Jianchao Yang, Pan Zhou, Zhangyang Wang

    Abstract: Deep learning-based methods have achieved remarkable success in image restoration and enhancement, but are they still competitive when there is a lack of paired training data? As one such example, this paper explores the low-light image enhancement problem, where in practice it is extremely challenging to simultaneously take a low-light and a normal-light photo of the same visual scene. We propose… ▽ More

    Submitted 24 January, 2021; v1 submitted 17 June, 2019; originally announced June 2019.

  39. arXiv:1906.01895  [pdf, ps, other]

    cs.CV eess.IV

    AI-Skin : Skin Disease Recognition based on Self-learning and Wide Data Collection through a Closed Loop Framework

    Authors: Min Chen, Ping Zhou, Di Wu, Long Hu, Mohammad Mehedi Hassan, Atif Alamri

    Abstract: There are a lot of hidden dangers in the change of human skin conditions, such as the sunburn caused by long-time exposure to ultraviolet radiation, which not only has aesthetic impact causing psychological depression and lack of self-confidence, but also may even be life-threatening due to skin canceration. Current skin disease researches adopt the auto-classification system for improving the acc… ▽ More

    Submitted 5 June, 2019; originally announced June 2019.

  40. arXiv:1904.11983  [pdf]

    eess.IV physics.optics

    Deep learning enabled superfast and accurate M^2 evaluation for fiber beams

    Authors: Yi An, Jun Li, Liangjin Huang, Jinyong Leng, Lijia Yang, Pu Zhou

    Abstract: We introduce deep learning technique to predict the beam propagation factor M^2 of the laser beams emitting from few-mode fiber for the first time, to the best of our knowledge. The deep convolutional neural network (CNN) is trained with paired data of simulated near-field beam patterns and their calculated M^2 value, aiming at learning a fast and accurate mapping from the former to the latter. Th… ▽ More

    Submitted 13 July, 2019; v1 submitted 26 April, 2019; originally announced April 2019.

    Comments: 12 pages, 10 figures

    Journal ref: Optics Express, 27, 18683-18694 (2019)

  41. arXiv:1811.05250  [pdf, ps, other]

    cs.CL cs.CV cs.SD eess.AS

    Modality Attention for End-to-End Audio-visual Speech Recognition

    Authors: Pan Zhou, Wenwen Yang, Wei Chen, Yanfeng Wang, Jia Jia

    Abstract: Audio-visual speech recognition (AVSR) system is thought to be one of the most promising solutions for robust speech recognition, especially in noisy environment. In this paper, we propose a novel multimodal attention based method for audio-visual speech recognition which could automatically learn the fused representation from both modalities based on their importance. Our method is realized using… ▽ More

    Submitted 23 April, 2019; v1 submitted 13 November, 2018; originally announced November 2018.

    Comments: accepted by ICASSP2019

  42. arXiv:1811.05247  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    An Online Attention-based Model for Speech Recognition

    Authors: Ruchao Fan, Pan Zhou, Wei Chen, Jia Jia, Gang Liu

    Abstract: Attention-based end-to-end models such as Listen, Attend and Spell (LAS), simplify the whole pipeline of traditional automatic speech recognition (ASR) systems and become popular in the field of speech recognition. In previous work, researchers have shown that such architectures can acquire comparable results to state-of-the-art ASR systems, especially when using a bidirectional encoder and global…

    Submitted 25 April, 2019; v1 submitted 13 November, 2018; originally announced November 2018.

  43. arXiv:1811.05097  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Exploring RNN-Transducer for Chinese Speech Recognition

    Authors: Senmao Wang, Pan Zhou, Wei Chen, Jia Jia, Lei Xie

    Abstract: End-to-end approaches have drawn much attention recently for significantly simplifying the construction of an automatic speech recognition (ASR) system. RNN transducer (RNN-T) is one of the popular end-to-end methods. Previous studies have shown that RNN-T is difficult to train and a very complex training process is needed for a reasonable performance. In this paper, we explore RNN-T for a Chinese…

    Submitted 22 April, 2019; v1 submitted 12 November, 2018; originally announced November 2018.

  44. arXiv:1811.00882  [pdf]

    eess.SP physics.optics

    Learning to decompose the modes in few-mode fibers with deep convolutional neural network

    Authors: Yi An, Liangjin Huang, Jun Li, Jinyong Leng, Lijia Yang, Pu Zhou

    Abstract: We introduce deep learning technique to perform complete mode decomposition for few-mode optical fiber for the first time. Our goal is to learn a fast and accurate mapping from near-field beam profiles to the complete mode coefficients, including both modal amplitudes and phases. We train the convolutional neural network with simulated beam patterns, and evaluate the network on both of the simulat…

    Submitted 18 April, 2019; v1 submitted 31 October, 2018; originally announced November 2018.

    Journal ref: Optics Express Vol. 27, Issue 7, pp. 10127-10137 (2019)

  45. arXiv:1710.03190  [pdf, other]

    physics.soc-ph eess.SY

    Estimating Heterogeneous Treatment Effects in Residential Demand Response

    Authors: Datong P. Zhou, Maximilian Balandat, Claire J. Tomlin

    Abstract: We evaluate the causal effect of hour-ahead price interventions on the reduction in residential electricity consumption using a data set from a large-scale experiment on 7,000 households in California. By estimating user-level counterfactuals using time-series prediction, we estimate an average treatment effect of ~0.10 kWh (11%) per intervention and household. Next, we leverage causal decision tr…

    Submitted 25 October, 2018; v1 submitted 6 October, 2017; originally announced October 2017.

    Comments: 8 pages, 11 figures, 3 tables

  46. arXiv:1703.00976  [pdf, other]

    eess.SY

    Hedging Strategies for Load-Serving Entities in Wholesale Electricity Markets

    Authors: Datong P. Zhou, Munther A. Dahleh, Claire J. Tomlin

    Abstract: Load-serving entities which procure electricity from the wholesale electricity market to service end-users face significant quantity and price risks due to the volatile nature of electricity demand and quasi-fixed residential tariffs at which electricity is sold. This paper investigates strategies for load serving entities to hedge against such price risks. Specifically, we compute profit-maximizi…

    Submitted 17 March, 2017; v1 submitted 2 March, 2017; originally announced March 2017.

    Comments: 8 pages, 7 figures

  47. arXiv:1609.06193  [pdf, other]

    eess.SY

    Stability Analysis of Wholesale Electricity Markets under Dynamic Consumption Models and Real-Time Pricing

    Authors: Datong P. Zhou, Mardavij Roozbehani, Munther A. Dahleh, Claire J. Tomlin

    Abstract: This paper analyzes stability conditions for wholesale electricity markets under real-time retail pricing and realistic consumption models with memory, which explicitly take into account previous electricity prices and consumption levels. By passing on the current retail price of electricity from supplier to consumer and feeding the observed consumption back to the supplier, a closed-loop dynamica…

    Submitted 18 February, 2017; v1 submitted 20 September, 2016; originally announced September 2016.

    Comments: 8 pages, 7 Figures, accepted to the 2017 American Control Conference