-
Optimizing Contrail Detection: A Deep Learning Approach with EfficientNet-b4 Encoding
Authors:
Qunwei Lin,
Qian Leng,
Zhicheng Ding,
Chao Yan,
Xiaonan Xu
Abstract:
In the pursuit of environmental sustainability, the aviation industry faces the challenge of minimizing its ecological footprint. Among the key solutions is contrail avoidance, targeting the linear ice-crystal clouds produced by aircraft exhaust. These contrails exacerbate global warming by trapping atmospheric heat, necessitating precise segmentation and comprehensive analysis of contrail images to gauge their environmental impact. However, this segmentation task is complex due to the varying appearances of contrails under different atmospheric conditions and potential misalignment issues in predictive modeling. This paper presents an innovative deep-learning approach utilizing the EfficientNet-b4 encoder for feature extraction, seamlessly integrating misalignment correction, soft labeling, and pseudo-labeling techniques to enhance the accuracy and efficiency of contrail detection in satellite imagery. The proposed methodology aims to redefine contrail image analysis and contribute to the objectives of sustainable aviation by providing a robust framework for precise contrail detection and analysis in satellite imagery, thus aiding in the mitigation of aviation's environmental impact.
Submitted 19 April, 2024;
originally announced April 2024.
-
Disturbance Rejection-Guarded Learning for Vibration Suppression of Two-Inertia Systems
Authors:
Fan Zhang,
Jinfeng Chen,
Yu Hu,
Zhiqiang Gao,
Ge Lv,
Qin Lin
Abstract:
Model uncertainty presents significant challenges in vibration suppression of multi-inertia systems, as these systems often rely on inaccurate nominal mathematical models due to system identification errors or unmodeled dynamics. An observer, such as an extended state observer (ESO), can estimate the discrepancy between the inaccurate nominal model and the true model, thus improving control performance via disturbance rejection. The conventional observer design is memoryless in the sense that once its estimated disturbance is obtained and sent to the controller, the datum is discarded. In this research, we propose a seamless integration of ESO and machine learning. On one hand, the machine learning model attempts to model the disturbance. With the assistance of prior information about the disturbance, the observer is expected to achieve faster convergence in disturbance estimation. On the other hand, machine learning benefits from an additional assurance layer provided by the ESO, as any imperfections in the machine learning model can be compensated for by the ESO. We validated the effectiveness of this novel learning-for-control paradigm through simulation and physical tests on two-inertia motion control systems used for vibration studies.
Submitted 15 April, 2024;
originally announced April 2024.
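
The ESO idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' two-inertia setup or their learning component: it assumes a toy first-order plant x' = d + b*u with a constant unknown disturbance d, a standard linear ESO with both observer poles placed at -omega, and forward-Euler integration. The function name and all parameter values are hypothetical.

```python
import math

def simulate_eso(omega=20.0, b=1.0, disturbance=1.0, dt=1e-3, steps=2000):
    """Toy linear ESO on the plant x' = d + b*u (assumed model, not from the paper).

    z1 tracks the measured output x, z2 tracks the lumped disturbance d.
    """
    beta1, beta2 = 2.0 * omega, omega ** 2   # gains placing both poles at -omega
    x = 0.0                                  # true plant state; measured output y = x
    z1, z2 = 0.0, 0.0                        # observer states
    for k in range(steps):
        u = math.sin(k * dt)                 # arbitrary known input
        err = x - z1                         # output estimation error
        # The observer only uses the measurement and the known input.
        z1_dot = z2 + b * u + beta1 * err
        z2_dot = beta2 * err
        # Forward-Euler step for plant and observer.
        x += dt * (disturbance + b * u)
        z1 += dt * z1_dot
        z2 += dt * z2_dot
    return x, z1, z2

x, z1, z2 = simulate_eso()
print(f"output error: {x - z1:.2e}, disturbance estimate: {z2:.4f}")
```

In this sketch a larger omega speeds up convergence of z2 toward d, at the cost of greater sensitivity to measurement noise, which the toy model omits.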
-
Application analysis of AI technology combined with spiral CT scanning in early lung cancer screening
Authors:
Shulin Li,
Liqiang Yu,
Bo Liu,
Qunwei Lin,
Jiaxin Huang
Abstract:
At present, the incidence and fatality rate of lung cancer in China rank first among all malignant tumors. Despite the continuous development and improvement of China's medical level, the overall 5-year survival rate of lung cancer patients is still lower than 20% and is staged. A number of studies have confirmed that early diagnosis and treatment of early-stage lung cancer are of great significance for improving patient prognosis. In recent years, artificial intelligence technology has gradually begun to be applied in oncology. AI is developing rapidly in cancer screening, clinical diagnosis, radiation therapy (image acquisition, at-risk organ segmentation, image calibration and delivery), and other areas. However, whether medical AI can be socialized depends to a certain extent on the public's attitude and acceptance. Moreover, at present, there are few studies on the diagnosis of early lung cancer by AI technology combined with spiral CT (SCT) scanning. In view of this, this study applied the combined method in early lung cancer screening, aiming to find a safe and efficient screening mode and provide a reference for clinical diagnosis and treatment.
Submitted 26 January, 2024;
originally announced February 2024.
-
Activity Detection for Massive Connectivity in Cell-free Networks with Unknown Large-scale Fading, Channel Statistics, Noise Variance, and Activity Probability: A Bayesian Approach
Authors:
Hao Zhang,
Qingfeng Lin,
Yang Li,
Lei Cheng,
Yik-Chung Wu
Abstract:
Activity detection is an important task in next-generation grant-free multiple access. While there are a number of existing algorithms designed for this purpose, they mostly require precise information about the network, such as large-scale fading coefficients, small-scale fading channel statistics, noise variance at the access points, and user activity probability. Acquiring this information would incur significant overhead, and the estimated values might not be accurate. This problem is even more severe in cell-free networks, as there are many of these parameters to be acquired. Therefore, this paper sets out to investigate the activity detection problem without the above-mentioned information. In order to handle so many unknown parameters, this paper employs the Bayesian approach, where the unknown variables are endowed with prior distributions that effectively act as regularizations. Together with the likelihood function, a maximum a posteriori (MAP) estimator and a variational inference algorithm are derived. Extensive simulations demonstrate that the proposed methods, even without knowledge of these system parameters, perform better than existing state-of-the-art methods, such as covariance-based and approximate message passing methods.
Submitted 2 February, 2024; v1 submitted 30 January, 2024;
originally announced January 2024.
-
Robust Control Barrier Functions for Safe Control Under Uncertainty Using Extended State Observer and Output Measurement
Authors:
Jinfeng Chen,
Zhiqiang Gao,
Qin Lin
Abstract:
Control barrier functions-based quadratic programming (CBF-QP) is gaining popularity as an effective controller synthesis tool for safe control. However, the provable safety is established on an accurate dynamic model and access to all states. To address such a limitation, this paper proposes a novel design combining an extended state observer (ESO) with a CBF for safe control of a system with model uncertainty and external disturbances only using output measurement. Our approach provides a less conservative estimation error bound than other disturbance observer-based CBFs. Moreover, only output measurements are needed to estimate the disturbances instead of access to the full state. The bounds of state estimation error and disturbance estimation error are obtained in a unified manner and then used for robust safe control under uncertainty. We validate our approach's efficacy in simulations of an adaptive cruise control system and a Segway self-balancing scooter.
Submitted 26 August, 2023;
originally announced August 2023.
-
A Survey of Time Series Anomaly Detection Methods in the AIOps Domain
Authors:
Zhenyu Zhong,
Qiliang Fan,
Jiacheng Zhang,
Minghua Ma,
Shenglin Zhang,
Yongqian Sun,
Qingwei Lin,
Yuzhi Zhang,
Dan Pei
Abstract:
Internet-based services have seen remarkable success, generating vast amounts of monitored key performance indicators (KPIs) as univariate or multivariate time series. Monitoring and analyzing these time series are crucial for researchers, service operators, and on-call engineers to detect outliers or anomalies indicating service failures or significant events. Numerous advanced anomaly detection methods have emerged to address availability and performance issues. This review offers a comprehensive overview of time series anomaly detection in Artificial Intelligence for IT operations (AIOps), which uses AI capabilities to automate and optimize operational workflows. Additionally, it explores future directions for real-world and next-generation time-series anomaly detection based on recent advancements.
Submitted 1 August, 2023;
originally announced August 2023.
-
Asynchronous Activity Detection for Cell-Free Massive MIMO: From Centralized to Distributed Algorithms
Authors:
Yang Li,
Qingfeng Lin,
Ya-Feng Liu,
Bo Ai,
Yik-Chung Wu
Abstract:
Device activity detection in the emerging cell-free massive multiple-input multiple-output (MIMO) systems has been recognized as a crucial task in machine-type communications, in which multiple access points (APs) jointly identify the active devices from a large number of potential devices based on the received signals. Most of the existing works addressing this problem rely on the impractical assumption that different active devices transmit signals synchronously. However, in practice, synchronization cannot be guaranteed due to the low-cost oscillators, which brings additional discontinuous and nonconvex constraints to the detection problem. To address this challenge, this paper reveals an equivalent reformulation to the asynchronous activity detection problem, which facilitates the development of a centralized algorithm and a distributed algorithm that satisfy the highly nonconvex constraints in a gentle fashion as the iteration number increases, so that the sequence generated by the proposed algorithms can get around bad stationary points. To reduce the capacity requirements of the fronthauls, we further design a communication-efficient accelerated distributed algorithm. Simulation results demonstrate that the proposed centralized and distributed algorithms outperform state-of-the-art approaches, and the proposed accelerated distributed algorithm achieves a close detection performance to that of the centralized algorithm but with a much smaller number of bits to be transmitted on the fronthaul links.
Submitted 2 October, 2022;
originally announced October 2022.
-
Scale-free and Task-agnostic Attack: Generating Photo-realistic Adversarial Patterns with Patch Quilting Generator
Authors:
Xiangbo Gao,
Cheng Luo,
Qinliang Lin,
Weicheng Xie,
Minmin Liu,
Linlin Shen,
Keerthy Kusumam,
Siyang Song
Abstract:
Traditional L_p norm-restricted image attack algorithms suffer from poor transferability to black-box scenarios and poor robustness to defense algorithms. Recent CNN generator-based attack approaches can synthesize unrestricted and semantically meaningful entities in the image, which has been shown to be transferable and robust. However, such methods attack images either by synthesizing local adversarial entities, which are only suitable for attacking specific contents, or by performing global attacks, which are only applicable to a specific image scale. In this paper, we propose a novel Patch Quilting Generative Adversarial Network (PQ-GAN) to learn the first scale-free CNN generator that can be applied to attack images with arbitrary scales for various computer vision tasks. The principal investigation on transferability of the generated adversarial examples, robustness to defense frameworks, and visual quality assessment shows that the proposed PQG-based attack framework outperforms the other nine state-of-the-art adversarial attack approaches when attacking the neural networks trained on two standard evaluation datasets (i.e., ImageNet and CityScapes).
Submitted 19 November, 2022; v1 submitted 12 August, 2022;
originally announced August 2022.
-
Online Target Speaker Voice Activity Detection for Speaker Diarization
Authors:
Weiqing Wang,
Qingjian Lin,
Ming Li
Abstract:
This paper proposes an online target speaker voice activity detection system for speaker diarization tasks, which does not require a priori knowledge from the clustering-based diarization system to obtain the target speaker embeddings. First, we employ a ResNet-based front-end model to extract the frame-level speaker embeddings for each coming block of a signal. Next, we predict the detection state of each speaker based on these frame-level speaker embeddings and the previously estimated target speaker embedding. Then, the target speaker embeddings are updated by aggregating these frame-level speaker embeddings according to the predictions in the current block. We iteratively extract the results for each block and update the target speaker embedding until reaching the end of the signal. Experimental results show that the proposed method is better than the offline clustering-based diarization system on the AliMeeting dataset.
Submitted 12 July, 2022;
originally announced July 2022.
-
Low-Latency Online Speaker Diarization with Graph-Based Label Generation
Authors:
Yucong Zhang,
Qinjian Lin,
Weiqing Wang,
Lin Yang,
Xuyang Wang,
Junjie Wang,
Ming Li
Abstract:
This paper introduces an online speaker diarization system that can handle long-time audio with low latency. We enable Agglomerative Hierarchy Clustering (AHC) to work in an online fashion by introducing a label matching algorithm. This algorithm solves the inconsistency between output labels and hidden labels that are generated each turn. To ensure the low latency in the online setting, we introduce a variant of AHC, namely chkpt-AHC, to cluster the speakers. In addition, we propose a speaker embedding graph to exploit a graph-based re-clustering method, further improving the performance. In the experiment, we evaluate our systems on both DIHARD3 and VoxConverse datasets. The experimental results show that our proposed online systems have better performance than our baseline online system and have comparable performance to our offline systems. We find out that the framework combining the chkpt-AHC method and the label matching algorithm works well in the online setting. Moreover, the chkpt-AHC method greatly reduces the time cost, while the graph-based re-clustering method helps improve the performance.
Submitted 24 June, 2022; v1 submitted 26 November, 2021;
originally announced November 2021.
-
Towards Lightweight Applications: Asymmetric Enroll-Verify Structure for Speaker Verification
Authors:
Qingjian Lin,
Lin Yang,
Xuyang Wang,
Xiaoyi Qin,
Junjie Wang,
Ming Li
Abstract:
With the development of deep learning, automatic speaker verification has made considerable progress over the past few years. However, to design a lightweight and robust system with limited computational resources is still a challenging problem. Traditionally, a speaker verification system is symmetrical, indicating that the same embedding extraction model is applied for both enrollment and verification in inference. In this paper, we come up with an innovative asymmetric structure, which takes the large-scale ECAPA-TDNN model for enrollment and the small-scale ECAPA-TDNNLite model for verification. As a symmetrical system, our proposed ECAPA-TDNNLite model achieves an EER of 3.07% on the Voxceleb1 original test set with only 11.6M FLOPS. Moreover, the asymmetric structure further reduces the EER to 2.31%, without increasing any computational costs during verification.
Submitted 25 January, 2022; v1 submitted 8 October, 2021;
originally announced October 2021.
-
The DKU-DukeECE-Lenovo System for the Diarization Task of the 2021 VoxCeleb Speaker Recognition Challenge
Authors:
Weiqing Wang,
Danwei Cai,
Qingjian Lin,
Lin Yang,
Junjie Wang,
Jin Wang,
Ming Li
Abstract:
This report describes the submission of the DKU-DukeECE-Lenovo team to the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2021 track 4. Our system includes a voice activity detection (VAD) model, a speaker embedding model, two clustering-based speaker diarization systems with different similarity measurements, two different overlapped speech detection (OSD) models, and a target-speaker voice activity detection (TS-VAD) model. Our final submission, consisting of 5 independent systems, achieves a DER of 5.07% on the challenge test set.
Submitted 6 September, 2021; v1 submitted 5 September, 2021;
originally announced September 2021.
-
DL-AMP and DBTO: An Automatic Merge Planning and Trajectory Optimization and Its Application in Autonomous Driving
Authors:
Yuncheng Jiang,
Qi Lin,
Jiwei Zhang,
Jun Wang,
Danjian Qian,
Yuxi Cai
Abstract:
This paper presents an automatic merging algorithm for autonomous driving vehicles, which decouples the specific motion planning problem into a Dual-Layer Automatic Merge Planning (DL-AMP) and a Descent-Based Trajectory Optimization (DBTO). This work leads to great improvements in finding the best merge opportunity, lateral and longitudinal merge planning and control, trajectory postprocessing, and driving comfort.
Submitted 29 July, 2021; v1 submitted 6 July, 2021;
originally announced July 2021.
-
Sparsely Overlapped Speech Training in the Time Domain: Joint Learning of Target Speech Separation and Personal VAD Benefits
Authors:
Qingjian Lin,
Lin Yang,
Xuyang Wang,
Luyuan Xie,
Chen Jia,
Junjie Wang
Abstract:
Target speech separation is the process of filtering a certain speaker's voice out of speech mixtures according to the additional speaker identity information provided. Recent works have made considerable improvement by processing signals in the time domain directly. The majority of them take fully overlapped speech mixtures for training. However, since most real-life conversations occur randomly and are sparsely overlapped, we argue that training with data of different overlap ratios is beneficial. To do so, an unavoidable problem is that the popularly used SI-SNR loss has no definition for silent sources. This paper proposes the weighted SI-SNR loss, together with the joint learning of target speech separation and personal VAD. The weighted SI-SNR loss imposes a weight factor that is proportional to the target speaker's duration and returns zero when the target speaker is absent. Meanwhile, the personal VAD generates masks and sets non-target speech to silence. Experiments show that our proposed method outperforms the baseline by 1.73 dB in terms of SDR on fully overlapped speech, as well as by 4.17 dB and 0.9 dB on sparsely overlapped speech of clean and noisy conditions. Besides, with slight degradation in performance, our model could reduce the time costs in inference.
Submitted 26 September, 2021; v1 submitted 27 June, 2021;
originally announced June 2021.
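
As a rough illustration of the weighted SI-SNR loss described in the abstract above, the sketch below uses the standard SI-SNR definition and multiplies it by the fraction of samples in which the target speaker is active. The abstract states only that the weight is proportional to the target speaker's duration and that the loss is zero when the speaker is absent; the exact weighting, the `activity` mask input, and the function names here are assumptions.

```python
import numpy as np

def si_snr(estimate, target, eps=1e-8):
    """Standard scale-invariant SNR in dB between two 1-D signals."""
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to get the scaled target component.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    s_target = scale * target
    e_noise = estimate - s_target
    return 10.0 * np.log10((np.dot(s_target, s_target) + eps)
                           / (np.dot(e_noise, e_noise) + eps))

def weighted_si_snr_loss(estimate, target, activity):
    """Negative SI-SNR scaled by the target speaker's active fraction.

    `activity` is a hypothetical boolean mask, True where the target speaks.
    Returns 0.0 when the target speaker is absent, so silent sources are defined.
    """
    weight = float(np.mean(activity))
    if weight == 0.0:
        return 0.0
    return -weight * si_snr(estimate, target)

rng = np.random.default_rng(0)
target = rng.standard_normal(1000)
estimate = target + 0.1 * rng.standard_normal(1000)
print(weighted_si_snr_loss(estimate, target, np.ones(1000, dtype=bool)))
```

Returning zero for an absent speaker avoids the division by a zero-energy target that makes plain SI-SNR undefined for silent sources.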
-
Optimal Online Algorithms for Peak-Demand Reduction Maximization with Energy Storage
Authors:
Yanfang Mo,
Qiulin Lin,
Minghua Chen,
Si-Zhao Joe Qin
Abstract:
The high proportions of demand charges in electric bills motivate large-power customers to leverage energy storage for reducing the peak procurement from the outer grid. Given limited energy storage, we expect to maximize the peak-demand reduction in an online fashion, challenged by the highly uncertain demands and renewable injections, the non-cumulative nature of peak consumption, and the coupling of online decisions. In this paper, we propose an optimal online algorithm that achieves the best competitive ratio, following the idea of maintaining a constant ratio between the online and the optimal offline peak-reduction performance. We further show that the optimal competitive ratio can be computed by solving a linear number of linear-fractional programs. Moreover, we extend the algorithm to adaptively maintain the best competitive ratio given the revealed inputs and actions at each decision-making round. The adaptive algorithm retains the optimal worst-case guarantee and attains improved average-case performance. We evaluate our proposed algorithms using real-world traces and show that they obtain up to 81% peak reduction of the optimal offline benchmark. Additionally, the adaptive algorithm achieves at least 20% more peak reduction against baseline alternatives.
Submitted 10 May, 2021;
originally announced May 2021.
-
CKNet: A Convolutional Neural Network Based on Koopman Operator for Modeling Latent Dynamics from Pixels
Authors:
Yongqian Xiao,
Xin Xu,
QianLi Lin
Abstract:
With the development of end-to-end control based on deep learning, it is important to study new system modeling techniques to realize dynamics modeling with high-dimensional inputs. In this paper, a novel Koopman-based deep convolutional network, called CKNet, is proposed to identify latent dynamics from raw pixels. CKNet learns an encoder and decoder to play the role of the Koopman eigenfunctions and modes, respectively. The Koopman eigenvalues can be approximated by eigenvalues of the learned state transition matrix. The deterministic convolutional Koopman network (DCKNet) and the variational convolutional Koopman network (VCKNet) are each proposed to span a subspace for approximating the Koopman operator. Because CKNet is trained under the constraints of the Koopman theory, the identified latent dynamics are in a linear form and have good interpretability. Besides, the state transition and control matrices are trained as trainable tensors so that the identified dynamics are also time-invariant. We also design an auxiliary weight term for reducing multi-step linearity and prediction losses. Experiments were conducted on two offline-trained and four online-trained nonlinear forced dynamical systems with continuous action spaces in Gym and MuJoCo environments, respectively, and the results show that the identified dynamics are adequate for approximating the latent dynamics and generating clear images. For the offline-trained cases in particular, this work validates CKNet from a novel perspective: we visualize the evolutionary processes of the latent states and the Koopman eigenfunctions with DCKNet and VCKNet separately for each task based on the same episode, and the results demonstrate that the different approaches learn features with similar shapes.
Submitted 27 July, 2021; v1 submitted 19 February, 2021;
originally announced February 2021.
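
The statement above that Koopman eigenvalues can be approximated by the eigenvalues of the learned state transition matrix can be sketched as follows. The matrix here is a hand-picked, hypothetical 2x2 example standing in for a learned latent transition matrix (z_{t+1} = A z_t + B u_t); it is not taken from CKNet.

```python
import numpy as np

def koopman_spectrum(A):
    """Eigenvalues, moduli, and phases of a (hypothetical) latent transition matrix."""
    eigvals = np.linalg.eigvals(np.asarray(A, dtype=float))
    return eigvals, np.abs(eigvals), np.angle(eigvals)

# Assumed example: a slightly damped rotation in a 2-D latent space.
theta = 0.1
A_example = 0.95 * np.array([[np.cos(theta), -np.sin(theta)],
                             [np.sin(theta),  np.cos(theta)]])

eigvals, moduli, phases = koopman_spectrum(A_example)
# For a discrete-time linear model, a modulus below 1 means a decaying mode,
# and the phase is the rotation per step of the corresponding oscillation.
print(eigvals, moduli, phases)
```

Because the latent model is linear and time-invariant, this spectrum summarizes growth, decay, and oscillation of the identified dynamics in one step.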
-
The DKU-Duke-Lenovo System Description for the Third DIHARD Speech Diarization Challenge
Authors:
Weiqing Wang,
Qingjian Lin,
Danwei Cai,
Lin Yang,
Ming Li
Abstract:
In this paper, we present the submitted system for the third DIHARD Speech Diarization Challenge from the DKU-Duke-Lenovo team. Our system consists of several modules: voice activity detection (VAD), segmentation, speaker embedding extraction, attentive similarity scoring, agglomerative hierarchical clustering. In addition, the target speaker VAD (TSVAD) is used for the phone call data to further improve the performance. Our final submitted system achieves a DER of 15.43% for the core evaluation set and 13.39% for the full evaluation set on task 1, and we also get a DER of 21.63% for core evaluation set and 18.90% for full evaluation set on task 2.
Submitted 6 February, 2021;
originally announced February 2021.
-
Galaxy Image Translation with Semi-supervised Noise-reconstructed Generative Adversarial Networks
Authors:
Qiufan Lin,
Dominique Fouchez,
Jérôme Pasquet
Abstract:
Image-to-image translation with Deep Learning neural networks, particularly with Generative Adversarial Networks (GANs), is one of the most powerful methods for simulating astronomical images. However, current work is limited to utilizing paired images with supervised translation, and there has been little discussion on reconstructing the noise background that encodes instrumental and observational effects. These limitations might be harmful for subsequent scientific applications in astrophysics. Therefore, we aim to develop methods for using unpaired images and preserving noise characteristics in image translation. In this work, we propose a two-way image translation model using GANs that exploits both paired and unpaired images in a semi-supervised manner, and introduce a noise emulating module that is able to learn and reconstruct noise characterized by high-frequency features. By experimenting on multi-band galaxy images from the Sloan Digital Sky Survey (SDSS) and the Canada France Hawaii Telescope Legacy Survey (CFHT), we show that our method recovers global and local properties effectively and outperforms benchmark image translation models. To the best of our knowledge, this work is the first attempt to apply semi-supervised methods and noise reconstruction techniques in astrophysical studies.
Submitted 18 January, 2021;
originally announced January 2021.
-
Boosting High-Level Vision with Joint Compression Artifacts Reduction and Super-Resolution
Authors:
Xiaoyu Xiang,
Qian Lin,
Jan P. Allebach
Abstract:
Due to the limits of bandwidth and storage space, digital images are usually down-scaled and compressed when transmitted over networks, resulting in loss of details and jarring artifacts that can lower the performance of high-level visual tasks. In this paper, we aim to generate an artifact-free high-resolution image from a low-resolution one compressed with an arbitrary quality factor by exploring joint compression artifacts reduction (CAR) and super-resolution (SR) tasks. First, we propose a context-aware joint CAR and SR neural network (CAJNN) that integrates both local and non-local features to solve CAR and SR in one-stage. Finally, a deep reconstruction network is adopted to predict high quality and high-resolution images. Evaluation on CAR and SR benchmark datasets shows that our CAJNN model outperforms previous methods and also takes 26.2% shorter runtime. Based on this model, we explore addressing two critical challenges in high-level computer vision: optical character recognition of low-resolution texts, and extremely tiny face detection. We demonstrate that CAJNN can serve as an effective image preprocessing method and improve the accuracy for real-scene text recognition (from 85.30% to 85.75%) and the average precision for tiny face detection (from 0.317 to 0.611).
Submitted 17 December, 2020; v1 submitted 18 October, 2020;
originally announced October 2020.
-
Scalable, Proposal-free Instance Segmentation Network for 3D Pixel Clustering and Particle Trajectory Reconstruction in Liquid Argon Time Projection Chambers
Authors:
Dae Heun Koh,
Pierre Côte de Soux,
Laura Dominé,
François Drielsma,
Ran Itay,
Qing Lin,
Kazuhiro Terao,
Ka Vang Tsang,
Tracy Usher
Abstract:
Liquid Argon Time Projection Chambers (LArTPCs) are high resolution particle imaging detectors, employed by accelerator-based neutrino oscillation experiments for high precision physics measurements. While images of particle trajectories are intuitive to analyze for physicists, the development of a high quality, automated data reconstruction chain remains challenging. One of the most critical reconstruction steps is particle clustering: the task of grouping 3D image pixels into different particle instances that share the same particle type. In this paper, we propose the first scalable deep learning algorithm for particle clustering in LArTPC data using sparse convolutional neural networks (SCNN). Building on previous works on SCNNs and proposal free instance segmentation, we build an end-to-end trainable instance segmentation network that learns an embedding of the image pixels to perform point cloud clustering in a transformed space. We benchmark the performance of our algorithm on PILArNet, a public 3D particle imaging dataset, with respect to common clustering evaluation metrics. 3D pixels were successfully clustered into individual particle trajectories with 90% of them having an adjusted Rand index score greater than 92% with a mean pixel clustering efficiency and purity above 96%. This work contributes to the development of an end-to-end optimizable full data reconstruction chain for LArTPCs, in particular pixel-based 3D imaging detectors including the near detector of the Deep Underground Neutrino Experiment. Our algorithm is made available in the open access repository, and we share our Singularity software container, which can be used to reproduce our work on the dataset.
Submitted 6 July, 2020;
originally announced July 2020.
-
COVID-Net S: Towards computer-aided severity assessment via training and validation of deep neural networks for geographic extent and opacity extent scoring of chest X-rays for SARS-CoV-2 lung disease severity
Authors:
Alexander Wong,
Zhong Qiu Lin,
Linda Wang,
Audrey G. Chung,
Beiyi Shen,
Almas Abbasi,
Mahsa Hoshmand-Kochi,
Timothy Q. Duong
Abstract:
Background: A critical step in effective care and treatment planning for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the cause of the COVID-19 pandemic, is the assessment of the severity of disease progression. Chest x-rays (CXRs) are often used to assess SARS-CoV-2 severity, with two important assessment metrics being extent of lung involvement and degree of opacity. In this proof-of-concept study, we assess the feasibility of computer-aided scoring of CXRs of SARS-CoV-2 lung disease severity using a deep learning system.
Materials and Methods: Data consisted of 396 CXRs from SARS-CoV-2 positive patient cases. Geographic extent and opacity extent were scored by two board-certified expert chest radiologists (with 20+ years of experience) and a 2nd-year radiology resident. The deep neural networks used in this study, which we name COVID-Net S, are based on a COVID-Net network architecture. 100 versions of the network were independently learned (50 to perform geographic extent scoring and 50 to perform opacity extent scoring) using random subsets of CXRs from the study, and we evaluated the networks using stratified Monte Carlo cross-validation experiments.
Findings: The COVID-Net S deep neural networks yielded R$^2$ of 0.664 $\pm$ 0.032 and 0.635 $\pm$ 0.044 between predicted scores and radiologist scores for geographic extent and opacity extent, respectively, in stratified Monte Carlo cross-validation experiments. The best performing networks achieved R$^2$ of 0.739 and 0.741 between predicted scores and radiologist scores for geographic extent and opacity extent, respectively.
Interpretation: The results are promising and suggest that the use of deep neural networks on CXRs could be an effective tool for computer-aided assessment of SARS-CoV-2 lung disease severity, although additional studies are needed before adoption for routine clinical use.
Submitted 16 April, 2021; v1 submitted 26 May, 2020;
originally announced May 2020.
-
Atss-Net: Target Speaker Separation via Attention-based Neural Network
Authors:
Tingle Li,
Qingjian Lin,
Yuanyuan Bao,
Ming Li
Abstract:
Recently, Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) based models have been introduced to deep learning-based target speaker separation. In this paper, we propose an attention-based neural network (Atss-Net) in the spectrogram domain for the task. Compared with the CNN-LSTM architecture, it allows the network to compute the correlation between each pair of features in parallel and to use shallower layers to extract more features. Experimental results show that our Atss-Net yields better performance than VoiceFilter, although it contains only half of the parameters. Furthermore, our proposed model also demonstrates promising performance in speech enhancement.
Submitted 18 May, 2020;
originally announced May 2020.
-
Lithium niobate photonic-crystal electro-optic modulator
Authors:
Mingxiao Li,
Jingwei Ling,
Yang He,
Usman A. Javid,
Shixin Xue,
Qiang Lin
Abstract:
Modern advanced photonic integrated circuits require dense integration of high-speed electro-optic functional elements on a compact chip that consumes only moderate power. Energy efficiency, operation speed, and device dimension are thus crucial metrics underlying almost all current developments of photonic signal processing units. Recently, thin-film lithium niobate (LN) has emerged as a promising platform for photonic integrated circuits. Here we make an important step towards miniaturizing functional components on this platform, reporting probably the smallest high-speed LN electro-optic modulators, based upon photonic crystal nanobeam resonators. The devices exhibit a significant tuning efficiency of up to 1.98 GHz/V and a broad modulation bandwidth of 17.5 GHz, with a tiny electro-optic modal volume of only 0.58 $\mu\mathrm{m}^3$. The modulators enable efficient electro-optic driving of high-Q photonic cavity modes in both adiabatic and non-adiabatic regimes, and allow us to achieve electro-optic switching at 11 Gb/s with a bit-switching energy as low as 22 fJ. The demonstration of energy-efficient and high-speed electro-optic modulation at the wavelength scale lays a crucial foundation for realizing large-scale LN photonic integrated circuits that are of immense importance for broad applications in data communication, microwave photonics, and quantum photonics.
Submitted 9 June, 2020; v1 submitted 17 February, 2020;
originally announced March 2020.
-
Safe Planning for Self-Driving Via Adaptive Constrained ILQR
Authors:
Yanjun Pan,
Qin Lin,
Het Shah,
John M. Dolan
Abstract:
Constrained Iterative Linear Quadratic Regulator (CILQR), a variant of ILQR, has been recently proposed for motion planning problems of autonomous vehicles to deal with constraints such as obstacle avoidance and reference tracking. However, the previous work considers either deterministic trajectories or persistent prediction for target dynamical obstacles. The other drawback is lack of generality - it requires manual weight tuning for different scenarios. In this paper, two significant improvements are achieved. Firstly, a two-stage uncertainty-aware prediction is proposed. The short-term prediction with safety guarantee based on reachability analysis is responsible for dealing with extreme maneuvers conducted by target vehicles. The long-term prediction leveraging an adaptive least square filter preserves the long-term optimality of the planned trajectory since using reachability only for long-term prediction is too pessimistic and makes the planner over-conservative. Secondly, to allow a wider coverage over different scenarios and to avoid tedious parameter tuning case by case, this paper designs a scenario-based analytical function taking the states from the ego vehicle and the target vehicle as input, and carrying weights of a cost function as output. It allows the ego vehicle to execute multiple behaviors (such as lane-keeping and overtaking) under a single planner. We demonstrate safety, effectiveness, and real-time performance of the proposed planner in simulations.
Submitted 5 March, 2020;
originally announced March 2020.
-
DIHARD II is Still Hard: Experimental Results and Discussions from the DKU-LENOVO Team
Authors:
Qingjian Lin,
Weicheng Cai,
Lin Yang,
Junjie Wang,
Jun Zhang,
Ming Li
Abstract:
In this paper, we present the submitted system for the second DIHARD Speech Diarization Challenge from the DKU-LENOVO team. Our diarization system includes multiple modules, namely voice activity detection (VAD), segmentation, speaker embedding extraction, similarity scoring, clustering, resegmentation and overlap detection. For each module, we explore different techniques to enhance performance. Our final submission employs the ResNet-LSTM based VAD, the Deep ResNet based speaker embedding, the LSTM based similarity scoring and spectral clustering. Variational Bayes (VB) diarization is applied in the resegmentation stage and overlap detection also brings slight improvement. Our proposed system achieves 18.84% DER in Track1 and 27.90% DER in Track2. Although our systems have reduced the DERs by 27.5% and 31.7% relatively against the official baselines, we believe that the diarization task is still very difficult.
Submitted 4 May, 2020; v1 submitted 23 February, 2020;
originally announced February 2020.
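The DIHARD II abstract above reports absolute DERs together with relative reductions against the official baselines. As a small worked check, the sketch below back-computes the baseline DERs those two figures imply. The function name is hypothetical, and the results are implied values derived from the rounded numbers in the abstract, not figures quoted from the challenge organizers.

```python
def implied_baseline_der(system_der: float, relative_reduction: float) -> float:
    """Return the baseline DER (in percent) implied by a system DER and a
    relative reduction, both given in percent.

    A relative reduction r means: system = baseline * (1 - r / 100).
    """
    if not 0.0 <= relative_reduction < 100.0:
        raise ValueError("relative_reduction must be in [0, 100)")
    return system_der / (1.0 - relative_reduction / 100.0)


# Figures taken from the abstract: (system DER %, relative reduction %).
track1_baseline = implied_baseline_der(18.84, 27.5)
track2_baseline = implied_baseline_der(27.90, 31.7)

print(f"Implied Track1 baseline DER: {track1_baseline:.2f}%")
print(f"Implied Track2 baseline DER: {track2_baseline:.2f}%")
```

Because the inputs are rounded, the implied baselines are only accurate to roughly the second decimal place.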
-
PuckNet: Estimating hockey puck location from broadcast video
Authors:
Kanav Vats,
William McNally,
Chris Dulhanty,
Zhong Qiu Lin,
David A. Clausi,
John Zelek
Abstract:
Puck location in ice hockey is essential for hockey analysts for determining the location of play and analyzing game events. However, because of the difficulty involved in obtaining accurate annotations due to the extremely low visibility and commonly occurring occlusions of the puck, the problem is very challenging. The problem becomes even more challenging in broadcast videos with changing camera angles. We introduce a novel methodology for determining puck location from approximate puck location annotations in broadcast video. Our method uniquely leverages the existing puck location information that is publicly available in existing hockey event data and uses the corresponding one-second broadcast video clips as input to the network. The rationale behind using video as input instead of static images is that with video, the temporal information can be utilized to handle puck occlusions. The network outputs a heatmap representing the probability of the puck location using a 3D CNN based architecture. The network is able to regress the puck location from broadcast hockey video clips with varying camera angles. Experimental results demonstrate the capability of the method, achieving 47.07% AUC on the test dataset. The network is also able to estimate the puck location in defensive/offensive zones with an accuracy of greater than 80%.
Submitted 17 March, 2021; v1 submitted 10 December, 2019;
originally announced December 2019.
-
Orbital-angular-momentum free-space optical communication via the azimuthal phase-shift
Authors:
R. Lopez-Rios,
U. A. Javid,
Q. Lin
Abstract:
Free-space optical communication using the orbital angular momentum (OAM) of light has garnered significant interest lately due to the potentially vast bandwidth intrinsic to the infinite Hilbert space of OAM modes. Unfortunately, OAM light beams suffer from serious distortions due to atmospheric turbulence (AT) that has become a dominant factor limiting the advance of OAM-based free-space communication. Here we propose and demonstrate a free-space communication scheme---using OAM beams and their azimuthal-mode phase-shift for keying (OAM-APSK)---which is resilient to AT-induced distortions. Combined with a digital holographic mode sorting (DHMS) technique, the proposed approach is able to achieve high signal-to-noise ratios and to maintain low modal crosstalk, even for extremely strong turbulence conditions, with magnitudes significantly beyond existing AT mitigation methods. The demonstrated OAM-APSK and DHMS schemes may now open up a great avenue for OAM-based free-space optical communication that could elegantly resolve the long-standing challenge imposed by atmospheric turbulence.
Submitted 1 October, 2019;
originally announced October 2019.
-
LSTM based Similarity Measurement with Spectral Clustering for Speaker Diarization
Authors:
Qingjian Lin,
Ruiqing Yin,
Ming Li,
Hervé Bredin,
Claude Barras
Abstract:
More and more neural network approaches have achieved considerable improvement upon submodules of speaker diarization system, including speaker change detection and segment-wise speaker embedding extraction. Still, in the clustering stage, traditional algorithms like probabilistic linear discriminant analysis (PLDA) are widely used for scoring the similarity between two speech segments. In this paper, we propose a supervised method to measure the similarity matrix between all segments of an audio recording with sequential bidirectional long short-term memory networks (Bi-LSTM). Spectral clustering is applied on top of the similarity matrix to further improve the performance. Experimental results show that our system significantly outperforms the state-of-the-art methods and achieves a diarization error rate of 6.63% on the NIST SRE 2000 CALLHOME database.
Submitted 23 July, 2019;
originally announced July 2019.
-
A Probabilistic Approach for Demand-Aware Ride-Sharing Optimization
Authors:
Qiulin Lin,
Wenjie Xu,
Minghua Chen,
Xiaojun Lin
Abstract:
Ride-sharing is a modern urban-mobility paradigm with tremendous potential in reducing congestion and pollution. Demand-aware design is a promising avenue for addressing a critical challenge in ride-sharing systems, namely joint optimization of request-vehicle assignment and routing for a fleet of vehicles. In this paper, we develop a probabilistic demand-aware framework to tackle the challenge. We focus on maximizing the expected number of passenger pickups, given the probability distributions of future demands. The key idea of our approach is to assign requests to vehicles in a probabilistic manner. It differentiates our work from existing ones and allows us to explore a richer design space to tackle the request-vehicle assignment puzzle with a performance guarantee while still keeping the final solution practically implementable. The optimization problem is non-convex, combinatorial, and NP-hard in nature. As a key contribution, we explore the problem structure and propose an elegant approximation of the objective function to develop a dual-subgradient heuristic. We characterize a condition under which the heuristic generates a $\left(1-1/e\right)$ approximation solution. Our solution is simple and scalable, amenable to practical implementation. Results of numerical experiments based on real-world traces in Manhattan show that, as compared to a conventional demand-oblivious scheme, our demand-aware solution improves the passenger pickups by up to 46%. The results also show that joint optimization at the fleet level leads to 19% more pickups than that by separate optimizations at individual vehicles.
Submitted 6 June, 2019; v1 submitted 30 April, 2019;
originally announced May 2019.
-
H2B: Heartbeat-based Secret Key Generation Using Piezo Vibration Sensors
Authors:
Qi Lin,
Weitao Xu,
Jun Liu,
Abdelwahed Khamis,
Wen Hu,
Mahbub Hassan,
Aruna Seneviratne
Abstract:
We present Heartbeats-2-Bits (H2B), which is a system for securely pairing wearable devices by generating a shared secret key from the skin vibrations caused by heartbeat. This work is motivated by potential power saving opportunity arising from the fact that heartbeat intervals can be detected energy-efficiently using inexpensive and power-efficient piezo sensors, which obviates the need to employ complex heartbeat monitors such as Electrocardiogram or Photoplethysmogram. Indeed, our experiments show that piezo sensors can measure heartbeat intervals on many different body locations including chest, wrist, waist, neck and ankle. Unfortunately, we also discover that the heartbeat interval signal captured by piezo vibration sensors has low Signal-to-Noise Ratio (SNR) because they are not designed as precision heartbeat monitors, which becomes the key challenge for H2B. To overcome this problem, we first apply a quantile function-based quantization method to fully extract the useful entropy from the noisy piezo measurements. We then propose a novel Compressive Sensing-based reconciliation method to correct the high bit mismatch rates between the two independently generated keys caused by low SNR. We prototype H2B using off-the-shelf piezo sensors and evaluate its performance on a dataset collected from different body positions of 23 participants. Our results show that H2B has an overwhelming pairing success rate of 95.6%. We also analyze and demonstrate H2B's robustness against three types of attacks. Finally, our power measurements show that H2B is very power-efficient.
Submitted 19 February, 2019;
originally announced April 2019.
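The H2B abstract above mentions a quantile function-based quantization step that turns noisy heartbeat-interval measurements into key bits. The abstract does not specify the procedure, so the following is only a minimal sketch of the general idea under stated assumptions: bin edges are taken from the empirical quantiles of the samples so that each bin is roughly equally likely, and bin indices are Gray-coded so that neighbouring bins differ in one bit. The function name, the Gray coding, the choice of two bits per sample, and the sample values are all illustrative assumptions, not the authors' method or data.

```python
from statistics import quantiles


def quantile_quantize(samples: list[float], bits_per_sample: int = 2) -> str:
    """Map each sample to bits using equiprobable bins from empirical quantiles.

    Assumption: Gray coding of bin indices, so a sample that lands in a
    neighbouring bin on the other device causes only a one-bit mismatch.
    """
    levels = 2 ** bits_per_sample
    # quantiles() returns (levels - 1) cut points that split the data
    # into `levels` groups of roughly equal probability.
    cuts = quantiles(samples, n=levels)
    bits = []
    for x in samples:
        bin_index = sum(x > c for c in cuts)   # value in 0 .. levels - 1
        gray = bin_index ^ (bin_index >> 1)    # binary-reflected Gray code
        bits.append(format(gray, f"0{bits_per_sample}b"))
    return "".join(bits)


# Hypothetical inter-beat intervals in seconds (made-up values).
intervals = [0.80, 0.82, 0.79, 0.85, 0.90, 0.78, 0.88, 0.83]
key_bits = quantile_quantize(intervals)
print(key_bits)
```

Using quantile-derived bin edges rather than fixed thresholds keeps the output bits close to uniformly distributed even when the interval distribution is skewed, which is the usual motivation for this kind of quantizer.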
-
Competitive Online Optimization under Inventory Constraints
Authors:
Qiulin Lin,
Hanling Yi,
John Pang,
Minghua Chen,
Adam Wierman,
Michael Honig,
Yuanzhang Xiao
Abstract:
This paper studies online optimization under inventory (budget) constraints. While online optimization is a well-studied topic, versions with inventory constraints have proven difficult. We consider a formulation of inventory-constrained optimization that is a generalization of the classic one-way trading problem and has a wide range of applications. We present a new algorithmic framework, \textsf{CR-Pursuit}, and prove that it achieves the minimal competitive ratio among all deterministic algorithms (up to a problem-dependent constant factor) for inventory-constrained online optimization. Our algorithm and its analysis not only simplify and unify the state-of-the-art results for the standard one-way trading problem, but they also establish novel bounds for generalizations including concave revenue functions. For example, for one-way trading with price elasticity, the \textsf{CR-Pursuit} algorithm achieves a competitive ratio that is within a small additive constant (i.e., 1/3) to the lower bound of $\ln θ+1$, where $θ$ is the ratio between the maximum and minimum base prices.
Submitted 25 January, 2019;
originally announced January 2019.
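The abstract above states a lower bound of $\ln θ+1$ on the competitive ratio for one-way trading with price elasticity, with CR-Pursuit within an additive constant of 1/3 of it. To give a feel for how slowly this grows with the price ratio, the sketch below evaluates both expressions. The function names are hypothetical, and the second function simply adds the stated 1/3 gap to the lower bound; it is not the exact ratio derived in the paper.

```python
import math


def one_way_trading_lower_bound(theta: float) -> float:
    """Lower bound ln(theta) + 1, where theta = max base price / min base price."""
    if theta < 1.0:
        raise ValueError("theta is a max/min price ratio and must be >= 1")
    return math.log(theta) + 1.0


def ratio_ceiling_with_gap(theta: float, additive_gap: float = 1.0 / 3.0) -> float:
    """Lower bound plus the additive gap quoted in the abstract (assumed 1/3)."""
    return one_way_trading_lower_bound(theta) + additive_gap


for theta in (1.0, 2.0, 10.0, 100.0):
    low = one_way_trading_lower_bound(theta)
    high = ratio_ceiling_with_gap(theta)
    print(f"theta={theta:>6}: lower bound={low:.3f}, lower bound + 1/3={high:.3f}")
```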
-
Balancing Cost and Dissatisfaction in Online EV Charging under Real-time Pricing
Authors:
Hanling Yi,
Qiulin Lin,
Minghua Chen
Abstract:
We consider an increasingly popular demand-response scenario where a user schedules the flexible electric vehicle (EV) charging load in response to real-time electricity prices. The objective is to minimize the total charging cost with user dissatisfaction taken into account. We focus on the online setting where neither accurate prediction nor distribution of future real-time prices is available to the user when making irrevocable charging decision in each time slot. The emphasis on considering user dissatisfaction and achieving optimal competitive ratio differentiates our work from existing ones and makes our study uniquely challenging. Our key contribution is two simple online algorithms with the best possible competitive ratio among all deterministic algorithms. The optimal competitive ratio is upper-bounded by $\min\left\{ \sqrt{\alpha/p_{\min}},\, p_{\max}/p_{\min}\right\}$ and the bound is asymptotically tight with respect to $\alpha$, where $p_{\max}$ and $p_{\min}$ are the upper and lower bounds of real-time prices and $\alpha \geq p_{\min}$ captures the consideration of user dissatisfaction. The bound under small and large values of $\alpha$ suggests the fundamental difference of the problems with considering user dissatisfaction ($\alpha$ takes small values) and without ($\alpha$ takes large values). Along the way we also develop a general technique for designing online algorithms from an idea hinted in One-Way-Trading Problem, which can be of independent interest. Simulation results based on real-world traces corroborate our theoretical findings and show that our algorithms achieve substantial performance gain as compared to conceivable alternatives. The results also suggest that increasing EV charging rate limit decreases overall cost almost linearly.
Submitted 15 January, 2019;
originally announced January 2019.
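The EV-charging abstract above gives an upper bound on the optimal competitive ratio as the minimum of a dissatisfaction-driven term and a price-range term. The sketch below evaluates that bound to show which term is active; the function name and the example prices are hypothetical, and only the formula itself comes from the abstract.

```python
import math


def competitive_ratio_upper_bound(alpha: float, p_min: float, p_max: float) -> float:
    """Evaluate min{ sqrt(alpha / p_min), p_max / p_min } from the abstract.

    alpha  -- dissatisfaction parameter, required to satisfy alpha >= p_min
    p_min  -- lower bound on real-time prices (must be positive)
    p_max  -- upper bound on real-time prices (must be >= p_min)
    """
    if p_min <= 0 or p_max < p_min:
        raise ValueError("need 0 < p_min <= p_max")
    if alpha < p_min:
        raise ValueError("the abstract assumes alpha >= p_min")
    return min(math.sqrt(alpha / p_min), p_max / p_min)


# Made-up prices: p_min = 1, p_max = 4, so the price-range term is 4.
small_alpha_bound = competitive_ratio_upper_bound(alpha=4.0, p_min=1.0, p_max=4.0)
large_alpha_bound = competitive_ratio_upper_bound(alpha=100.0, p_min=1.0, p_max=4.0)
print(small_alpha_bound, large_alpha_bound)
```

The two terms coincide at $\alpha = p_{\max}^2/p_{\min}$; below that value the square-root term is the smaller one, and above it the bound no longer depends on $\alpha$, which matches the abstract's contrast between small and large $\alpha$.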
-
EdgeSpeechNets: Highly Efficient Deep Neural Networks for Speech Recognition on the Edge
Authors:
Zhong Qiu Lin,
Audrey G. Chung,
Alexander Wong
Abstract:
Despite showing state-of-the-art performance, deep learning for speech recognition remains challenging to deploy in on-device edge scenarios such as mobile and other consumer devices. Recently, there have been greater efforts in the design of small, low-footprint deep neural networks (DNNs) that are more appropriate for edge devices, with much of the focus on design principles for hand-crafting efficient network architectures. In this study, we explore a human-machine collaborative design strategy for building low-footprint DNN architectures for speech recognition through a marriage of human-driven principled network design prototyping and machine-driven design exploration. The efficacy of this design strategy is demonstrated through the design of a family of highly-efficient DNNs (nicknamed EdgeSpeechNets) for limited-vocabulary speech recognition. Experimental results using the Google Speech Commands dataset for limited-vocabulary speech recognition showed that EdgeSpeechNets have higher accuracies than state-of-the-art DNNs (with the best EdgeSpeechNet achieving ~97% accuracy), while achieving significantly smaller network sizes (as much as 7.8x smaller) and lower computational cost (as much as 36x fewer multiply-add operations, 10x lower prediction latency, and 16x smaller memory footprint on a Motorola Moto E phone), making them very well-suited for on-device edge voice interface applications.
Submitted 13 November, 2018; v1 submitted 17 October, 2018;
originally announced October 2018.