Search | arXiv e-print repository

Effective Management of Airport Security Queues with Passenger Reassignment

Authors: Shangqing Cao, Aparimit Kasliwal, Masoud Reihanifar, Francesc Robuste, Mark Hansen

Abstract: Airport security queues often suffer from inefficiencies that result in long wait times and decreased throughput, especially at peak departure time, affecting both passengers and airlines. This work addresses the problem of reassigning passengers to specific time slots for crossing security, aiming to mitigate these inefficiencies. We frame this problem as a Minimum Cost Network Flow (MCNF) problem, enabling us to solve it exactly in polynomial time due to its linear programming structure. Our approach redistributes passenger demand across different time intervals. By optimizing the reassignment of passengers to sigma-minute time slots, we achieve significant improvements in throughput and reductions in waiting time. Preliminary results demonstrate the effectiveness of our method in enhancing operational efficiency and passenger satisfaction. The MCNF formulation offers a scalable and adaptable solution, providing long-term benefits for airport security management.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.00947 [pdf, other]

Fleet Size and Spill for UAM Operation under Uncertain Demand

Authors: Shangqing Cao, Xuan Jiang, Emin Burak Onat, Bo Zou, Mark Hansen, Raja Sengupta, Anjan Chakrabarty

Abstract: Variation and imbalance in demand pose significant challenges to Urban Air Mobility (UAM) operations, affecting strategic decisions such as fleet sizing. To study the implications of demand variation on UAM fleet operations, we propose a stochastic passenger arrival time generation model that uses real-world data to infer demand distributions, and two integer programs that compute the zero-spill fleet size and the spill-minimizing flight schedules and charging policies, respectively. Our numerical experiment on a two-vertiport network shows that spill is relatively inelastic to fleet size and that the driving factor behind spill is the imbalance in demand.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2405.11118 [pdf, other]

A Simulation-Optimization Framework for Developing Wind-Resilient AAM Networks

Authors: Emin Burak Onat, Shangqing Cao, Raiyan Rizwan, Xuan Jiang, Mark Hansen, Raja Sengupta, Anjan Chakrabarty

Abstract: Environmental factors pose a significant challenge to the operational efficiency and safety of advanced air mobility (AAM) networks. This paper presents a simulation-optimization framework that dynamically integrates wind variability into AAM operations. We employ a nonlinear charging model within a multi-vertiport environment to optimize fleet size and scheduling. Our framework assesses the impact of wind on operational parameters, providing strategies to enhance the resilience of AAM ecosystems. The results demonstrate that wind conditions exert significant influence on fleet size even for short-distance flights, and that their impact on fleet size and energy requirements becomes more pronounced over longer distances. Efficient management of fleet size and charging policies, particularly for long-distance networks, is needed to accommodate the variability of wind conditions effectively.

Submitted 17 May, 2024; originally announced May 2024.

Comments: Accepted to ICRAT 2024

arXiv:2403.14135 [pdf, other]

Powerful Lossy Compression for Noisy Images

Authors: Shilv Cai, Xiaoguo Liang, Shuning Cao, Luxin Yan, Sheng Zhong, Liqun Chen, Xu Zou

Abstract: Image compression and denoising represent fundamental challenges in image processing with many real-world applications. To address practical demands, current solutions can be categorized into two main strategies: 1) sequential method; and 2) joint method. However, sequential methods have the disadvantage of error accumulation as there is information loss between multiple individual models. Recently, the academic community began to make some attempts to tackle this problem through end-to-end joint methods. Most of them ignore that different regions of noisy images have different characteristics. To solve these problems, in this paper, our proposed signal-to-noise ratio (SNR) aware joint solution exploits local and non-local features for image compression and denoising simultaneously. We design an end-to-end trainable network, which includes the main encoder branch, the guidance branch, and the signal-to-noise ratio (SNR) aware branch. We conducted extensive experiments on both synthetic and real-world datasets, demonstrating that our joint solution outperforms existing state-of-the-art methods.

Submitted 26 March, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

Comments: Accepted by ICME 2024

arXiv:2402.18070 [pdf, other]

A Hierarchical Dataflow-Driven Heterogeneous Architecture for Wireless Baseband Processing

Authors: Limin Jiang, Yi Shi, Haiqin Hu, Qingyu Deng, Siyi Xu, Yintao Liu, Feng Yuan, Si Wang, Yihao Shen, Fangfang Ye, Shan Cao, Zhiyuan Jiang

Abstract: Wireless baseband processing (WBP) is a key element of wireless communications, with a series of signal processing modules to improve data throughput and counter channel fading. Conventional hardware solutions, such as digital signal processors (DSPs) and, more recently, graphics processing units (GPUs), provide various degrees of parallelism, yet they both fail to take into account the cyclical and consecutive character of WBP. Furthermore, the large amount of data in WBPs cannot be processed quickly in symmetric multiprocessors (SMPs) due to the unpredictability of memory latency. To address this issue, we propose a hierarchical dataflow-driven architecture to accelerate WBP. A pack-and-ship approach is presented under a non-uniform memory access (NUMA) architecture to allow the subordinate tiles to operate in a bundled access and execute manner. We also propose a multi-level dataflow model and the related scheduling scheme to manage and allocate the heterogeneous hardware resources. Experimental results demonstrate that our prototype achieves $2\times$ and $2.3\times$ speedup in terms of normalized throughput and single-tile clock cycles compared with GPU and DSP counterparts in several critical WBP benchmarks. Additionally, a link-level throughput of $288$ Mbps can be achieved with a $45$-core configuration.

Submitted 28 February, 2024; originally announced February 2024.

Comments: 7 pages, 7 figures, conference

arXiv:2312.06969 [pdf, ps, other]

Channel Estimation for Movable Antenna Communication Systems: A Framework Based on Compressed Sensing

Authors: Zhenyu Xiao, Songqi Cao, Lipeng Zhu, Yanming Liu, Xiang-Gen Xia, Rui Zhang

Abstract: Movable antenna (MA) is a new technology with great potential to improve communication performance by enabling local movement of antennas for pursuing better channel conditions. In particular, the acquisition of complete channel state information (CSI) between the transmitter (Tx) and receiver (Rx) regions is an essential problem for MA systems to reap performance gains. In this paper, we propose a general channel estimation framework for MA systems by exploiting the multi-path field response channel structure. Specifically, the angles of departure (AoDs), angles of arrival (AoAs), and complex coefficients of the multi-path components (MPCs) are jointly estimated by employing the compressed sensing method, based on multiple channel measurements at designated positions of the Tx-MA and Rx-MA. Under this framework, the Tx-MA and Rx-MA measurement positions fundamentally determine the measurement matrix for compressed sensing, of which the mutual coherence is analyzed from the perspective of Fourier transform. Moreover, two criteria for MA measurement positions are provided to guarantee the successful recovery of MPCs. Then, we propose several MA measurement position setups and compare their performance. Finally, comprehensive simulation results show that the proposed framework is able to estimate the complete CSI between the Tx and Rx regions with a high accuracy.

Submitted 11 December, 2023; originally announced December 2023.

arXiv:2310.09078 [pdf, other]

DNFS-VNE: Deep Neuro Fuzzy System Driven Virtual Network Embedding

Authors: Ailing Xiao, Ning Chen, Sheng Wu, Peiying Zhang, Suzhi Cao, Chunxiao Jiang

Abstract: By decoupling substrate resources, network virtualization (NV) is a promising solution for meeting diverse demands and ensuring differentiated quality of service (QoS). In particular, virtual network embedding (VNE) is a critical enabling technology that enhances the flexibility and scalability of network deployment by addressing the coupling of Internet processes and services. However, in the existing works, the black-box nature of deep neural networks (DNNs) limits the analysis, development, and improvement of systems. In recent times, interpretable deep learning (DL) represented by deep neuro-fuzzy systems (DNFS) combined with fuzzy inference has shown promising interpretability to further exploit the hidden value in the data. Motivated by this, we propose a DNFS-based VNE algorithm that aims to provide an interpretable NV scheme. Specifically, data-driven convolutional neural networks (CNNs) are used as fuzzy implication operators to compute the embedding probabilities of candidate substrate nodes through entailment operations. And, the identified fuzzy rule patterns are cached into the weights by forward computation and gradient back-propagation (BP). Moreover, the fuzzy rule base is constructed based on Mamdani-type linguistic rules using linguistic labels. In addition, the DNFS-driven five-block structure-based policy network serves as the agent for deep reinforcement learning (DRL), which optimizes VNE decision-making through interaction with the environment. Finally, the effectiveness of evaluation indicators and fuzzy rules is verified by experiments.

Submitted 7 December, 2023; v1 submitted 13 October, 2023; originally announced October 2023.

arXiv:2309.02959 [pdf, other]

A Non-Invasive Interpretable NAFLD Diagnostic Method Combining TCM Tongue Features

Authors: Shan Cao, Qunsheng Ruan, Qingfeng Wu, Weiqiang Lin

Abstract: Non-alcoholic fatty liver disease (NAFLD) is a clinicopathological syndrome characterized by hepatic steatosis resulting from the exclusion of alcohol and other identifiable liver-damaging factors. It has emerged as a leading cause of chronic liver disease worldwide. Currently, the conventional methods for NAFLD detection are expensive and not suitable for users to perform daily diagnostics. To address this issue, this study proposes a non-invasive and interpretable NAFLD diagnostic method whose only required user-provided indicators are Gender, Age, Height, Weight, Waist Circumference, Hip Circumference, and tongue image. This method involves merging patients' physiological indicators with tongue features, which are then input into a fusion network named SelectorNet. SelectorNet combines attention mechanisms with feature selection mechanisms, enabling it to autonomously learn the ability to select important features. The experimental results show that the proposed method achieves an accuracy of 77.22% using only non-invasive data, and it also provides compelling interpretability matrices. This study contributes to the early diagnosis of NAFLD and the intelligent advancement of TCM tongue diagnosis. The project mentioned in this paper is currently publicly available.

Submitted 5 December, 2023; v1 submitted 6 September, 2023; originally announced September 2023.

arXiv:2309.00787 [pdf, other]

Online Targetless Radar-Camera Extrinsic Calibration Based on the Common Features of Radar and Camera

Authors: Lei Cheng, Siyang Cao

Abstract: Sensor fusion is essential for autonomous driving and autonomous robots, and radar-camera fusion systems have gained popularity due to their complementary sensing capabilities. However, accurate calibration between these two sensors is crucial to ensure effective fusion and improve overall system performance. Calibration involves intrinsic and extrinsic calibration, with the latter being particularly important for achieving accurate sensor fusion. Unfortunately, many target-based calibration methods require complex operating procedures and well-designed experimental conditions, posing challenges for researchers attempting to reproduce the results. To address this issue, we introduce a novel approach that leverages deep learning to extract a common feature from raw radar data (i.e., Range-Doppler-Angle data) and camera images. Instead of explicitly representing these common features, our method implicitly utilizes these common features to match identical objects from both data sources. Specifically, the extracted common feature serves as an example to demonstrate an online targetless calibration method between the radar and camera systems. The estimation of the extrinsic transformation matrix is achieved through this feature-based approach. To enhance the accuracy and robustness of the calibration, we apply the RANSAC and Levenberg-Marquardt (LM) nonlinear optimization algorithms for deriving the matrix. Our experiments in the real world demonstrate the effectiveness and accuracy of our proposed method.

Submitted 24 January, 2024; v1 submitted 1 September, 2023; originally announced September 2023.

arXiv:2308.07077 [pdf]

Distributed UAV Swarm Augmented Wideband Spectrum Sensing Using Nyquist Folding Receiver

Authors: Kaili Jiang, Kailun Tian, Hancong Feng, Yuxin Zhao, Dechang Wang, Sen Cao, Jian Gao, Xuying Zhang, Yanfei Li, Junyu Yuan, Ying Xiong, Bin Tang

Abstract: Distributed unmanned aerial vehicle (UAV) swarms are formed by multiple UAVs with increased portability, higher levels of sensing capabilities, and more powerful autonomy. These features make them attractive for many recent applications, potentially increasing the shortage of spectrum resources. In this paper, wideband spectrum sensing augmented technology is discussed for distributed UAV swarms to improve the utilization of spectrum. However, the sub-Nyquist sampling applied in existing schemes has high hardware complexity, power consumption, and low recovery efficiency for non-strictly sparse conditions. Thus, the Nyquist folding receiver (NYFR) is considered for the distributed UAV swarms, which can theoretically achieve full-band spectrum detection and reception using a single analog-to-digital converter (ADC) at low speed for all circuit components. There is a focus on the sensing model of two multichannel scenarios for the distributed UAV swarms, one with a complete functional receiver for the UAV swarm with RIS, and another with a decentralized UAV swarm equipped with a complete functional receiver for each UAV element. The key issue is to consider whether the application of RIS technology will bring advantages to spectrum sensing and the data fusion problem of decentralized UAV swarms based on the NYFR architecture. Therefore, the property for multiple pulse reconstruction is analyzed through the Gershgorin circle theorem, especially for very short pulses. Further, the block sparse recovery property is analyzed for wide bandwidth signals. The proposed technology can improve the processing capability for multiple signals and wide bandwidth signals while reducing interference from folded noise and subsampled harmonics. Experimental results show augmented spectrum sensing efficiency under non-strictly sparse conditions.

Submitted 14 August, 2023; originally announced August 2023.

arXiv:2308.07075 [pdf, other]

Wideband Power Spectrum Sensing: a Fast Practical Solution for Nyquist Folding Receiver

Authors: Kaili Jiang, Dechang Wang, Kailun Tian, Hancong Feng, Yuxin Zhao, Sen Cao, Jian Gao, Xuying Zhang, Yanfei Li, Junyu Yuan, Ying Xiong, Bin Tang

Abstract: The limited availability of spectrum resources has been growing into a critical problem in wireless communications, remote sensing, and electronic surveillance. To address the high-speed sampling bottleneck of wideband spectrum sensing, a fast and practical solution of power spectrum estimation for the Nyquist folding receiver (NYFR) is proposed in this paper. The NYFR architecture can theoretically achieve full-band signal sensing with a hundred percent probability of intercept. But the existing algorithm is difficult to realize in real time due to its high complexity and complicated calculations. By exploring the sub-sampling principle inherent in NYFR, a computationally efficient method is introduced with compressive covariance sensing. It can be efficiently implemented via only the non-uniform fast Fourier transform, fast Fourier transform, and some simple multiplication operations. Meanwhile, the state-of-the-art power spectrum reconstruction model for NYFR of time-domain and frequency-domain is constructed in this paper as a comparison. Furthermore, the computational complexity of the proposed method scales linearly with the Nyquist-rate sampled number of samples and the sparsity of spectrum occupancy. Simulation results and discussion demonstrate that the low complexity in sampling and computation is a more practical solution to meet the real-time wideband spectrum sensing applications.

Submitted 14 August, 2023; originally announced August 2023.

arXiv:2307.15264 [pdf, other]

3D Radar and Camera Co-Calibration: A Flexible and Accurate Method for Target-based Extrinsic Calibration

Authors: Lei Cheng, Arindam Sengupta, Siyang Cao

Abstract: Advances in autonomous driving are inseparable from sensor fusion. Heterogeneous sensors are widely used for sensor fusion due to their complementary properties, with radar and camera being the most equipped sensors. Intrinsic and extrinsic calibration are essential steps in sensor fusion. The extrinsic calibration, independent of the sensor's own parameters, and performed after the sensors are installed, greatly determines the accuracy of sensor fusion. Many target-based methods require cumbersome operating procedures and well-designed experimental conditions, making them extremely challenging. To this end, we propose a flexible, easy-to-reproduce and accurate method for extrinsic calibration of 3D radar and camera. The proposed method does not require a specially designed calibration environment, and instead places a single corner reflector (CR) on the ground to iteratively collect radar and camera data simultaneously using Robot Operating System (ROS), and obtain radar-camera point correspondences based on their timestamps, and then use these point correspondences as input to solve the perspective-n-point (PnP) problem, and finally get the extrinsic calibration matrix. Also, RANSAC is used for robustness and the Levenberg-Marquardt (LM) nonlinear optimization algorithm is used for accuracy. Multiple controlled environment experiments as well as real-world experiments demonstrate the efficiency and accuracy (AED error is 15.31 pixels and Acc up to 89%) of the proposed method.

Submitted 27 July, 2023; originally announced July 2023.

arXiv:2304.04774 [pdf, other]

DDRF: Denoising Diffusion Model for Remote Sensing Image Fusion

Authors: ZiHan Cao, ShiQi Cao, Xiao Wu, JunMing Hou, Ran Ran, Liang-Jian Deng

Abstract: Denoising diffusion model, as a generative model, has received a lot of attention in the field of image generation recently, thanks to its powerful generation capability. However, diffusion models have not yet received sufficient research in the field of image fusion. In this article, we introduce diffusion model to the image fusion field, treating the image fusion task as image-to-image translation and designing two different conditional injection modulation modules (i.e., style transfer modulation and wavelet modulation) to inject coarse-grained style information and fine-grained high-frequency and low-frequency information into the diffusion UNet, thereby generating fused images. In addition, we also discussed the residual learning and the selection of training objectives of the diffusion model in the image fusion task. Extensive experimental results based on quantitative and qualitative assessments compared with benchmarks demonstrate state-of-the-art results and good generalization performance in image fusion tasks. Finally, it is hoped that our method can inspire other works and gain insight into this field to better apply the diffusion model to image fusion tasks. Code shall be released for better reproducibility.

Submitted 10 April, 2023; originally announced April 2023.

arXiv:2303.09278 [pdf, other]

DistillW2V2: A Small and Streaming Wav2vec 2.0 Based ASR Model

Authors: Yanzhe Fu, Yueteng Kang, Songjun Cao, Long Ma

Abstract: Wav2vec 2.0 (W2V2) has shown impressive performance in automatic speech recognition (ASR). However, the large model size and the non-streaming architecture make it hard to be used under low-resource or streaming scenarios. In this work, we propose a two-stage knowledge distillation method to solve these two problems: the first step is to make the big and non-streaming teacher model smaller, and the second step is to make it streaming. Specifically, we adopt the MSE loss for the distillation of hidden layers and the modified LF-MMI loss for the distillation of the prediction layer. Experiments are conducted on Gigaspeech, Librispeech, and an in-house dataset. The results show that the distilled student model (DistillW2V2) we finally get is 8x faster and 12x smaller than the original teacher model. For the 480ms latency setup, the DistillW2V2's relative word error rate (WER) degradation varies from 9% to 23.4% on test sets, which reveals a promising way to extend the W2V2's application scope.

Submitted 16 March, 2023; originally announced March 2023.

arXiv:2301.02069 [pdf, other]

doi 10.1007/s10278-022-00755-z

Deep Learning for Breast MRI Style Transfer with Limited Training Data

Authors: Shixing Cao, Nicholas Konz, James Duncan, Maciej A. Mazurowski

Abstract: In this work we introduce a novel medical image style transfer method, StyleMapper, that can transfer medical scans to an unseen style with access to limited training data. This is made possible by training our model on unlimited possibilities of simulated random medical imaging styles on the training set, making our work more computationally efficient when compared with other style transfer methods. Moreover, our method enables arbitrary style transfer: transferring images to styles unseen in training. This is useful for medical imaging, where images are acquired using different protocols and different scanner models, resulting in a variety of styles that data may need to be transferred between. Methods: Our model disentangles image content from style and can modify an image's style by simply replacing the style encoding with one extracted from a single image of the target style, with no additional optimization required. This also allows the model to distinguish between different styles of images, including among those that were unseen in training. We propose a formal description of the proposed model. Results: Experimental results on breast magnetic resonance images indicate the effectiveness of our method for style transfer. Conclusion: Our style transfer method allows for the alignment of medical images taken with different scanners into a single unified style dataset, allowing for the training of other downstream tasks on such a dataset for tasks such as classification, object detection and others.

Submitted 5 January, 2023; originally announced January 2023.

Comments: preprint version, accepted in the Journal of Digital Imaging (JDIM). 16 pages (+ author names + references + supplementary), 6 figures

Journal ref: J Digit Imaging (2022)

arXiv:2211.03731 [pdf, other]

doi 10.1109/TSP.2023.3287671

Group Testing with Side Information via Generalized Approximate Message Passing

Authors: Shu-Jie Cao, Ritesh Goenka, Chau-Wai Wong, Ajit Rajwade, Dror Baron

Abstract: Group testing can help maintain a widespread testing program using fewer resources amid a pandemic. In a group testing setup, we are given n samples, one per individual. Each individual is either infected or uninfected. These samples are arranged into m < n pooled samples, where each pool is obtained by mixing a subset of the n individual samples. Infected individuals are then identified using a group testing algorithm. In this paper, we incorporate side information (SI) collected from contact tracing (CT) into nonadaptive/single-stage group testing algorithms. We generate different types of possible CT SI data by incorporating different possible characteristics of the spread of disease. These data are fed into a group testing framework based on generalized approximate message passing (GAMP). Numerical results show that our GAMP-based algorithms provide improved accuracy.
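
Illustrative note (not part of the listing): the pooling setup described in this abstract, with n samples mixed into m < n pools, can be sketched in a few lines. This is a minimal sketch under stated assumptions. The pool layout, the infected set, and the function names are hypothetical, tests are assumed noiseless, and the decoder is the simple COMP rule (clear anyone who appears in a negative pool). It is not the GAMP-based algorithm of the paper and uses no contact-tracing side information.

```python
def pool_results(pools, infected):
    """Noiseless pooled tests: a pool is positive if it holds any infected sample."""
    return [any(i in infected for i in pool) for pool in pools]


def comp_decode(n, pools, results):
    """COMP decoding (a stand-in, not GAMP): clear everyone seen in a negative pool."""
    cleared = set()
    for pool, positive in zip(pools, results):
        if not positive:
            cleared.update(pool)
    # Whoever was never cleared remains a suspect.
    return set(range(n)) - cleared


# Hypothetical toy instance: n = 6 individuals, m = 4 pools (m < n).
n = 6
pools = [[0, 1, 2], [2, 3, 4], [0, 3, 5], [1, 4, 5]]
infected = {3}  # assumed ground truth, unknown to the decoder

results = pool_results(pools, infected)
suspects = comp_decode(n, pools, results)
print(results, suspects)
```

In this toy instance four tests are enough to single out the one infected individual. With noisy tests or more infections, COMP can leave false positives, which is the kind of gap that probabilistic decoders and side information aim to close.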

Submitted 16 June, 2023; v1 submitted 7 November, 2022; originally announced November 2022.

Comments: To appear in IEEE Trans. Signal Processing. arXiv admin note: substantial text overlap with arXiv:2106.02699, arXiv:2011.14186

arXiv:2208.09785 [pdf, ps, other]

High-Performance Transmission Mechanism Design of Multi-Stream Carrier Aggregation for 5G Non-Standalone Network

Authors: Jun Yu, Shunqing Zhang, Jiayun Sun, Shugong Xu, Shan Cao

Abstract: Multi-stream carrier aggregation is a key technology to expand bandwidth and improve the throughput of fifth-generation wireless communication systems. However, due to the diverse propagation properties of different frequency bands, the traffic migration task is much more challenging, especially in hybrid sub-6 GHz and millimeter-wave band scenarios. Existing schemes either neglect the transmission rate difference between the multi-stream carriers or consider only simple low-mobility scenarios. In this paper, we propose a low-complexity traffic splitting algorithm based on a fuzzy proportional integral derivative control mechanism. The proposed algorithm relies only on the local radio link control buffer information of the sub-6 GHz and mmWave bands, while frequent feedback from the user equipment (UE) side is minimized. As shown in the numerical examples, the proposed traffic splitting mechanism can achieve a link resource utilization ratio of more than 90% for different UE transmission requirements with different mobilities, which corresponds to a 10% improvement over conventional baselines.

Submitted 20 August, 2022; originally announced August 2022.

Comments: 17 pages, 7 figures

arXiv:2206.14496 [pdf]

doi 10.1016/j.energy.2022.124552

Auto-Encoder-Extreme Learning Machine Model for Boiler NOx Emission Concentration Prediction

Authors: Zhenhao Tang, Shikui Wang, Xiangying Chai, Shengxian Cao, Tinghui Ouyang, Yang Li

Abstract: An automatic encoder (AE) extreme learning machine (ELM) model, AE-ELM, is proposed to predict the NOx emission concentration based on the combination of the mutual information (MI) algorithm, AE, and ELM. First, the importance of practical variables is computed by the MI algorithm, and the mechanism is analyzed to determine the variables related to the NOx emission concentration. Then, the time delay correlations between the selected variables and the NOx emission concentration are further analyzed to reconstruct the modeling data. Subsequently, the AE is applied to extract hidden features within the input variables. Finally, an ELM algorithm establishes the relationship between the NOx emission concentration and the deep features. The experimental results on practical data indicate that the proposed model shows promising performance compared to state-of-the-art models.

Submitted 29 June, 2022; originally announced June 2022.

Comments: Accepted by Energy

Journal ref: Energy 256 (2022) 124552

arXiv:2206.08189 [pdf, other]

Censer: Curriculum Semi-supervised Learning for Speech Recognition Based on Self-supervised Pre-training

Authors: Bowen Zhang, Songjun Cao, Xiaoming Zhang, Yike Zhang, Long Ma, Takahiro Shinozaki

Abstract: Recent studies have shown that the benefits provided by self-supervised pre-training and self-training (pseudo-labeling) are complementary. Semi-supervised fine-tuning strategies under the pre-training framework, however, remain insufficiently studied. Moreover, modern semi-supervised speech recognition algorithms either treat unlabeled data indiscriminately or filter out noisy samples with a confidence threshold. The dissimilarities among different unlabeled data are often ignored. In this paper, we propose Censer, a semi-supervised speech recognition algorithm based on self-supervised pre-training to maximize the utilization of unlabeled data. The pre-training stage of Censer adopts wav2vec2.0, and the fine-tuning stage employs an improved semi-supervised learning algorithm from slimIPL, which leverages unlabeled data progressively according to the quality of their pseudo labels. We also incorporate a temporal pseudo label pool and an exponential moving average to control the pseudo labels' update frequency and to avoid model divergence. Experimental results on the Libri-Light and LibriSpeech datasets show that our proposed method achieves better performance than existing approaches while being more unified.

Submitted 27 June, 2022; v1 submitted 16 June, 2022; originally announced June 2022.

arXiv:2204.11769 [pdf, ps, other]

Multi-scale reconstruction of undersampled spectral-spatial OCT data for coronary imaging using deep learning

Authors: Xueshen Li, Shengting Cao, Hongshan Liu, Xinwen Yao, Brigitta C. Brott, Silvio H. Litovsky, Xiaoyu Song, Yuye Ling, Yu Gan

Abstract: Coronary artery disease (CAD) is a cardiovascular condition with high morbidity and mortality. Intravascular optical coherence tomography (IVOCT) has been considered an optimal imaging system for the diagnosis and treatment of CAD. Constrained by the Nyquist theorem, dense sampling in IVOCT attains high resolving power to delineate cellular structures/features. There is a trade-off between high spatial resolution and fast scanning rate for coronary imaging. In this paper, we propose a viable spectral-spatial acquisition method that down-scales the sampling process in both the spectral and spatial domains while maintaining high quality in image reconstruction. The down-scaling schedule boosts data acquisition speed without any hardware modifications. Additionally, we propose a unified multi-scale reconstruction framework, namely the Multiscale-Spectral-Spatial-Magnification Network (MSSMN), to resolve highly down-scaled (compressed) OCT images with flexible magnification factors. We incorporate the proposed methods into Spectral Domain OCT (SD-OCT) imaging of human coronary samples with clinical features such as stents and calcified lesions. Our experimental results demonstrate that spectral-spatial downscaled data can be better reconstructed than data that is downscaled solely in either the spectral or spatial domain. Moreover, we observe better reconstruction performance using MSSMN than using existing reconstruction methods. Our acquisition method and multi-scale reconstruction framework, in combination, may allow faster SD-OCT inspection with high resolution during coronary intervention.

Submitted 25 April, 2022; originally announced April 2022.

Comments: 11 pages, 8 figures, reviewed by IEEE trans BME

arXiv:2203.04767 [pdf, other]

A practical framework for multi-domain speech recognition and an instance sampling method to neural language modeling

Authors: Yike Zhang, Xiaobing Feng, Yi Liu, Songjun Cao, Long Ma

Abstract: Automatic speech recognition (ASR) systems used on smart phones or vehicles are usually required to process speech queries from very different domains. In such situations, a vanilla ASR system usually fails to perform well on every domain. This paper proposes a multi-domain ASR framework for Tencent Map, a navigation app used on smart phones and in-vehicle infotainment systems. The proposed framework consists of three core parts: a basic ASR module to generate n-best lists of a speech query, a text classification module to determine which domain the speech query belongs to, and a reranking module to rescore n-best lists using domain-specific language models. In addition, an instance-sampling-based method for training neural network language models (NNLMs) is proposed to address the data imbalance problem in multi-domain ASR. In experiments, the proposed framework was evaluated on the navigation and music domains, since navigating and playing music are two main features of Tencent Map. Compared to a general ASR system, the proposed framework achieves a relative 13% $\sim$ 22% character error rate reduction on several test sets collected from Tencent Map and our in-car voice assistant.

Submitted 9 March, 2022; originally announced March 2022.

Comments: 7 pages, 1 figure

arXiv:2203.03640 [pdf, other]

doi 10.1109/TMI.2020.3014433

Conquering Data Variations in Resolution: A Slice-Aware Multi-Branch Decoder Network

Authors: Shuxin Wang, Shilei Cao, Zhizhong Chai, Dong Wei, Kai Ma, Liansheng Wang, Yefeng Zheng

Abstract: Fully convolutional neural networks have made promising progress in joint liver and liver tumor segmentation. Instead of following the debates over 2D versus 3D networks (for example, pursuing the balance between large-scale 2D pretraining and 3D context), in this paper, we identify the wide variation in the ratio between intra- and inter-slice resolutions as a crucial obstacle to performance. To tackle the mismatch between the intra- and inter-slice information, we propose a slice-aware 2.5D network that emphasizes extracting discriminative features utilizing not only in-plane semantics but also out-of-plane coherence for each separate slice. Specifically, we present a slice-wise multi-input multi-output architecture to instantiate such a design paradigm, which contains a Multi-Branch Decoder (MD) with a Slice-centric Attention Block (SAB) for learning slice-specific features and a Densely Connected Dice (DCD) loss to regularize the inter-slice predictions to be coherent and continuous. Based on the aforementioned innovations, we achieve state-of-the-art results on the MICCAI 2017 Liver Tumor Segmentation (LiTS) dataset. We also test our model on the ISBI 2019 Segmentation of THoracic Organs at Risk (SegTHOR) dataset, and the result demonstrates the robustness and generalizability of the proposed method in other segmentation tasks.

Submitted 7 March, 2022; originally announced March 2022.

Comments: Published by IEEE TMI

arXiv:2203.03582 [pdf, other]

Improving CTC-based speech recognition via knowledge transferring from pre-trained language models

Authors: Keqi Deng, Songjun Cao, Yike Zhang, Long Ma, Gaofeng Cheng, Ji Xu, Pengyuan Zhang

Abstract: Recently, end-to-end automatic speech recognition models based on connectionist temporal classification (CTC) have achieved impressive results, especially when fine-tuned from wav2vec2.0 models. Due to the conditional independence assumption, CTC-based models are always weaker than attention-based encoder-decoder models and require the assistance of external language models (LMs). To solve this issue, we propose two knowledge transferring methods that leverage pre-trained LMs, such as BERT and GPT2, to improve CTC-based models. The first method is based on representation learning, in which the CTC-based models use the representation produced by BERT as an auxiliary learning target. The second method is based on joint classification learning, which combines GPT2 for text modeling with a hybrid CTC/attention architecture. Experiments on the AISHELL-1 corpus yield a character error rate (CER) of 4.2% on the test set. Compared to the vanilla CTC-based models fine-tuned from the wav2vec2.0 models, our knowledge transferring method achieves a 16.1% relative CER reduction without external LMs.

Submitted 22 February, 2022; originally announced March 2022.

Comments: ICASSP 2022

arXiv:2202.05430 [pdf]

Wind power ramp prediction algorithm based on wavelet deep belief network

Authors: Zhenhao Tang, Qingyu Meng, Shengxian Cao, Yang Li, Zhongha Mu, Xiaoya Pang

Abstract: Wind power ramp events significantly threaten power grid safety. To improve the ramp prediction accuracy, a hybrid wavelet deep belief network algorithm with adaptive feature selection (WDBNAFS) is proposed. First, the wind power characteristic is analyzed. Then, wavelet decomposition is applied to the time series, and an adaptive feature selection algorithm is proposed to select the inputs of the prediction model. Finally, a deep belief network is employed to predict the wind power ramp event, and the proposed WDBNAFS is validated in experiments based on practical data. The simulation results demonstrate that the prediction accuracy of the proposed algorithm is more than 90%.

Submitted 10 February, 2022; originally announced February 2022.

Comments: in Chinese language

Journal ref: ACTA Energiae Solaris Sinica 40 (2019) 3213-3220

arXiv:2112.07254 [pdf, other]

Improving Hybrid CTC/Attention End-to-end Speech Recognition with Pretrained Acoustic and Language Model

Authors: Keqi Deng, Songjun Cao, Yike Zhang, Long Ma

Abstract: Recently, self-supervised pretraining has achieved impressive results in end-to-end (E2E) automatic speech recognition (ASR). However, the dominant sequence-to-sequence (S2S) E2E model still struggles to fully utilize self-supervised pre-training methods because its decoder is conditioned on the acoustic representation and thus cannot be pretrained separately. In this paper, we propose a pretrained Transformer (Preformer) S2S ASR architecture based on hybrid CTC/attention E2E models to fully utilize pretrained acoustic models (AMs) and language models (LMs). In our framework, the encoder is initialized with a pretrained AM (wav2vec2.0). The Preformer leverages CTC as an auxiliary task during training and inference. Furthermore, we design a one-cross decoder (OCD), which relaxes the dependence on acoustic representations so that it can be initialized with a pretrained LM (DistilGPT2). Experiments are conducted on the AISHELL-1 corpus and achieve a $4.6\%$ character error rate (CER) on the test set. Compared with our vanilla hybrid CTC/attention Transformer baseline, our proposed CTC/attention-based Preformer yields a $27\%$ relative CER reduction. To the best of our knowledge, this is the first work to utilize both a pretrained AM and LM in an S2S ASR system.

Submitted 14 December, 2021; originally announced December 2021.

Comments: ASRU2021

arXiv:2109.07349 [pdf, other]

Improving Accent Identification and Accented Speech Recognition Under a Framework of Self-supervised Learning

Authors: Keqi Deng, Songjun Cao, Long Ma

Abstract: Recently, self-supervised pre-training has gained success in automatic speech recognition (ASR). However, considering the difference between speech accents in real scenarios, how to identify accents and use accent features to improve ASR is still challenging. In this paper, we employ the self-supervised pre-training method for both accent identification and accented speech recognition tasks. For the former task, a standard deviation constraint loss (SDC-loss) based end-to-end (E2E) architecture is proposed to identify accents within the same language. For the accented speech recognition task, we design an accent-dependent ASR system, which can utilize additional accent input features. Furthermore, we propose a frame-level accent feature, which is extracted based on the proposed accent identification model and can be dynamically adjusted. We pre-train our models using the 960-hour unlabeled LibriSpeech dataset and fine-tune them on the AESRC2020 speech dataset. The experimental results show that our proposed accent-dependent ASR system is significantly ahead of the AESRC2020 baseline and achieves a $6.5\%$ relative word error rate (WER) reduction compared with our accent-independent ASR system.

Submitted 15 September, 2021; originally announced September 2021.

Comments: INTERSPEECH2021

arXiv:2109.07327 [pdf, ps, other]

Improving Streaming Transformer Based ASR Under a Framework of Self-supervised Learning

Authors: Songjun Cao, Yueteng Kang, Yanzhe Fu, Xiaoshuo Xu, Sining Sun, Yike Zhang, Long Ma

Abstract: Recently, self-supervised learning has emerged as an effective approach to improve the performance of automatic speech recognition (ASR). Under such a framework, the neural network is usually pre-trained with massive unlabeled data and then fine-tuned with limited labeled data. However, the neural network usually adopts a non-streaming architecture, such as a bidirectional transformer, to achieve competitive results, and such an architecture cannot be used in streaming scenarios. In this paper, we mainly focus on improving the performance of the streaming transformer under the self-supervised learning framework. Specifically, we propose a novel two-stage training method during fine-tuning, which combines knowledge distillation and self-training. The proposed training method achieves a 16.3% relative word error rate (WER) reduction on the Librispeech noisy test set. Finally, by only using the 100h clean subset of Librispeech as the labeled data and the rest (860h) as the unlabeled data, our streaming transformer based model obtains competitive WERs of 3.5/8.7 on the Librispeech clean/noisy test sets.

Submitted 15 September, 2021; originally announced September 2021.

Comments: INTERSPEECH2021

arXiv:2108.01240 [pdf]

doi 10.13334/j.0258-8013.pcsee.210574

Dynamic Prediction Model for NOx Emission of SCR System Based on Hybrid Data-driven Algorithms

Authors: Zhenhao Tang, Shikui Wang, Shengxian Cao, Yang Li, Tao Shen

Abstract: To address the difficulty of determining the delay time and the low prediction accuracy encountered when building a prediction model of the SCR system, a dynamic modeling scheme based on a hybrid of multiple data-driven algorithms is proposed. First, abnormal values are processed and the data are normalized. To improve the relevance of the input data, MIC is used to estimate the delay time and the production data are reconstructed. Then, a combined feature selection method is used to determine the input variables. To further mine data information, VMD is used to decompose the input time series. Finally, a NOx emission prediction model combining ELM and an EC model is established. Experimental results based on actual historical operating data show that the MAPE of the predicted results is 2.61%. Model sensitivity analysis shows that, besides the amount of ammonia injection, the inlet oxygen concentration and the flue gas temperature have a significant impact on NOx emission, which should be considered in SCR process control and optimization.

Submitted 2 August, 2021; originally announced August 2021.

Comments: in Chinese language, Accepted by Proceedings of the CSEE

Journal ref: Proceedings of the CSEE 42 (2022) 3295-3306

arXiv:2107.10327 [pdf, other]

mmPose-NLP: A Natural Language Processing Approach to Precise Skeletal Pose Estimation using mmWave Radars

Authors: Arindam Sengupta, Siyang Cao

Abstract: In this paper we present mmPose-NLP, a novel Natural Language Processing (NLP) inspired Sequence-to-Sequence (Seq2Seq) skeletal key-point estimator using millimeter-wave (mmWave) radar data. To the best of the authors' knowledge, this is the first method to precisely estimate up to 25 skeletal key-points using mmWave radar data alone. Skeletal pose estimation is critical in several applications ranging from autonomous vehicles, traffic monitoring, patient monitoring, and gait analysis to defense security forensics, and aids both preventative and actionable decision making. The use of mmWave radars for this task, over traditionally employed optical sensors, provides several advantages, primarily operational robustness to scene lighting and adverse weather conditions, where optical sensor performance degrades significantly. The mmWave radar point-cloud (PCL) data is first voxelized (analogous to tokenization in NLP), and $N$ frames of the voxelized radar data (analogous to a text paragraph in NLP) are passed to the proposed mmPose-NLP architecture, where the voxel indices of the 25 skeletal key-points (analogous to keyword extraction in NLP) are predicted. The voxel indices are converted back to real-world 3-D coordinates using the voxel dictionary used during the tokenization process. Mean Absolute Error (MAE) metrics were used to measure the accuracy of the proposed system against the ground truth, with the proposed mmPose-NLP offering <3 cm localization errors in the depth, horizontal and vertical axes. The effect of the number of input frames on performance/accuracy was also studied for N = {1,2,..,10}. A comprehensive methodology, results, discussions and limitations are presented in this paper. All the source code and results are made available on GitHub for furthering research and development in this critical yet emerging domain of skeletal key-point estimation using mmWave radars.

Submitted 21 July, 2021; originally announced July 2021.

Comments: Submitted to IEEE Transactions

arXiv:2107.03165 [pdf, other]

Improving Speech Recognition Accuracy of Local POI Using Geographical Models

Authors: Songjun Cao, Yike Zhang, Xiaobing Feng, Long Ma

Abstract: Voice search for points of interest (POI) is becoming increasingly popular. However, speech recognition for local POI remains a challenge due to multiple dialects and the massive number of POI. This paper improves speech recognition accuracy for local POI in two ways. First, a geographic acoustic model (Geo-AM) is proposed. The Geo-AM deals with the multi-dialect problem using a dialect-specific input feature and a dialect-specific top layer. Second, a group of geo-specific language models (Geo-LMs) is integrated into our speech recognition system to improve the recognition accuracy of long-tail and homophone POI. During decoding, specific language models are selected on demand according to users' geographic location. Experiments show that the proposed Geo-AM achieves 6.5%$\sim$10.1% relative character error rate (CER) reduction on an accent test set, and the proposed Geo-AM and Geo-LM together achieve over 18.7% relative CER reduction on the Tencent Map task.

Submitted 7 July, 2021; originally announced July 2021.

Comments: Accepted by SLT 2021

arXiv:2010.09586 [pdf, other]

Brain Atlas Guided Attention U-Net for White Matter Hyperintensity Segmentation

Authors: Zicong Zhang, Kimerly Powell, Changchang Yin, Shilei Cao, Dani Gonzalez, Yousef Hannawi, ** Zhang

Abstract: White Matter Hyperintensities (WMH) are the most common manifestation of cerebral small vessel disease (cSVD) on brain MRI. Accurate WMH segmentation algorithms are important to determine cSVD burden and its clinical consequences. Most existing WMH segmentation algorithms require both fluid attenuated inversion recovery (FLAIR) images and T1-weighted images as inputs. However, T1-weighted images are typically not part of the standard clinical scans that are acquired for patients with acute stroke. In this paper, we propose a novel brain atlas guided attention U-Net (BAGAU-Net) that leverages only FLAIR images with a spatially-registered white matter (WM) brain atlas to yield competitive WMH segmentation performance. Specifically, we designed a dual-path segmentation model with two novel connecting mechanisms, namely a multi-input attention module (MAM) and an attention fusion module (AFM), to fuse the information from the two paths for accurate results. Experiments on two publicly available datasets show the effectiveness of the proposed BAGAU-Net. With only FLAIR images and a WM brain atlas, BAGAU-Net outperforms the state-of-the-art method with T1-weighted images, paving the way for effective development of WMH segmentation. Availability: https://github.com/Ericzhang1/BAGAU-Net

Submitted 21 December, 2020; v1 submitted 19 October, 2020; originally announced October 2020.

Comments: Accepted by AMIA 2021 Virtual Informatics Summit

arXiv:2007.12756 [pdf]

Detecting Dynamic States of Temporal Networks Using Connection Series Tensors

Authors: Shun Cao, Hiroki Sayama

Abstract: Many temporal networks exhibit multiple system states, such as weekday and weekend patterns in social contact networks. The detection of such distinct states in temporal network data has recently been explored as it helps reveal underlying dynamical processes. A commonly used method is network aggregation over a time window, which aggregates a subsequence of multiple network snapshots into one static network. This method, however, necessarily discards temporal dynamics within the time window. Here we develop a new method for detecting dynamic states in temporal networks using information regarding the timeline of contacts between each pair of nodes. We apply a similarity measure informed by the techniques of processing time series and community detection to sequentially decompose a given temporal network into multiple dynamic states (including repeated ones). Experiments with empirical temporal network data demonstrated that our method outperformed the conventional approach using simple network aggregation in revealing interpretable system states. In addition, our method allows users to analyze hierarchical temporal structures and to uncover dynamic states at different spatial/temporal resolutions.

Submitted 19 August, 2020; v1 submitted 24 July, 2020; originally announced July 2020.

Comments: 18 pages, 9 figures, 3 tables

arXiv:2007.07241 [pdf, other]

Learning Frame Level Attention for Environmental Sound Classification

Authors: Zhichao Zhang, Shugong Xu, Shunqing Zhang, Tianhao Qiao, Shan Cao

Abstract: Environmental sound classification (ESC) is a challenging problem due to the complexity of sounds. The classification performance is heavily dependent on the effectiveness of representative features extracted from the environmental sounds. However, ESC often suffers from semantically irrelevant frames and silent frames. To deal with this, we employ a frame-level attention model to focus on the semantically relevant frames and salient frames. Specifically, we first propose a convolutional recurrent neural network to learn spectro-temporal features and temporal correlations. Then, we extend our convolutional RNN model with a frame-level attention mechanism to learn discriminative feature representations for ESC. We investigated the classification performance when using different attention scaling functions and applying attention at different layers. Experiments were conducted on the ESC-50 and ESC-10 datasets. Experimental results demonstrated the effectiveness of the proposed method, and our method achieved state-of-the-art or competitive classification accuracy with lower computational complexity. We also visualized our attention results and observed that the proposed attention mechanism was able to lead the network to focus on the semantically relevant parts of environmental sounds.

Submitted 12 July, 2020; originally announced July 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:1907.02230

arXiv:2005.00205 [pdf, other]

Multi-head Monotonic Chunkwise Attention For Online Speech Recognition

Authors: Baiji Liu, Songjun Cao, Sining Sun, Weibin Zhang, Long Ma

Abstract: The attention mechanism of the Listen, Attend and Spell (LAS) model requires the whole input sequence to calculate the attention context and thus is not suitable for online speech recognition. To deal with this problem, we propose multi-head monotonic chunk-wise attention (MTH-MoChA), an improved version of MoChA. MTH-MoChA splits the input sequence into small chunks and computes multi-head attentions over the chunks. We also explore useful training strategies such as LSTM pooling, minimum word error rate training and SpecAugment to further improve the performance of MTH-MoChA. Experiments on AISHELL-1 data show that the proposed model, along with the training strategies, improves the character error rate (CER) of MoChA from 8.96% to 7.68% on the test set. On another 18000-hour in-car speech data set, MTH-MoChA obtains 7.28% CER, which is significantly better than a state-of-the-art hybrid system.

Submitted 1 May, 2020; originally announced May 2020.

arXiv:2004.02872 [pdf, other]

Lossless Image Compression through Super-Resolution

Authors: Sheng Cao, Chao-Yuan Wu, Philipp Krähenbühl

Abstract: We introduce a simple and efficient lossless image compression algorithm. We store a low resolution version of an image as raw pixels, followed by several iterations of lossless super-resolution. For lossless super-resolution, we predict the probability of a high-resolution image, conditioned on the low-resolution input, and use entropy coding to compress this super-resolution operator. Super-Resolution based Compression (SReC) is able to achieve state-of-the-art compression rates with practical runtimes on large datasets. Code is available online at https://github.com/caoscott/SReC.

Submitted 6 April, 2020; originally announced April 2020.

Comments: Tech report

arXiv:2003.02386 [pdf, ps, other]

mmFall: Fall Detection using 4D MmWave Radar and a Hybrid Variational RNN AutoEncoder

Authors: Feng Jin, Arindam Sengupta, Siyang Cao

Abstract: In this paper we propose mmFall - a novel fall detection system, which comprises (i) the emerging millimeter-wave (mmWave) radar sensor to collect the human body's point cloud along with the body centroid, and (ii) a variational recurrent autoencoder (VRAE) to compute the anomaly level of the body motion based on the acquired point cloud. A fall is claimed to have occurred when the spike in anomaly level and the drop in centroid height occur simultaneously. The mmWave radar sensor provides several advantages, such as privacy compliance and high sensitivity to motion, over the traditional sensing modalities. However, (i) randomness in radar point cloud data and (ii) difficulties in fall collection/labeling in the traditional supervised fall detection approaches are the two main challenges. To overcome the randomness in radar data, the proposed VRAE uses variational inference, a probabilistic approach rather than the traditional deterministic approach, to infer the posterior probability of the body's latent motion state at each frame, followed by a recurrent neural network (RNN) to learn the temporal features of the motion over multiple frames. Moreover, to circumvent the difficulties in fall data collection/labeling, the VRAE is built upon an autoencoder architecture in a semi-supervised approach, and trained on only normal activities of daily living (ADL) such that in the inference stage the VRAE will generate a spike in the anomaly level once an abnormal motion, such as a fall, occurs. During the experiment, we implemented the VRAE along with two other baselines, and tested on the dataset collected in an apartment. The receiver operating characteristic (ROC) curve indicates that our proposed model outperforms the other two baselines, and achieves 98% detection out of 50 falls at the expense of just 2 false alarms.

Submitted 28 July, 2020; v1 submitted 4 March, 2020; originally announced March 2020.

Comments: Preprint version

arXiv:2002.01527 [pdf]

Prediction of Component Shifts in Pick and Place Process of Surface Mount Technology Using Support Vector Regression

Authors: Shun Cao, Irandokht Parviziomran, Haeyong Yang, Seungbae Park, Daehan Won

Abstract: In the pick and place (P&P) process of surface mount technology (SMT), the placed component can shift from its ideal (or designed) position on the wet solder paste. The solder paste, which has some fluid properties, could slump, and the imbalance between different sides of the solder paste can lead to other forces on the components as well. Though the shifts are usually considered to be negligible and can be made up to some extent by the subsequent self-alignment during solder reflow, they deserve attention because of their importance to the quality of the printed circuit board (PCB) in SMT. To minimize or control the component shifts, their relationship with the characteristics of the solder paste (e.g., offset, volume) should be studied first. In this paper, we design a comprehensive experiment and collect the data from a state-of-the-art SMT assembly line. Then we use a support vector regression (SVR) model to predict the component shifts based on different situations of solder paste and placement settings. Also, two kernel functions, linear (SVR-Linear) and radial basis function (SVR-RBF), are employed. The achieved results indicate that the component shift in the P&P process is significant, and the SVR model is highly qualified for the forecast of the component shifts. In particular, the SVR-RBF model outperforms the SVR-Linear model in terms of prediction error.

Submitted 4 February, 2020; originally announced February 2020.

Comments: 8 pages, 8 figures, 5 tables, 25th International Conference on Production Research Manufacturing Innovation: Cyber Physical Manufacturing August 9-14, 2019 | Chicago, Illinois (USA)

arXiv:2002.01255 [pdf, other]

Revealing Much While Saying Less: Predictive Wireless for Status Update

Authors: Zhiyuan Jiang, Zixu Cao, Siyu Fu, Fei Peng, Shan Cao, Shunqing Zhang, Shugong Xu

Abstract: Wireless communications for status update are becoming increasingly important, especially for machine-type control applications. Existing work has been mainly focused on Age of Information (AoI) optimizations. In this paper, a status-aware predictive wireless interface design, networking and implementation are presented which aim to minimize the status recovery error of a wireless networked system by leveraging online status model predictions. Two critical issues of predictive status update are addressed: practicality and usefulness. Link-level experiments on a Software-Defined-Radio (SDR) testbed are conducted and test results show that the proposed design can significantly reduce the number of wireless transmissions while maintaining a low status recovery error. A Status-aware Multi-Agent Reinforcement learning neTworking solution (SMART) is proposed to dynamically and autonomously control the transmit decisions of devices in an ad hoc network based on their individual statuses. System-level simulations of a multi dense platooning scenario are carried out on a road traffic simulator. Results show that the proposed schemes can greatly improve the platooning control performance in terms of the minimum safe distance between successive vehicles, in comparison with the AoI-optimized status-unaware and communication latency-optimized schemes; this demonstrates the usefulness of our proposed status update schemes in a real-world application.

Submitted 4 February, 2020; originally announced February 2020.

Comments: To appear in IEEE INFOCOM 2020

arXiv:2001.09619 [pdf]

Data-Driven Prediction Model of Components Shift during Reflow Process in Surface Mount Technology

Authors: Irandokht Parviziomran, Shun Cao, Krishnaswami Srihari, Daehan Won

Abstract: In surface mount technology (SMT), mounted components on soldered pads are subject to movement during the reflow process. This capability is known as self-alignment and is the result of the fluid dynamic behaviour of molten solder paste. This capability is critical in SMT because inaccurate self-alignment causes defects such as overhanging, tombstoning, etc., while on the other hand, it can enable components to be perfectly self-assembled on or near the desired position. The aim of this study is to develop a machine learning model that predicts the component movement during reflow in x and y-directions as well as rotation. Our study is composed of two steps: (1) experimental data are studied to reveal the relationships between self-alignment and various factors including component geometry, pad geometry, etc. (2) advanced machine learning prediction models are applied to predict the distance and the direction of component shift using support vector regression (SVR), neural network (NN), and random forest regression (RFR). As a result, RFR can predict component shift with an average fitness of 99%, 99%, and 96% and with an average prediction error of 13.47 (um), 12.02 (um), and 1.52 (deg.) for component shift in x, y, and rotational directions, respectively. This enhancement provides the future capability of parameter optimization in the pick and placement machine to control the best placement location and minimize the intrinsic defects caused by the self-alignment.

Submitted 27 January, 2020; originally announced January 2020.

arXiv:2001.09612 [pdf]

Optimization of Passive Chip Components Placement with Self-Alignment Effect for Advanced Surface Mounting Technology

Authors: Irandokht Parviziomran, Shun Cao, Haeyong Yang, Seungbae Park, Daehan Won

Abstract: Surface mount technology (SMT) is an enhanced method in electronic packaging in which electronic components are placed directly on a soldered printed circuit board (PCB) and are permanently attached to the PCB with the aid of the reflow soldering process. During the reflow process, once deposited solder pastes start melting, electronic components move in a direction that achieves their highest symmetry. This motion is known as self-alignment since it can correct potential mounting misalignment. In this study, two notable machine learning algorithms, support vector regression (SVR) and random forest regression (RFR), are proposed as a prediction technique to (1) diagnose the relation among component self-alignment, deposited solder paste status and placement machining parameters, (2) predict the final component position on the PCB in x, y, and rotational directions before entering the reflow process. Based on the prediction result, a non-linear optimization model (NLP) is developed to optimize placement parameters at the initial stage. As a result, RFR performs better in terms of prediction model fitness and error. The optimization model is run for 6 samples in which the minimum Euclidean distance of the component position after the reflow process from the ideal position (i.e., the center of pads) is found to be 25.57 (μm) within the boundaries defined in the model.

Submitted 27 January, 2020; originally announced January 2020.

arXiv:2001.00068 [pdf, other]

Asymptotic convergence rate of the longest run in an inflating Bernoulli net

Authors: Kai Ni, Shanshan Cao, Xiaoming Huo

Abstract: In image detection, one problem is to test whether the set, though mostly consisting of uniformly scattered points, also contains a small fraction of points sampled from some (a priori unknown) curve, for example, a curve with $C^α$-norm bounded by $β$. One approach is to analyze the data by counting membership in multiscale multianisotropic strips, which involves an algorithm that delves into the length of the path connecting many consecutive "significant" nodes. In this paper, we develop the mathematical formalism of this algorithm and analyze the statistical property of the length of the longest significant run. The rate of convergence is derived. Using percolation theory and random graph theory, we present a novel probabilistic model named the pseudo-tree model. Based on the asymptotic results for the pseudo-tree model, we further study the length of the longest significant run in an "inflating" Bernoulli net. We find that the probability parameter $p$ of a significant node plays an important role: there is a threshold $p_c$, such that in the cases of $p<p_c$ and $p>p_c$, very different asymptotic behaviors of the length of the longest significant run are observed. We apply our results to the detection of an underlying curvilinear feature and argue that we achieve the lowest possible detectable strength in theory.

Submitted 31 December, 2019; originally announced January 2020.

arXiv:1911.09592 [pdf, other]

doi 10.1109/JSEN.2020.2991741

mm-Pose: Real-Time Human Skeletal Posture Estimation using mmWave Radars and CNNs

Authors: Arindam Sengupta, Feng Jin, Renyuan Zhang, Siyang Cao

Abstract: In this paper, mm-Pose, a novel approach to detect and track human skeletons in real-time using an mmWave radar, is proposed. To the best of the authors' knowledge, this is the first method to detect >15 distinct skeletal joints using mmWave radar reflection signals. The proposed method would find several applications in traffic monitoring systems, autonomous vehicles, patient monitoring systems and defense forces to detect and track human skeleton for effective and preventive decision making in real-time. The use of radar makes the system operationally robust to scene lighting and adverse weather conditions. The reflected radar point cloud in range, azimuth and elevation are first resolved and projected in Range-Azimuth and Range-Elevation planes. A novel low-size high-resolution radar-to-image representation is also presented, that overcomes the sparsity in traditional point cloud data and offers significant reduction in the subsequent machine learning architecture. The RGB channels were assigned with the normalized values of range, elevation/azimuth and the power level of the reflection signals for each of the points. A forked CNN architecture was used to predict the real-world position of the skeletal joints in 3-D space, using the radar-to-image representation. The proposed method was tested for a single human scenario for four primary motions, (i) Walking, (ii) Swinging left arm, (iii) Swinging right arm, and (iv) Swinging both arms to validate accurate predictions for motion in range, azimuth and elevation. The detailed methodology, implementation, challenges, and validation results are presented.

Submitted 21 November, 2019; originally announced November 2019.

Comments: Submitted to IEEE Sensors Journal

arXiv:1911.06372 [pdf, ps, other]

doi 10.1109/TVT.2019.2901493

Automotive Radar Interference Mitigation Using Adaptive Noise Canceller

Authors: Feng Jin, Siyang Cao

Abstract: Interference among frequency-modulated continuous-wave automotive radars can either increase the noise floor, which occurs in most cases, or generate a ghost target in rare situations. To address the increase in noise floor due to interference, we proposed a low-computational-cost method using an adaptive noise canceller to increase the signal-to-interference ratio. In a quadrature receiver, the interference in the positive half of the frequency spectrum is correlated to that in the negative half of the frequency spectrum, whereas the beat frequencies from real targets are always present in the positive frequency. Thus, we estimated the power of the negative frequency as an indication of interference, and fed the positive frequency and negative frequency components into the primary and reference channel of an adaptive noise canceller, respectively. The least mean square algorithm was used to solve for the optimum filter solution. As a result, both the simulation and experiment showed a good interference mitigation performance.

Submitted 14 November, 2019; originally announced November 2019.

Comments: This paper has been submitted to IEEE Transactions on Vehicular Technology

Journal ref: in IEEE Transactions on Vehicular Technology, vol. 68, no. 4, pp. 3747-3754, April 2019

arXiv:1911.06364 [pdf, ps, other]

MmWave Radar Point Cloud Segmentation using GMM in Multimodal Traffic Monitoring

Authors: Feng Jin, Arindam Sengupta, Siyang Cao, Yao-Jan Wu

Abstract: In multimodal traffic monitoring, we gather traffic statistics for distinct transportation modes, such as pedestrians, cars and bicycles, in order to analyze and improve people's daily mobility in terms of safety and convenience. On account of its robustness to bad light and adverse weather conditions, and inherent speed measurement ability, the radar sensor is a suitable option for this application. However, the sparse radar data from conventional commercial radars make transportation mode classification extremely challenging. Thus, we propose to use a high-resolution millimeter-wave (mmWave) radar sensor to obtain a relatively richer radar point cloud representation for a traffic monitoring scenario. Based on a new feature vector, we use the multivariate Gaussian mixture model (GMM) to do the radar point cloud segmentation, i.e. 'point-wise' classification, in an unsupervised learning environment. In our experiment, we collected radar point clouds for pedestrians and cars, which also contained the inevitable clutter from the surroundings. The experimental results using GMM on the new feature vector demonstrated a good segmentation performance in terms of the intersection-over-union (IoU) metrics. The detailed methodology and validation metrics are presented and discussed.

Submitted 31 January, 2020; v1 submitted 14 November, 2019; originally announced November 2019.

Comments: This paper has been accepted by the IEEE International Radar Conference 2020

arXiv:1911.06363 [pdf, ps, other]

doi 10.1109/RADAR.2019.8835656

Multiple Patients Behavior Detection in Real-time using mmWave Radar and Deep CNNs

Authors: Feng Jin, Renyuan Zhang, Arindam Sengupta, Siyang Cao, Salim Hariri, Nimit K. Agarwal, Sumit K. Agarwal

Abstract: To address potential gaps noted in patient monitoring in the hospital, a novel patient behavior detection system using mmWave radar and deep convolution neural network (CNN), which supports the simultaneous recognition of multiple patients' behaviors in real-time, is proposed. In this study, we use an mmWave radar to track multiple patients and detect the scattering point cloud of each one. For each patient, the Doppler pattern of the point cloud over a time period is collected as the behavior signature. A three-layer CNN model is created to classify the behavior for each patient. The tracking and point cloud detection algorithm was also implemented on an mmWave radar hardware platform with an embedded graphics processing unit (GPU) board to collect the Doppler pattern and run the CNN model. A training dataset of six types of behavior was collected, over a long duration, to train the model using the Adam optimizer with the objective of minimizing the cross-entropy loss function. Lastly, the system was tested for real-time operation and obtained a very good inference accuracy when predicting each patient's behavior in a two-patient scenario.

Submitted 14 November, 2019; originally announced November 2019.

Comments: This paper has been submitted to IEEE Radar Conference 2019

arXiv:1908.05863 [pdf, other]

Sub-Spectrogram Segmentation for Environmental Sound Classification via Convolutional Recurrent Neural Network and Score Level Fusion

Authors: Tianhao Qiao, Shunqing Zhang, Zhichao Zhang, Shan Cao, Shugong Xu

Abstract: Environmental Sound Classification (ESC) is an important and challenging problem, and feature representation is a critical and even decisive factor in ESC. Feature representation ability directly affects the accuracy of sound classification. Therefore, the ESC performance is heavily dependent on the effectiveness of representative features extracted from the environmental sounds. In this paper, we propose a sub-spectrogram segmentation based ESC classification framework. In addition, we adopt the proposed Convolutional Recurrent Neural Network (CRNN) and score level fusion to jointly improve the classification accuracy. Extensive truncation schemes are evaluated to find the optimal number and the corresponding band ranges of sub-spectrograms. Based on the numerical experiments, the proposed framework can achieve 81.9% ESC classification accuracy on the public dataset ESC-50, which provides a 9.1% accuracy improvement over traditional baseline schemes.

Submitted 16 August, 2019; originally announced August 2019.

Comments: accepted in the 2019 IEEE International Workshop on Signal Processing Systems (SiPS2019)

arXiv:1908.02334 [pdf]

Predicted disease compositions of human gliomas estimated from multiparametric MRI can predict endothelial proliferation, tumor grade, and overall survival

Authors: Emily E Diller, Sha Cao, Beth Ey, Robert Lober, Jason G Parker

Abstract: Background and Purpose: Biopsy is the main determinant of glioma clinical management, but requires invasive sampling that can fail to detect relevant features because of tumor heterogeneity. The purpose of this study was to evaluate the accuracy of a voxel-wise, multiparametric MRI radiomic method to predict features and develop a minimally invasive method to objectively assess neoplasms. Methods: Multiparametric MRI were registered to T1-weighted gadolinium contrast-enhanced data using a 12 degree-of-freedom affine model. The retrospectively collected MRI data included T1-weighted, T1-weighted gadolinium contrast-enhanced, T2-weighted, fluid attenuated inversion recovery, and multi-b-value diffusion-weighted acquired at 1.5T or 3.0T. Clinical experts provided voxel-wise annotations for five disease states on a subset of patients to establish a training feature vector of 611,930 observations. Then, a k-nearest-neighbor (k-NN) classifier was trained using a 25% hold-out design. The trained k-NN model was applied to 13,018,171 observations from seventeen histologically confirmed glioma patients. Linear regression tested overall survival (OS) relationship to predicted disease compositions (PDC) and diagnostic age (alpha = 0.05). Canonical discriminant analysis tested if PDC and diagnostic age could differentiate clinical, genetic, and microscopic factors (alpha = 0.05). Results: The model predicted voxel annotation class with a Dice similarity coefficient of 94.34% +/- 2.98. Linear combinations of PDCs and diagnostic age predicted OS (p = 0.008), grade (p = 0.014), and endothelial proliferation (p = 0.003); but fell short predicting gene mutations for TP53BP1 and IDH1. Conclusions: This voxel-wise, multi-parametric MRI radiomic strategy holds potential as a non-invasive decision-making aid for clinicians managing patients with glioma.

Submitted 6 August, 2019; originally announced August 2019.

Comments: 13 pages, 3 figures, 5 tables

arXiv:1907.02230 [pdf, other]

Attention based Convolutional Recurrent Neural Network for Environmental Sound Classification

Authors: Zhichao Zhang, Shugong Xu, Tianhao Qiao, Shunqing Zhang, Shan Cao

Abstract: Environmental sound classification (ESC) is a challenging problem due to the complexity of sounds. The ESC performance is heavily dependent on the effectiveness of representative features extracted from the environmental sounds. However, ESC often suffers from the semantically irrelevant frames and silent frames. In order to deal with this, we employ a frame-level attention model to focus on the semantically relevant frames and salient frames. Specifically, we first propose a convolutional recurrent neural network to learn spectro-temporal features and temporal correlations. Then, we extend our convolutional RNN model with a frame-level attention mechanism to learn discriminative feature representations for ESC. Experiments were conducted on ESC-50 and ESC-10 datasets. Experimental results demonstrated the effectiveness of the proposed method, which achieved state-of-the-art performance in terms of classification accuracy.

Submitted 4 July, 2019; originally announced July 2019.

Comments: Accepted to Chinese Conference on Pattern Recognition and Computer Vision (PRCV) 2019

arXiv:1907.00594 [pdf, other]

Fingerprint-based Localization using Commercial LTE Signals: A Field-Trial Study

Authors: Heng Zhang, Zhichao Zhang, Shunqing Zhang, Shugong Xu, Shan Cao

Abstract: Wireless localization for mobile devices has attracted growing interest with the increasing demand for location-based services. Fingerprint-based localization is promising, especially in non-Line-of-Sight (NLoS) or rich scattering environments, such as urban areas and indoor scenarios. In this paper, we propose a novel fingerprint-based localization technique based on a deep learning framework under commercial long term evolution (LTE) systems. Specifically, we develop a software defined user equipment to collect the real time channel state information (CSI) knowledge from LTE base stations and extract the intrinsic features among CSI observations. On top of that, we propose a time domain fusion approach to assemble multiple positioning estimations. Experimental results demonstrated that the proposed localization technique can significantly improve the localization accuracy and robustness, e.g. it achieves a Mean Distance Error (MDE) of 0.47 meters for indoor and 19.9 meters for outdoor scenarios.

Submitted 1 July, 2019; originally announced July 2019.

Comments: 5 pages, 7 figures, conference

arXiv:1905.05953 [pdf]

Learning-based Single-step Quantitative Susceptibility Mapping Reconstruction Without Brain Extraction

Authors: Hongjiang Wei, Steven Cao, Yuyao Zhang, Xiaojun Guan, Fuhua Yan, Kristen W. Yeom, Chunlei Liu

Abstract: Quantitative susceptibility mapping (QSM) estimates the underlying tissue magnetic susceptibility from MRI gradient-echo phase signal and typically requires several processing steps. These steps involve phase unwrapping, brain volume extraction, background phase removal and solving an ill-posed inverse problem. The resulting susceptibility map is known to suffer from inaccuracy near the edges of the brain tissues, in part due to imperfect brain extraction, edge erosion of the brain tissue and the lack of phase measurement outside the brain. This inaccuracy has thus hindered the application of QSM for measuring the susceptibility of tissues near the brain edges, e.g., quantifying cortical layers and generating superficial venography. To address these challenges, we propose a learning-based QSM reconstruction method that directly estimates the magnetic susceptibility from total phase images without the need for brain extraction and background phase removal, referred to as autoQSM. The neural network has a modified U-net structure and is trained using QSM maps computed by a two-step QSM method. 209 healthy subjects with ages ranging from 11 to 82 years were employed for patch-wise network training. The network was validated on data dissimilar to the training data, e.g., in vivo mouse brain data and brains with lesions, which suggests that the network has generalized and learned the underlying mathematical relationship between magnetic field perturbation and magnetic susceptibility. AutoQSM was able to recover magnetic susceptibility of anatomical structures near the edges of the brain including the veins covering the cortical surface, spinal cord and nerve tracts near the mouse brain boundaries. The advantages of high-quality maps, no need for brain volume extraction and high reconstruction speed demonstrate its potential for future applications.

Submitted 15 May, 2019; originally announced May 2019.

Comments: 26 pages

Showing 1–50 of 59 results for author: Cao, S