Search | arXiv e-print repository

Efficient Visual Fault Detection for Freight Train via Neural Architecture Search with Data Volume Robustness

Authors: Yang Zhang, Mingying Li, Huilin Pan, Moyun Liu, Yang Zhou

Abstract: Deep learning-based fault detection methods have achieved significant success. In visual fault detection of freight trains, components differ greatly in characteristics across classes (scale variance) but little within a class, which calls for scale-aware detectors. Moreover, the design of task-specific networks heavily relies on human expertise. As a consequence, neural architecture search (NAS), which automates the model design process, has gained considerable attention because of its promising performance. However, NAS is computationally intensive due to the large search space and huge data volume. In this work, we propose an efficient NAS-based framework for visual fault detection of freight trains to search for a task-specific detection head capable of multi-scale representation. First, we design a scale-aware search space for discovering an effective receptive field in the head. Second, we explore the robustness of data volume to reduce search costs based on the specifically designed search space, and a novel sharing strategy is proposed to reduce memory and further improve search efficiency. Extensive experimental results demonstrate the effectiveness of our method with data volume robustness, which achieves 46.8 and 47.9 mAP on the Bottom View and Side View datasets, respectively. Our framework outperforms the state-of-the-art approaches and linearly decreases the search costs with reduced data volumes.

Submitted 27 May, 2024; originally announced May 2024.

Comments: 11 pages, 8 figures

arXiv:2405.13901 [pdf, other]

DCT-Based Decorrelated Attention for Vision Transformers

Authors: Hongyi Pan, Emadeldeen Hamdan, Xin Zhu, Koushik Biswas, Ahmet Enis Cetin, Ulas Bagci

Abstract: Central to the Transformer architectures' effectiveness is the self-attention mechanism, a function that maps queries, keys, and values into a high-dimensional vector space. However, training the attention weights of queries, keys, and values is non-trivial from a state of random initialization. In this paper, we propose two methods. (i) We first address the initialization problem of Vision Transformers by introducing a simple, yet highly innovative, initialization approach utilizing Discrete Cosine Transform (DCT) coefficients. Our proposed DCT-based attention initialization marks a significant gain compared to traditional initialization strategies, offering a robust foundation for the attention mechanism. Our experiments reveal that the DCT-based initialization enhances the accuracy of Vision Transformers in classification tasks. (ii) We also recognize that since DCT effectively decorrelates image information in the frequency domain, this decorrelation is useful for compression because it allows the quantization step to discard many of the higher-frequency components. Based on this observation, we propose a novel DCT-based compression technique for the attention function of Vision Transformers. Since high-frequency DCT coefficients usually correspond to noise, we truncate the high-frequency DCT components of the input patches. Our DCT-based compression reduces the size of weight matrices for queries, keys, and values. While maintaining the same level of accuracy, our DCT compressed Swin Transformers obtain a considerable decrease in the computational overhead.

Submitted 28 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.
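
The truncation of high-frequency DCT components described in the abstract above can be illustrated with a minimal one-dimensional sketch. This is not the authors' implementation: it assumes an orthonormal DCT-II applied to a plain list of numbers rather than to image patches, and the function names (dct2, idct2, truncate_high_freq, reconstruct) are hypothetical names chosen for illustration.

```python
import math


def dct2(x):
    """Orthonormal DCT-II of a 1-D sequence (illustrative, O(n^2))."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct2(c):
    """Inverse of dct2 (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = math.sqrt(1.0 / n) * c[0]
        s += sum(
            math.sqrt(2.0 / n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(1, n)
        )
        out.append(s)
    return out


def truncate_high_freq(x, keep):
    """Keep only the `keep` lowest-frequency DCT coefficients of x."""
    return dct2(x)[:keep]


def reconstruct(kept, n):
    """Zero-pad the kept coefficients back to length n and invert."""
    return idct2(list(kept) + [0.0] * (n - len(kept)))


if __name__ == "__main__":
    signal = [1.0, 2.0, 4.0, 8.0, 8.0, 4.0, 2.0, 1.0]
    compact = truncate_high_freq(signal, keep=4)  # half the coefficients
    approx = reconstruct(compact, len(signal))
    print(len(compact), [round(v, 3) for v in approx])
```

Keeping fewer coefficients shrinks the representation that downstream weight matrices must act on, at the cost of discarding high-frequency detail; how many coefficients to keep is a tuning choice that this sketch leaves to the caller.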

arXiv:2405.12367 [pdf, other]

Large-Scale Multi-Center CT and MRI Segmentation of Pancreas with Deep Learning

Authors: Zheyuan Zhang, Elif Keles, Gorkem Durak, Yavuz Taktak, Onkar Susladkar, Vandan Gorade, Debesh Jha, Asli C. Ormeci, Alpay Medetalibeyoglu, Lanhong Yao, Bin Wang, Ilkin Sevgi Isler, Linkai Peng, Hongyi Pan, Camila Lopes Vendrami, Amir Bourhani, Yury Velichko, Boqing Gong, Concetto Spampinato, Ayis Pyrros, Pallavi Tiwari, Derk C. F. Klatte, Megan Engels, Sanne Hoogenboom, Candice W. Bolan, et al. (13 additional authors not shown)

Abstract: Automated volumetric segmentation of the pancreas on cross-sectional imaging is needed for diagnosis and follow-up of pancreatic diseases. While CT-based pancreatic segmentation is more established, MRI-based segmentation methods are understudied, largely due to a lack of publicly available datasets, benchmarking research efforts, and domain-specific deep learning methods. In this retrospective study, we collected a large dataset (767 scans from 499 participants) of T1-weighted (T1W) and T2-weighted (T2W) abdominal MRI series from five centers between March 2004 and November 2022. We also collected CT scans of 1,350 patients from publicly available sources for benchmarking purposes. We developed a new pancreas segmentation method, called PanSegNet, combining the strengths of nnUNet and a Transformer network with a new linear attention module enabling volumetric computation. We tested PanSegNet's accuracy in cross-modality (a total of 2,117 scans) and cross-center settings with Dice and Hausdorff distance (HD95) evaluation metrics. We used Cohen's kappa statistics for intra and inter-rater agreement evaluation and paired t-tests for volume and Dice comparisons, respectively. For segmentation accuracy, we achieved Dice coefficients of 88.3% (std: 7.2%, at case level) with CT, 85.0% (std: 7.9%) with T1W MRI, and 86.3% (std: 6.4%) with T2W MRI. There was a high correlation for pancreas volume prediction with R^2 of 0.91, 0.84, and 0.85 for CT, T1W, and T2W, respectively. We found moderate inter-observer (0.624 and 0.638 for T1W and T2W MRI, respectively) and high intra-observer agreement scores. All MRI data is made available at https://osf.io/kysnj/. Our source code is available at https://github.com/NUBagciLab/PaNSegNet.

Submitted 25 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

Comments: under review version
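
For readers unfamiliar with the Dice coefficient reported in the abstract above, the following is a minimal sketch of the standard definition, 2|A ∩ B| / (|A| + |B|), on flattened binary masks. It is not taken from the PanSegNet code; the function name dice and the convention of returning 1.0 for two empty masks are assumptions made for illustration.

```python
def dice(pred, target):
    """Dice coefficient between two equal-length binary masks.

    Masks are flat sequences of 0/1 (or False/True) values.
    Returns 1.0 when both masks are empty (an assumed convention).
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    return 1.0 if total == 0 else 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [1, 1, 0, 0, 1, 0]
    reference = [1, 0, 0, 0, 1, 1]
    # 2 overlapping voxels, 3 + 3 foreground voxels in total.
    print(round(dice(predicted, reference), 4))
```

A case-level score such as the ones quoted above would be obtained by computing this value per scan and then averaging across scans.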

arXiv:2405.06166 [pdf, other]

MDNet: Multi-Decoder Network for Abdominal CT Organs Segmentation

Authors: Debesh Jha, Nikhil Kumar Tomar, Koushik Biswas, Gorkem Durak, Matthew Antalek, Zheyuan Zhang, Bin Wang, Md Mostafijur Rahman, Hongyi Pan, Alpay Medetalibeyoglu, Yury Velichko, Daniela Ladner, Amir Borhani, Ulas Bagci

Abstract: Accurate segmentation of organs from abdominal CT scans is essential for clinical applications such as diagnosis, treatment planning, and patient monitoring. To handle challenges of heterogeneity in organ shapes, sizes, and complex anatomical relationships, we propose a Multi-Decoder Network (MDNet), an encoder-decoder network that uses the pre-trained MiT-B2 as the encoder and multiple different decoder networks. Each decoder network is connected to a different part of the encoder via a multi-scale feature enhancement dilated block. With each decoder, we increase the depth of the network iteratively and refine segmentation masks, enriching feature maps by integrating previous decoders' feature maps. To refine the feature map further, we also pass the predicted masks from the previous decoder to the current decoder to provide spatial attention across foreground and background regions. MDNet effectively refines the segmentation mask with a high dice similarity coefficient (DSC) of 0.9013 and 0.9169 on the Liver Tumor segmentation (LiTS) and MSD Spleen datasets, respectively. Additionally, it reduces Hausdorff distance (HD) to 3.79 for the LiTS dataset and 2.26 for the spleen segmentation dataset, underscoring the precision of MDNet in capturing the complex contours. Moreover, MDNet is more interpretable and robust compared to the other baseline models.

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.01503 [pdf, other]

PAM-UNet: Shifting Attention on Region of Interest in Medical Images

Authors: Abhijit Das, Debesh Jha, Vandan Gorade, Koushik Biswas, Hongyi Pan, Zheyuan Zhang, Daniela P. Ladner, Yury Velichko, Amir Borhani, Ulas Bagci

Abstract: Computer-aided segmentation methods can assist medical personnel in improving diagnostic outcomes. While recent advancements like UNet and its variants have shown promise, they face a critical challenge: balancing accuracy with computational efficiency. Shallow encoder architectures in UNets often struggle to capture crucial spatial features, leading to inaccurate and sparse segmentation. To address this limitation, we propose a novel Progressive Attention based Mobile UNet (PAM-UNet) architecture. The inverted residual (IR) blocks in PAM-UNet help maintain a lightweight framework, while layerwise Progressive Luong Attention (PLA) promotes precise segmentation by directing attention toward regions of interest during synthesis. Our approach prioritizes both accuracy and speed, achieving a commendable balance with a mean IoU of 74.65 and a dice score of 82.87, while requiring only 1.32 floating-point operations per second (FLOPS) on the Liver Tumor Segmentation Benchmark (LiTS) 2017 dataset. These results highlight the importance of developing efficient segmentation models to accelerate the adoption of AI in clinical practice.

Submitted 2 May, 2024; originally announced May 2024.

Comments: Accepted at 2024 IEEE EMBC

arXiv:2403.15828 [pdf, other]

TJCCT: A Two-timescale Approach for UAV-assisted Mobile Edge Computing

Authors: Zemin Sun, Geng Sun, Qingqing Wu, Long He, Shuang Liang, Hongyang Pan, Dusit Niyato, Chau Yuen, Victor C. M. Leung

Abstract: Unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC) is emerging as a promising paradigm to provide aerial-terrestrial computing services in close proximity to mobile devices (MDs). However, meeting the demands of computation-intensive and delay-sensitive tasks for MDs poses several challenges, including the demand-supply contradiction between MDs and MEC servers, the demand-supply heterogeneity between MDs and MEC servers, the trajectory control requirements on energy efficiency and timeliness, and the different time-scale dynamics of the network. To address these issues, we first present a hierarchical architecture by incorporating terrestrial-aerial computing capabilities and leveraging UAV flexibility. Furthermore, we formulate a joint computing resource allocation, computation offloading, and trajectory control problem to maximize the system utility. Since the problem is a non-convex and NP-hard mixed integer nonlinear programming (MINLP) problem, we propose a two-timescale joint computing resource allocation, computation offloading, and trajectory control (TJCCT) approach for solving it. In the short timescale, we propose a price-incentive model for on-demand computing resource allocation and a matching mechanism-based method for computation offloading. In the long timescale, we propose a convex optimization-based method for UAV trajectory control. Besides, we theoretically prove the stability, optimality, and polynomial complexity of TJCCT. Extended simulation results demonstrate that the proposed TJCCT outperforms the comparative algorithms in terms of the system utility, average processing rate, average completion delay, and average completion ratio.

Submitted 23 March, 2024; originally announced March 2024.

arXiv:2403.12985 [pdf, other]

Multi-objective Optimization for Data Collection in UAV-assisted Agricultural IoT

Authors: Lingling Liu, Aimin Wang, Geng Sun, Jiahui Li, Hongyang Pan, Tony Q. S. Quek

Abstract: Fixed ground base stations (BSs) are often deployed inflexibly, incur high overheads, and are susceptible to damage from natural disasters, making it impractical for them to continuously collect data from sensor devices. To improve the network coverage and performance of wireless communication, unmanned aerial vehicles (UAVs) have been introduced in diverse wireless networks. In this work, we therefore consider employing a UAV as an aerial BS to acquire data from agricultural Internet of Things (IoT) devices. To this end, we first formulate a UAV-assisted data collection multi-objective optimization problem (UDCMOP) to efficiently collect the data from agricultural sensing devices. Specifically, we aim to jointly optimize the hovering positions, visit sequence, and speed of the UAV, as well as the transmit power of the devices, to simultaneously maximize the minimum transmit rate of the devices, minimize the total energy consumption of the devices, and minimize the total energy consumption of the UAV. Second, the proposed UDCMOP is a non-convex mixed integer nonlinear optimization problem with both continuous and discrete decision variables, which makes it intractable. Therefore, we solve it by proposing an improved multi-objective artificial hummingbird algorithm (IMOAHA) with several specific improvement factors, namely a hybrid initialization operator, a Cauchy mutation foraging operator, and a discrete mutation operator. Finally, simulations are carried out to verify that the proposed IMOAHA can effectively improve the system performance compared with other benchmarks.

Submitted 3 March, 2024; originally announced March 2024.

Comments: 13 pages, 7 figures, 4 tables

arXiv:2403.06532 [pdf, other]

Reconstructing Visual Stimulus Images from EEG Signals Based on Deep Visual Representation Model

Authors: Hongguang Pan, Zhuoyi Li, Yunpeng Fu, Xuebin Qin, Jianchen Hu

Abstract: Reconstructing visual stimulus images is a significant task in neural decoding, and up to now, most studies have used functional magnetic resonance imaging (fMRI) as the signal source. However, fMRI-based image reconstruction methods are difficult to apply widely because of the complexity and high cost of the acquisition equipment. Considering the low cost and easy portability of electroencephalogram (EEG) acquisition equipment, we propose a novel image reconstruction method based on EEG signals in this paper. Firstly, to satisfy the high recognizability of visual stimulus images in a fast switching manner, we build a visual stimuli image dataset and obtain the EEG dataset through a corresponding EEG signal collection experiment. Secondly, the deep visual representation model (DVRM), consisting of a primary encoder and a subordinate decoder, is proposed to reconstruct visual stimuli. The encoder is designed based on residual-in-residual dense blocks to learn the distribution characteristics between EEG signals and visual stimulus images, while the decoder is designed based on a deep neural network to reconstruct the visual stimulus image from the learned deep visual representation. The DVRM can fit the deep and multiview visual features of the human natural state and make the reconstructed images more precise. Finally, we evaluate the DVRM on the quality of the generated images on our EEG dataset. The results show that the DVRM performs well in the task of learning deep visual representation from EEG signals and generating reconstructed images that are realistic and highly resemble the original images.

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2403.05024 [pdf, other]

A Probabilistic Hadamard U-Net for MRI Bias Field Correction

Authors: Xin Zhu, Hongyi Pan, Yury Velichko, Adam B. Murphy, Ashley Ross, Baris Turkbey, Ahmet Enis Cetin, Ulas Bagci

Abstract: Magnetic field inhomogeneity correction remains a challenging task in MRI analysis. Most established techniques are designed for brain MRI by supposing that image intensities in the identical tissue follow a uniform distribution. Such an assumption cannot be easily applied to other organs, especially those that are small in size and heterogeneous in texture (large variations in intensity), such as the prostate. To address this problem, this paper proposes a probabilistic Hadamard U-Net (PHU-Net) for prostate MRI bias field correction. First, a novel Hadamard U-Net (HU-Net) is introduced to extract the low-frequency scalar field, multiplied by the original input to obtain the prototypical corrected image. HU-Net converts the input image from the time domain into the frequency domain via Hadamard transform. In the frequency domain, high-frequency components are eliminated using the trainable filter (scaling layer), hard-thresholding layer, and sparsity penalty. Next, a conditional variational autoencoder is used to encode possible bias field-corrected variants into a low-dimensional latent space. Random samples drawn from latent space are then incorporated with a prototypical corrected image to generate multiple plausible images. Experimental results demonstrate the effectiveness of PHU-Net in correcting bias-field in prostate MRI with a fast inference speed. It has also been shown that prostate MRI segmentation accuracy improves with the high-quality corrected images from PHU-Net. The code will be available in the final version of this manuscript.

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2312.05832 [pdf, other]

Spatial-wise Dynamic Distillation for MLP-like Efficient Visual Fault Detection of Freight Trains

Authors: Yang Zhang, Huilin Pan, Mingying Li, An Wang, Yang Zhou, Hongliang Ren

Abstract: Despite the successful application of convolutional neural networks (CNNs) in object detection tasks, their efficiency in detecting faults from freight train images remains inadequate for implementation in real-world engineering scenarios. The modeling shortcomings of spatial invariance and pooling layers in conventional CNNs often lead to the neglect of crucial global information, resulting in localization errors in fault detection tasks for freight trains. To solve these problems, we design a spatial-wise dynamic distillation framework based on multi-layer perceptron (MLP) for visual fault detection of freight trains. We first present the axial shift strategy, which allows the MLP-like architecture to overcome the challenge of spatial invariance and effectively incorporate both local and global cues. We then propose a dynamic distillation method without a pre-trained teacher, including a dynamic teacher mechanism that can effectively eliminate the semantic discrepancy with the student model. Such an approach mines more abundant details from lower-level feature appearances and higher-level label semantics as the extra supervision signal, which utilizes efficient instance embedding to model the global spatial and semantic information. In addition, the proposed dynamic teacher can be trained jointly with students to further enhance the distillation efficiency. Extensive experiments executed on six typical fault datasets reveal that our approach outperforms the current state-of-the-art detectors and achieves the highest accuracy with real-time detection at a lower computational cost. The source code will be available at https://github.com/MVME-HBUT/SDD-FTI-FDet.

Submitted 10 December, 2023; originally announced December 2023.

Comments: 10 pages, 6 figures

arXiv:2311.16116 [pdf, ps, other]

Resource Scheduling for UAVs-aided D2D Networks: A Multi-objective Optimization Approach

Authors: Hongyang Pan, Yanheng Liu, Geng Sun, Pengfei Wang, Chau Yuen

Abstract: Unmanned aerial vehicles (UAVs)-aided device-to-device (D2D) networks have attracted great interest with the development of 5G/6G communications, while there are several challenges about resource scheduling in UAVs-aided D2D networks. In this work, we formulate a UAVs-aided D2D network resource scheduling optimization problem (NetResSOP) to comprehensively consider the number of deployed UAVs, UAV positions, UAV transmission powers, UAV flight velocities, communication channels, and UAV-device pair assignment so as to maximize the D2D network capacity, minimize the number of deployed UAVs, and minimize the average energy consumption over all UAVs simultaneously. The formulated NetResSOP is a mixed-integer programming problem (MIPP) and an NP-hard problem, which means that it is difficult to solve in polynomial time. Moreover, there are trade-offs between the optimization objectives, and hence it is also difficult to find a solution that makes all objectives optimal simultaneously. Thus, we propose a non-dominated sorting genetic algorithm-III with a Flexible solution dimension mechanism, a Discrete part generation mechanism, and a UAV number adjustment mechanism (NSGA-III-FDU) for solving the problem comprehensively. Simulation results demonstrate the effectiveness and the stability of the proposed NSGA-III-FDU under different scales and settings of the D2D networks.

Submitted 30 September, 2023; originally announced November 2023.

arXiv:2310.02862 [pdf, other]

A novel asymmetrical autoencoder with a sparsifying discrete cosine Stockwell transform layer for gearbox sensor data compression

Authors: Xin Zhu, Daoguang Yang, Hongyi Pan, Hamid Reza Karimi, Didem Ozevin, Ahmet Enis Cetin

Abstract: The lack of an efficient compression model remains a challenge for the wireless transmission of gearbox data in non-contact gear fault diagnosis problems. In this paper, we present a signal-adaptive asymmetrical autoencoder with a transform domain layer to compress sensor signals. First, a new discrete cosine Stockwell transform (DCST) layer is introduced to replace linear layers in a multi-layer autoencoder. A trainable filter is implemented in the DCST domain by utilizing the multiplication property of the convolution. A trainable hard-thresholding layer is applied to reduce redundant data in the DCST layer to make the feature map sparse. In comparison to the linear layer, the DCST layer reduces the number of trainable parameters and improves the accuracy of data reconstruction. Second, training the autoencoder with a sparsifying DCST layer only requires a small number of datasets. The proposed method is superior to other autoencoder-based methods on the University of Connecticut (UoC) and Southeast University (SEU) gearbox datasets, as the average quality score is improved by 2.00% at the lowest and 32.35% at the highest with a limited number of training samples.

Submitted 4 October, 2023; originally announced October 2023.

arXiv:2310.00396 [pdf, other]

Joint Scheduling and Trajectory Optimization of Charging UAV in Wireless Rechargeable Sensor Networks

Authors: Yanheng Liu, Hongyang Pan, Geng Sun, Aimin Wang, Jiahui Li, Shuang Liang

Abstract: Wireless rechargeable sensor networks with a charging unmanned aerial vehicle (CUAV) have broad application prospects in the power supply of rechargeable sensor nodes (SNs). However, how to schedule a CUAV and design its trajectory to improve the charging efficiency of the entire system is still a vital problem. In this paper, we formulate a joint-CUAV scheduling and trajectory optimization problem (JSTOP) to simultaneously minimize the hovering points of CUAV, the number of the repeatedly covered SNs and the flying distance of CUAV for charging all SNs. Due to the complexity of JSTOP, it is decomposed into two optimization subproblems: the CUAV scheduling optimization problem (CSOP) and the CUAV trajectory optimization problem (CTOP). CSOP is a hybrid optimization problem with both continuous and discrete solution spaces, and the solution dimension in CSOP is not fixed since it changes with the number of hovering points of CUAV. Moreover, CTOP is a completely discrete optimization problem. Thus, we propose a particle swarm optimization (PSO) with a flexible dimension mechanism, a K-means operator and a punishment-compensation mechanism (PSOFKP) and a PSO with a discretization factor, a 2-opt operator and a path crossover reduction mechanism (PSOD2P) to solve the converted CSOP and CTOP, respectively. Simulation results evaluate the benefits of PSOFKP and PSOD2P under different scales and settings of the network, and the stability of the proposed algorithms is verified.

Submitted 30 September, 2023; originally announced October 2023.

arXiv:2310.00384 [pdf, ps, other]

Joint Power and 3D Trajectory Optimization for UAV-enabled Wireless Powered Communication Networks with Obstacles

Authors: Hongyang Pan, Yanheng Liu, Geng Sun, Junsong Fan, Shuang Liang, Chau Yuen

Abstract: Unmanned aerial vehicle (UAV)-enabled wireless powered communication networks (WPCNs) are promising technologies in 5G/6G wireless communications, while there are several challenges about UAV power allocation and scheduling to enhance the energy utilization efficiency, considering the existence of obstacles. In this work, we consider a UAV-enabled WPCN scenario in which a UAV needs to cover the ground wireless devices (WDs). During the coverage process, the UAV needs to collect data from the WDs and charge them simultaneously. To this end, we formulate a joint-UAV power and three-dimensional (3D) trajectory optimization problem (JUPTTOP) to simultaneously increase the total number of the covered WDs, increase the time efficiency, and reduce the total flying distance of UAV so as to improve the energy utilization efficiency in the network. Due to its difficulty and complexity, we decompose it into two optimization subproblems, namely the UAV power allocation optimization problem (UPAOP) and the UAV 3D trajectory optimization problem (UTTOP). Then, we propose an improved non-dominated sorting genetic algorithm-II with K-means initialization operator and Variable dimension mechanism (NSGA-II-KV) for solving the UPAOP. For UTTOP, we first introduce a pretreatment method, and then use an improved particle swarm optimization with Normal distribution initialization, Genetic mechanism, Differential mechanism and Pursuit operator (PSO-NGDP) to deal with this subproblem. Simulation results verify the effectiveness of the proposed strategies under different scales and settings of the networks.

Submitted 30 September, 2023; originally announced October 2023.

arXiv:2309.12201 [pdf, other]

Electroencephalogram Sensor Data Compression Using An Asymmetrical Sparse Autoencoder With A Discrete Cosine Transform Layer

Authors: Xin Zhu, Hongyi Pan, Shuaiang Rong, Ahmet Enis Cetin

Abstract: Electroencephalogram (EEG) data compression is necessary for wireless recording applications to reduce the amount of data that needs to be transmitted. In this paper, an asymmetrical sparse autoencoder with a discrete cosine transform (DCT) layer is proposed to compress EEG signals. The encoder module of the autoencoder has a combination of a fully connected linear layer and the DCT layer to reduce redundant data using hard-thresholding nonlinearity. Furthermore, the DCT layer includes trainable hard-thresholding parameters and scaling layers to give emphasis or de-emphasis on individual DCT coefficients. Finally, the one-by-one convolutional layer generates the latent space. The sparsity penalty-based cost function is employed to keep the feature map as sparse as possible in the latent space. The latent space data is transmitted to the receiver. The decoder module of the autoencoder is designed using the inverse DCT and two fully connected linear layers to improve the accuracy of data reconstruction. In comparison to other state-of-the-art methods, the proposed method significantly improves the average quality score in various data compression experiments.

Submitted 15 September, 2023; originally announced September 2023.

arXiv:2309.09866 [pdf, other]

Domain Generalization with Fourier Transform and Soft Thresholding

Authors: Hongyi Pan, Bin Wang, Zheyuan Zhang, Xin Zhu, Debesh Jha, Ahmet Enis Cetin, Concetto Spampinato, Ulas Bagci

Abstract: Domain generalization aims to train models on multiple source domains so that they can generalize well to unseen target domains. Among many domain generalization methods, Fourier-transform-based domain generalization methods have gained popularity primarily because they exploit the power of Fourier transformation to capture essential patterns and regularities in the data, making the model more robust to domain shifts. The mainstream Fourier-transform-based domain generalization swaps the Fourier amplitude spectrum while preserving the phase spectrum between the source and the target images. However, it neglects background interference in the amplitude spectrum. To overcome this limitation, we introduce a soft-thresholding function in the Fourier domain. We apply this newly designed algorithm to retinal fundus image segmentation, which is important for diagnosing ocular diseases but the neural network's performance can degrade across different sources due to domain shifts. The proposed technique basically enhances fundus image augmentation by eliminating small values in the Fourier domain and providing better generalization. The innovative nature of the soft thresholding fused with Fourier-transform-based domain generalization improves neural network models' performance by reducing the target images' background interference significantly. Experiments on public data validate our approach's effectiveness over conventional and state-of-the-art methods with superior segmentation metrics.

Submitted 12 December, 2023; v1 submitted 18 September, 2023; originally announced September 2023.

Comments: This paper was accepted to ICASSP 2024

arXiv:2309.08782 [pdf, other]

Stein Variational Gradient Descent-based Detection For Random Access With Preambles In MTC

Authors: Xin Zhu, Hongyi Pan, Salih Atici, Ahmet Enis Cetin

Abstract: Traditional preamble detection algorithms have low accuracy in the grant-based random access scheme in massive machine-type communication (mMTC). We present a novel preamble detection algorithm based on Stein variational gradient descent (SVGD) at the second step of the random access procedure. It efficiently leverages deterministic updates of particles for continuous inference. To further enhance the performance of the SVGD detector, especially in a dense user scenario, we propose a normalized SVGD detector with momentum. It utilizes the momentum and a bias correction term to reduce the preamble estimation errors during the gradient descent process. Simulation results show that the proposed algorithm performs better than Markov Chain Monte Carlo-based approaches in terms of detection accuracy.

Submitted 15 September, 2023; originally announced September 2023.

arXiv:2307.02779 [pdf, other]

Large Language Models Empowered Autonomous Edge AI for Connected Intelligence

Authors: Yifei Shen, Jiawei Shao, Xinjie Zhang, Zehong Lin, Hao Pan, Dongsheng Li, Jun Zhang, Khaled B. Letaief

Abstract: The evolution of wireless networks gravitates towards connected intelligence, a concept that envisions seamless interconnectivity among humans, objects, and intelligence in a hyper-connected cyber-physical world. Edge artificial intelligence (Edge AI) is a promising solution to achieve connected intelligence by delivering high-quality, low-latency, and privacy-preserving AI services at the network edge. This article presents a vision of autonomous edge AI systems that automatically organize, adapt, and optimize themselves to meet users' diverse requirements, leveraging the power of large language models (LLMs), i.e., Generative Pretrained Transformer (GPT). By exploiting the powerful abilities of GPT in language understanding, planning, and code generation, as well as incorporating classic wisdom such as task-oriented communication and edge federated learning, we present a versatile framework that efficiently coordinates edge AI models to cater to users' personal demands while automatically generating code to train new models in a privacy-preserving manner. Experimental results demonstrate the system's remarkable ability to accurately comprehend user demands, efficiently execute AI models with minimal cost, and effectively create high-performance AI models at edge servers.

Submitted 25 December, 2023; v1 submitted 6 July, 2023; originally announced July 2023.

Comments: IEEE Communication Magazine

arXiv:2307.00701 [pdf, other]

Efficient Visual Fault Detection for Freight Train Braking System via Heterogeneous Self Distillation in the Wild

Authors: Yang Zhang, Huilin Pan, Yang Zhou, Mingying Li, Guodong Sun

Abstract: Efficient visual fault detection of freight trains is a critical part of ensuring the safe operation of railways under the restricted hardware environment. Although deep learning-based approaches have excelled in object detection, the efficiency of freight train fault detection is still insufficient to apply in real-world engineering. This paper proposes a heterogeneous self-distillation framework to ensure detection accuracy and speed while satisfying low resource requirements. The privileged information in the output feature knowledge can be transferred from the teacher to the student model through distillation to boost performance. We first adopt a lightweight backbone to extract features and generate a new heterogeneous knowledge neck. Such neck models positional information and long-range dependencies among channels through parallel encoding to optimize feature extraction capabilities. Then, we utilize the general distribution to obtain more credible and accurate bounding box estimates. Finally, we employ a novel loss function that makes the network easily concentrate on values near the label to improve learning efficiency. Experiments on four fault datasets reveal that our framework can achieve over 37 frames per second and maintain the highest accuracy in comparison with traditional distillation approaches. Moreover, compared to state-of-the-art methods, our framework demonstrates more competitive performance with lower memory usage and the smallest model size.

Submitted 2 July, 2023; originally announced July 2023.

Comments: 12 pages, 9 figures

arXiv:2306.09650 [pdf, other]

Reconfigurable Intelligent Surface Assisted Semantic Communication Systems

Authors: Jiajia Shi, Tse-Tin Chan, Haoyuan Pan, Tat-Ming Lok

Abstract: Semantic communication, which focuses on conveying the meaning of information rather than exact bit reconstruction, has gained considerable attention in recent years. Meanwhile, reconfigurable intelligent surface (RIS) is a promising technology that can achieve high spectral and energy efficiency by dynamically reflecting incident signals through programmable passive components. In this paper, we put forth a semantic communication scheme aided by RIS. Using text transmission as an example, experimental results demonstrate that the RIS-assisted semantic communication system outperforms the point-to-point semantic communication system in terms of bilingual evaluation understudy (BLEU) scores in Rayleigh fading channels, especially at low signal-to-noise ratio (SNR) regimes. In addition, the RIS-assisted semantic communication system exhibits superior robustness against channel estimation errors compared to its point-to-point counterpart. RIS can improve performance as it provides extra line-of-sight (LoS) paths and enhances signal propagation conditions compared to point-to-point systems.

Submitted 29 June, 2023; v1 submitted 16 June, 2023; originally announced June 2023.

arXiv:2305.17510 [pdf, other]

A Hybrid Quantum-Classical Approach based on the Hadamard Transform for the Convolutional Layer

Authors: Hongyi Pan, Xin Zhu, Salih Atici, Ahmet Enis Cetin

Abstract: In this paper, we propose a novel Hadamard Transform (HT)-based neural network layer for hybrid quantum-classical computing. It implements the regular convolutional layers in the Hadamard transform domain. The idea is based on the HT convolution theorem which states that the dyadic convolution between two vectors is equivalent to the element-wise multiplication of their HT representation. Computing the HT is simply the application of a Hadamard gate to each qubit individually, so the HT computations of our proposed layer can be implemented on a quantum computer. Compared to the regular Conv2D layer, the proposed HT-perceptron layer is computationally more efficient. Compared to a CNN with the same number of trainable parameters and 99.26% test accuracy, our HT network reaches 99.31% test accuracy with 57.1% MACs reduced in the MNIST dataset; and in our ImageNet-1K experiments, our HT-based ResNet-50 exceeds the accuracy of the baseline ResNet-50 by 0.59% center-crop top-1 accuracy using 11.5% fewer parameters with 12.6% fewer MACs.

Submitted 22 February, 2024; v1 submitted 27 May, 2023; originally announced May 2023.

Comments: To be presented at International Conference on Machine Learning (ICML), 2023
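
The HT convolution theorem cited in the abstract above can be checked numerically. The sketch below is not the authors' layer; it only verifies the identity for plain vectors, assuming the unnormalized Walsh-Hadamard transform in natural (Hadamard) order and dyadic convolution defined with index XOR. The function names `fwht`, `dyadic_convolution`, and `dyadic_convolution_via_ht` are illustrative choices.

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform (natural order).

    The input length must be a power of two.
    """
    v = list(values)
    n = len(v)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly over blocks of size 2*h: (a, b) -> (a + b, a - b).
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = v[i], v[i + h]
                v[i], v[i + h] = a + b, a - b
        h *= 2
    return v


def dyadic_convolution(x, y):
    """Direct dyadic convolution: out[k] = sum_j x[j] * y[k XOR j]."""
    n = len(x)
    return [sum(x[j] * y[k ^ j] for j in range(n)) for k in range(n)]


def dyadic_convolution_via_ht(x, y):
    """Same result via the HT: transform, multiply element-wise, invert."""
    n = len(x)
    product = [a * b for a, b in zip(fwht(x), fwht(y))]
    # The inverse of the unnormalized transform is the transform divided by n.
    return [p / n for p in fwht(product)]


x = [1, 2, 3, 4]
y = [0, 1, 0, -1]
print(dyadic_convolution(x, y))         # [-2, -2, 2, 2]
print(dyadic_convolution_via_ht(x, y))  # [-2.0, -2.0, 2.0, 2.0]
```

The direct form costs O(n^2) multiplications, while the transform-domain form needs only n multiplications plus three transforms made of additions and subtractions, which is the general reason such layers can save multiply-accumulate operations.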

arXiv:2305.11651 [pdf, other]

Channel Cycle Time: A New Measure of Short-term Fairness

Authors: Pengfei Shen, Yulin Shao, Haoyuan Pan, Lu Lu, Yonina C. Eldar

Abstract: This paper puts forth a new metric, dubbed channel cycle time (CCT), to measure the short-term fairness of communication networks. CCT characterizes the average duration between two consecutive successful transmissions of a user, during which all other users successfully accessed the channel at least once. In contrast to existing short-term fairness measures, CCT provides more comprehensive insight into the transient dynamics of communication networks, with a particular focus on users' delays and jitter. To validate the efficacy of our approach, we analytically characterize the CCTs for two classical communication protocols: slotted Aloha and CSMA/CA. The analysis demonstrates that CSMA/CA exhibits superior short-term fairness over slotted Aloha. Beyond its role as a measurement metric, CCT has broader implications as a guiding principle for the design of future communication networks by emphasizing factors like fairness, delay, and jitter in short-term behaviors.

Submitted 14 October, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

arXiv:2304.09373 [pdf, other]

Multi-scale Adaptive Fusion Network for Hyperspectral Image Denoising

Authors: Haodong Pan, Feng Gao, Junyu Dong, Qian Du

Abstract: Removing the noise and improving the visual quality of hyperspectral images (HSIs) is challenging in academia and industry. Great efforts have been made to leverage local, global or spectral context information for HSI denoising. However, existing methods still have limitations in feature interaction exploitation among multiple scales and rich spectral structure preservation. In view of this, we propose a novel solution to investigate the HSI denoising using a Multi-scale Adaptive Fusion Network (MAFNet), which can learn the complex nonlinear mapping between clean and noisy HSI. Two key components contribute to improving the hyperspectral image denoising: A progressively multiscale information aggregation network and a co-attention fusion module. Specifically, we first generate a set of multiscale images and feed them into a coarse-fusion network to exploit the contextual texture correlation. Thereafter, a fine fusion network is followed to exchange the information across the parallel multiscale subnetworks. Furthermore, we design a co-attention fusion module to adaptively emphasize informative features from different scales, and thereby enhance the discriminative learning capability for denoising. Extensive experiments on synthetic and real HSI datasets demonstrate that the proposed MAFNet has achieved better denoising performance than other state-of-the-art techniques. Our codes are available at https://github.com/summitgao/MAFNet.

Submitted 18 April, 2023; originally announced April 2023.

Comments: IEEE JSTASRS 2023, code at: https://github.com/summitgao/MAFNet

arXiv:2304.05119 [pdf, other]

Device Activity Detection in mMTC with Low-Resolution ADC: A New Protocol

Authors: Zhaorui Wang, Ya-Feng Liu, Ziyue Wang, Liang Liu, Haoyuan Pan, Shuguang Cui

Abstract: This paper investigates the effect of low-resolution analog-to-digital converters (ADCs) on device activity detection in massive machine-type communications (mMTC). The low-resolution ADCs induce two challenges on the device activity detection compared with the traditional setup with the assumption of infinite ADC resolution. First, the codebook design for signal quantization by the low-resolution ADC is particularly important since a good design of the codebook can lead to small quantization error on the received signal, which in turn has significant influence on the activity detector performance. To this end, prior information about the received signal power is needed, which depends on the number of active devices $K$. This is sharply different from the activity detection problem in traditional setups, in which the knowledge of $K$ is not required by the BS as a prerequisite. Second, the covariance-based approach achieves good activity detection performance in traditional setups while it is not clear if it can still achieve good performance in this paper. To solve the above challenges, we propose a communication protocol that consists of an estimator for $K$ and a detector for active device identities: 1) For the estimator, the technical difficulty is that the design of the ADC quantizer and the estimation of $K$ are closely intertwined and doing one needs the information/execution from the other. We propose a progressive estimator which iteratively performs the estimation of $K$ and the design of the ADC quantizer; 2) For the activity detector, we propose a custom-designed stochastic gradient descent algorithm to estimate the active device identities. Numerical results demonstrate the effectiveness of the communication protocol.

Submitted 13 April, 2023; v1 submitted 11 April, 2023; originally announced April 2023.

Comments: Submitted to IEEE for possible publication

arXiv:2303.06797 [pdf, other]

doi 10.1109/TNNLS.2024.3384316

Multichannel Orthogonal Transform-Based Perceptron Layers for Efficient ResNets

Authors: Hongyi Pan, Emadeldeen Hamdan, Xin Zhu, Salih Atici, Ahmet Enis Cetin

Abstract: In this paper, we propose a set of transform-based neural network layers as an alternative to the $3\times3$ Conv2D layers in Convolutional Neural Networks (CNNs). The proposed layers can be implemented based on orthogonal transforms such as the Discrete Cosine Transform (DCT), Hadamard transform (HT), and biorthogonal Block Wavelet Transform (BWT). Furthermore, by taking advantage of the convolution theorems, convolutional filtering operations are performed in the transform domain using element-wise multiplications. Trainable soft-thresholding layers, that remove noise in the transform domain, bring nonlinearity to the transform domain layers. Compared to the Conv2D layer, which is spatial-agnostic and channel-specific, the proposed layers are location-specific and channel-specific. Moreover, these proposed layers reduce the number of parameters and multiplications significantly while improving the accuracy results of regular ResNets on the ImageNet-1K classification task. Furthermore, they can be inserted with a batch normalization layer before the global average pooling layer in the conventional ResNets as an additional layer to improve classification accuracy.

Submitted 22 April, 2024; v1 submitted 12 March, 2023; originally announced March 2023.

Comments: This work is accepted to IEEE Transactions on Neural Networks and Learning Systems. The initial title is "Orthogonal Transform Domain Approaches for the Convolutional Layer". We changed it to "Multichannel Orthogonal Transform-Based Perceptron Layers for Efficient ResNets" based on reviewer's comment. arXiv admin note: text overlap with arXiv:2211.08577
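
For readers unfamiliar with the soft-thresholding nonlinearity mentioned in the abstract above, the standard operator is S_T(x) = sign(x) * max(|x| - T, 0). The sketch below shows only that textbook definition on plain floats. How the paper parameterizes and trains its thresholds is not stated in the abstract, so the fixed `threshold` argument here is an assumption made for illustration.

```python
import math


def soft_threshold(x, threshold):
    """Shrink x toward zero by `threshold`; values within the threshold become 0."""
    magnitude = max(abs(x) - threshold, 0.0)
    return math.copysign(magnitude, x)


def soft_threshold_all(coefficients, threshold):
    """Apply the operator to every transform-domain coefficient."""
    return [soft_threshold(c, threshold) for c in coefficients]


# Small coefficients (often dominated by noise) are zeroed,
# large ones are kept but shrunk by the threshold.
coefficients = [3.0, -0.5, 0.2, -3.0]
print(soft_threshold_all(coefficients, 1.0))
```

Unlike hard thresholding, which keeps surviving coefficients unchanged, this operator is continuous in x, which is one reason it is convenient as a trainable nonlinearity.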

arXiv:2212.09921 [pdf, other]

Input Normalized Stochastic Gradient Descent Training of Deep Neural Networks

Authors: Salih Atici, Hongyi Pan, Ahmet Enis Cetin

Abstract: In this paper, we propose a novel optimization algorithm for training machine learning models called Input Normalized Stochastic Gradient Descent (INSGD), inspired by the Normalized Least Mean Squares (NLMS) algorithm used in adaptive filtering. When training complex models on large datasets, the choice of optimizer parameters, particularly the learning rate, is crucial to avoid divergence. Our algorithm updates the network weights using stochastic gradient descent with $\ell_1$ and $\ell_2$-based normalizations applied to the learning rate, similar to NLMS. However, unlike existing normalization methods, we exclude the error term from the normalization process and instead normalize the update term using the input vector to the neuron. Our experiments demonstrate that our optimization algorithm achieves higher accuracy levels compared to different initialization settings. We evaluate the efficiency of our training algorithm on benchmark datasets using ResNet-18, WResNet-20, ResNet-50, and a toy neural network. Our INSGD algorithm improves the accuracy of ResNet-18 on CIFAR-10 from 92.42% to 92.71%, WResNet-20 on CIFAR-100 from 76.20% to 77.39%, and ResNet-50 on ImageNet-1K from 75.52% to 75.67%.

Submitted 26 June, 2023; v1 submitted 19 December, 2022; originally announced December 2022.
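
The abstract above names the Normalized Least Mean Squares (NLMS) algorithm as the inspiration for INSGD. The sketch below illustrates only classical NLMS for a single linear neuron, where the step is divided by the squared norm of the input vector. It is not the INSGD optimizer itself, and the names `nlms_step` and `fit_nlms` as well as the default `mu` and `eps` values are assumptions chosen for the example.

```python
def nlms_step(weights, x, target, mu=0.5, eps=1e-8):
    """One NLMS update: w <- w + mu * error * x / (eps + ||x||^2)."""
    prediction = sum(w * xi for w, xi in zip(weights, x))
    error = target - prediction
    norm_sq = sum(xi * xi for xi in x)
    # Dividing by the input energy makes the step size insensitive
    # to the scale of the input vector.
    scale = mu * error / (eps + norm_sq)
    return [w + scale * xi for w, xi in zip(weights, x)]


def fit_nlms(samples, n_weights, epochs=50, mu=0.5):
    """Repeatedly sweep over (input, target) pairs, starting from zero weights."""
    weights = [0.0] * n_weights
    for _ in range(epochs):
        for x, target in samples:
            weights = nlms_step(weights, x, target, mu)
    return weights


# Noise-free toy problem with hypothetical true weights [2.0, -1.0].
true_weights = [2.0, -1.0]
inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
samples = [(x, sum(w * xi for w, xi in zip(true_weights, x))) for x in inputs]
print(fit_nlms(samples, n_weights=2))
```

For noise-free data like this toy problem, any `mu` strictly between 0 and 2 shrinks the weight error along the current input direction at each step, so the printed weights should be very close to the true ones.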

arXiv:2212.00595 [pdf, other]

Ghost-free High Dynamic Range Imaging via Hybrid CNN-Transformer and Structure Tensor

Authors: Yu Yuan, Jiaqi Wu, Zhongliang Jing, Henry Leung, Han Pan

Abstract: Eliminating ghosting artifacts due to moving objects is a challenging problem in high dynamic range (HDR) imaging. In this letter, we present a hybrid model consisting of a convolutional encoder and a Transformer decoder to generate ghost-free HDR images. In the encoder, a context aggregation network and non-local attention block are adopted to optimize multi-scale features and capture both global and local dependencies of multiple low dynamic range (LDR) images. The decoder based on Swin Transformer is utilized to improve the reconstruction capability of the proposed model. Motivated by the phenomenal difference between the presence and absence of artifacts under the field of structure tensor (ST), we integrate the ST information of LDR images as auxiliary inputs of the network and use ST loss to further constrain artifacts. Different from previous approaches, our network is capable of processing an arbitrary number of input LDR images. Qualitative and quantitative experiments demonstrate the effectiveness of the proposed method by comparing it with existing state-of-the-art HDR deghosting models. Codes are available at https://github.com/pandayuanyu/HSTHdr.

Submitted 1 December, 2022; originally announced December 2022.

arXiv:2211.14522 [pdf, other]

Visual Fault Detection of Multi-scale Key Components in Freight Trains

Authors: Yang Zhang, Yang Zhou, Huilin Pan, Bo Wu, Guodong Sun

Abstract: Fault detection for key components in the braking system of freight trains is critical for ensuring railway transportation safety. Despite the frequently employed methods based on deep learning, these fault detectors are highly reliant on hardware resources and are complex to implement. In addition, no train fault detectors consider the drop in accuracy induced by scale variation of fault parts. This paper proposes a lightweight anchor-free framework to solve the above problems. Specifically, to reduce the amount of computation and model size, we introduce a lightweight backbone and adopt an anchor-free method for localization and regression. To improve detection accuracy for multi-scale parts, we design a feature pyramid network to generate rectangular layers of different sizes to map parts with similar aspect ratios. Experiments on four fault datasets show that our framework achieves 98.44% accuracy while the model size is only 22.5 MB, outperforming state-of-the-art detectors.

Submitted 26 November, 2022; originally announced November 2022.

Comments: 9 pages, 4 figures

arXiv:2211.09206 [pdf, other]

Learning to Kindle the Starlight

Authors: Yu Yuan, Jiaqi Wu, Lindong Wang, Zhongliang Jing, Henry Leung, Shuyuan Zhu, Han Pan

Abstract: Capturing highly appreciated star field images is extremely challenging due to light pollution, the requirements of specialized hardware, and the high level of photographic skills needed. Deep learning-based techniques have achieved remarkable results in low-light image enhancement (LLIE) but have not been widely applied to star field image enhancement due to the lack of training data. To address this problem, we construct the first Star Field Image Enhancement Benchmark (SFIEB) that contains 355 real-shot and 854 semi-synthetic star field images, all having the corresponding reference images. Using the presented dataset, we propose the first star field image enhancement approach, namely StarDiffusion, based on conditional denoising diffusion probabilistic models (DDPM). We introduce dynamic stochastic corruptions to the inputs of conditional DDPM to improve the performance and generalization of the network on our small-scale dataset. Experiments show promising results of our method, which outperforms state-of-the-art low-light image enhancement algorithms. The dataset and codes will be open-sourced.

Submitted 16 November, 2022; originally announced November 2022.

arXiv:2211.08577 [pdf, other]

DCT Perceptron Layer: A Transform Domain Approach for Convolution Layer

Authors: Hongyi Pan, Xin Zhu, Salih Atici, Ahmet Enis Cetin

Abstract: In this paper, we propose a novel Discrete Cosine Transform (DCT)-based neural network layer which we call DCT-perceptron to replace the $3\times3$ Conv2D layers in the Residual neural Network (ResNet). Convolutional filtering operations are performed in the DCT domain using element-wise multiplications by taking advantage of the Fourier and DCT Convolution theorems. A trainable soft-thresholding layer is used as the nonlinearity in the DCT perceptron. Compared to ResNet's Conv2D layer which is spatial-agnostic and channel-specific, the proposed layer is location-specific and channel-specific. The DCT-perceptron layer reduces the number of parameters and multiplications significantly while maintaining comparable accuracy results of regular ResNets in CIFAR-10 and ImageNet-1K. Moreover, the DCT-perceptron layer can be inserted with a batch normalization layer before the global average pooling layer in the conventional ResNets as an additional layer to improve classification accuracy.

Submitted 15 November, 2022; originally announced November 2022.

arXiv:2211.08505 [pdf, other]

Classification of the Cervical Vertebrae Maturation (CVM) stages Using the Tripod Network

Authors: Salih Atici, Hongyi Pan, Mohammed H. Elnagar, Veerasathpurush Allareddy, Omar Suhaym, Rashid Ansari, Ahmet Enis Cetin

Abstract: We present a novel deep learning method for fully automated detection and classification of the Cervical Vertebrae Maturation (CVM) stages. The deep convolutional neural network consists of three parallel networks (TriPodNet) independently trained with different initialization parameters. They also have a built-in set of novel directional filters that highlight the Cervical Vertebrae edges in X-ray images. Outputs of the three parallel networks are combined using a fully connected layer. 1018 cephalometric radiographs were labeled, divided by gender, and classified according to the CVM stages. Resulting images, using different training techniques and patches, were used to train TripodNet together with a set of tunable directional edge enhancers. Data augmentation is implemented to avoid overfitting. TripodNet achieves the state-of-the-art accuracy of 81.18% in female patients and 75.32% in male patients. The proposed TripodNet achieves a higher accuracy in our dataset than the Swin Transformers and the previous network models that we investigated for CVM stage estimation.

Submitted 15 November, 2022; originally announced November 2022.

arXiv:2211.08491 [pdf, other]

Real-time Wireless ECG-derived Respiration Rate Estimation Using an Autoencoder with a DCT Layer

Authors: Hongyi Pan, Xin Zhu, Zhilu Ye, Pai-Yen Chen, Ahmet Enis Cetin

Abstract: In this paper, we present a wireless ECG-derived Respiration Rate (RR) estimation using an autoencoder with a DCT Layer. The wireless wearable system records the ECG data of the subject and the respiration rate is determined from the variations in the baseline level of the ECG data. A straightforward Fourier analysis of the ECG data obtained using the wireless wearable system may lead to incorrect results due to uneven breathing. To improve the estimation precision, we propose a neural network that uses a novel Discrete Cosine Transform (DCT) layer to denoise and decorrelate the data. The DCT layer has trainable weights and soft-thresholds in the transform domain. In our dataset, we improve the Mean Squared Error (MSE) and Mean Absolute Error (MAE) of the Fourier analysis-based approach using our novel neural network with the DCT layer.

Submitted 16 February, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

Comments: This paper was accepted to ICASSP 2023

arXiv:2205.12458 [pdf, other]

doi 10.1109/TIM.2022.3176901

A Lightweight NMS-free Framework for Real-time Visual Fault Detection System of Freight Trains

Authors: Guodong Sun, Yang Zhou, Huilin Pan, Bo Wu, Ye Hu, Yang Zhang

Abstract: Real-time vision-based system of fault detection (RVBS-FD) for freight trains is an essential part of ensuring railway transportation safety. Most existing vision-based methods still have high computational costs based on convolutional neural networks. The computational cost is mainly reflected in the backbone, neck, and post-processing, i.e., non-maximum suppression (NMS). In this paper, we propose a lightweight NMS-free framework to achieve real-time detection and high accuracy simultaneously. First, we use a lightweight backbone for feature extraction and design a fault detection pyramid to process features. This fault detection pyramid includes three novel individual modules using attention mechanism, bottleneck, and dilated convolution for feature enhancement and computation reduction. Instead of using NMS, we calculate different loss functions, including classification and location costs in the detection head, to further reduce computation. Experimental results show that our framework achieves over 83 frames per second speed with a smaller model size and higher accuracy than the state-of-the-art detectors. Meanwhile, the hardware resource requirements of our method are low during the training and testing process.

Submitted 24 May, 2022; originally announced May 2022.

Comments: 11 pages, 5 figures, accepted by IEEE Transactions on Instrumentation and Measurement

arXiv:2203.16954 [pdf, other]

An End-to-end Chinese Text Normalization Model based on Rule-guided Flat-Lattice Transformer

Authors: Wenlin Dai, Changhe Song, Xiang Li, Zhiyong Wu, Huashan Pan, Xiulin Li, Helen Meng

Abstract: Text normalization, defined as a procedure transforming non-standard words to spoken-form words, is crucial to the intelligibility of synthesized speech in text-to-speech systems. Rule-based methods that do not consider context cannot eliminate ambiguity, whereas sequence-to-sequence neural network based methods suffer from unexpected and uninterpretable errors. A recently proposed hybrid system treats the rule-based model and the neural model as two cascaded sub-modules, where limited interaction capability prevents the neural network model from fully utilizing the expert knowledge contained in the rules. Inspired by Flat-LAttice Transformer (FLAT), we propose an end-to-end Chinese text normalization model, which accepts Chinese characters as direct input and integrates expert knowledge contained in rules into the neural network, both of which contribute to the superior performance of the proposed model on the text normalization task. We also release the first publicly accessible large-scale dataset for Chinese text normalization. Our proposed model has achieved excellent results on this dataset.

Submitted 31 March, 2022; originally announced March 2022.

Comments: Accepted by ICASSP 2022

arXiv:2203.02100 [pdf, other]

Learning Incrementally to Segment Multiple Organs in a CT Image

Authors: Pengbo Liu, Xia Wang, Mengsi Fan, Hongli Pan, Minmin Yin, Xiaohong Zhu, Dandan Du, Xiaoying Zhao, Li Xiao, Lian Ding, Xingwang Wu, S. Kevin Zhou

Abstract: There exists a large number of datasets for organ segmentation, which are partially annotated and sequentially constructed. A typical dataset is constructed at a certain time by curating medical images and annotating the organs of interest. In other words, new datasets with annotations of new organ categories are built over time. To unleash the potential behind these partially labeled, sequentially-constructed datasets, we propose to incrementally learn a multi-organ segmentation model. In each incremental learning (IL) stage, we lose access to previous data and annotations, whose knowledge is presumably captured by the current model, and gain access to a new dataset with annotations of new organ categories, from which we learn to update the organ segmentation model to include the new organs. While IL is notorious for its `catastrophic forgetting' weakness in the context of natural image analysis, we experimentally discover that such a weakness mostly disappears for CT multi-organ segmentation. To further stabilize the model performance across the IL stages, we introduce a light memory module and some loss functions to restrain the representation of different categories in feature space, aggregating feature representation of the same class and separating feature representation of different classes. Extensive experiments on five open-sourced datasets are conducted to illustrate the effectiveness of our method.

Submitted 3 March, 2022; originally announced March 2022.

Comments: arXiv admin note: text overlap with arXiv:2103.04526

arXiv:2201.02711 [pdf, other]

Block Walsh-Hadamard Transform Based Binary Layers in Deep Neural Networks

Authors: Hongyi Pan, Diaa Badawi, Ahmet Enis Cetin

Abstract: Convolution has been the core operation of modern deep neural networks. It is well-known that convolutions can be implemented in the Fourier Transform domain. In this paper, we propose to use binary block Walsh-Hadamard transform (WHT) instead of the Fourier transform. We use WHT-based binary layers to replace some of the regular convolution layers in deep neural networks. We utilize both one-dimensional (1-D) and two-dimensional (2-D) binary WHTs in this paper. In both 1-D and 2-D layers, we compute the binary WHT of the input feature map and denoise the WHT domain coefficients using a nonlinearity which is obtained by combining soft-thresholding with the tanh function. After denoising, we compute the inverse WHT. We use 1D-WHT to replace the $1\times 1$ convolutional layers, and 2D-WHT layers can replace the $3\times 3$ convolution layers and Squeeze-and-Excite layers. 2D-WHT layers with trainable weights can also be inserted before the Global Average Pooling (GAP) layers to assist the dense layers. In this way, we can reduce the number of trainable parameters significantly with a slight decrease in accuracy. In this paper, we implement the WHT layers into MobileNet-V2, MobileNet-V3-Large, and ResNet to reduce the number of parameters significantly with negligible accuracy loss. Moreover, according to our speed test, the 2D-FWHT layer runs about 24 times as fast as the regular $3\times 3$ convolution with 19.51\% less RAM usage in an NVIDIA Jetson Nano experiment.

Submitted 27 January, 2022; v1 submitted 7 January, 2022; originally announced January 2022.

Comments: This paper has been accepted by ACM Transactions on Embedded Computing Systems

arXiv:2201.02709 [pdf, other]

Detecting Anomaly in Chemical Sensors via L1-Kernels based Principal Component Analysis

Authors: Hongyi Pan, Diaa Badawi, Ishaan Bassi, Sule Ozev, Ahmet Enis Cetin

Abstract: We propose a kernel-PCA based method to detect anomalies in chemical sensors. We use temporal signals produced by chemical sensors to form vectors to perform the Principal Component Analysis (PCA). We estimate the kernel-covariance matrix of the sensor data and compute the eigenvector corresponding to the largest eigenvalue of the covariance matrix. An anomaly can be detected by comparing the difference between the actual sensor data and the reconstructed data from the dominant eigenvector. In this paper, we introduce a new multiplication-free kernel, which is related to the l1-norm, for the anomaly detection task. The l1-kernel PCA is not only computationally efficient but also energy-efficient because it does not require any actual multiplications during the kernel covariance matrix computation. Our experimental results show that our kernel-PCA method achieves a higher area under the curve (AUC) score (0.7483) than the baseline regular PCA method (0.7366).

Submitted 28 September, 2022; v1 submitted 7 January, 2022; originally announced January 2022.

Comments: This paper has been accepted to IEEE Sensors Letters

arXiv:2111.02977 [pdf, other]

Safe, efficient and socially-compatible decision of automated vehicles: a case study of unsignalized intersection driving

Authors: Daofei Li, Ao Liu, Hao Pan, Wentao Chen

Abstract: Safe and smooth interaction with other vehicles is one of the ultimate goals of driving automation. However, recent reports of demonstrative deployments of automated vehicles (AVs) indicate that AVs still struggle to meet the expectations of other interacting drivers, which leads to several AV accidents involving human-driven vehicles (HVs). This is most likely due to the lack of understanding about the dynamic interaction process, especially about the human drivers. By investigating the causes of 4,300 video clips of traffic accidents, we find that the limited dynamic visual field of drivers is one leading factor in inter-vehicle interaction accidents, especially in those involving trucks. A game-theoretic decision algorithm considering social compatibility is proposed to handle the interaction with a human-driven truck at an unsignalized intersection. Starting from a probabilistic model for the visual field characteristics of truck drivers, social fitness and reciprocal altruism in the decision are incorporated in the game payoff design. Human-in-the-loop experiments are carried out, in which 24 subjects are invited to drive and interact with AVs deployed with the proposed algorithm and two comparison algorithms. In total, 207 cases of intersection interactions are obtained and analyzed, which show that the proposed decision-making algorithm can not only improve both safety and time efficiency, but also make AV decisions more in line with the expectations of interacting human drivers. These findings can help inform the design of automated driving decision algorithms, to ensure that AVs can be safely and efficiently integrated into human-dominated traffic.

Submitted 10 May, 2022; v1 submitted 4 November, 2021; originally announced November 2021.

Comments: 23 pages,15 figures

arXiv:2110.12065 [pdf, other]

Multiplication-Avoiding Variant of Power Iteration with Applications

Authors: Hongyi Pan, Diaa Badawi, Runxuan Miao, Erdem Koyuncu, Ahmet Enis Cetin

Abstract: Power iteration is a fundamental algorithm in data analysis. It extracts the eigenvector corresponding to the largest eigenvalue of a given matrix. Applications include ranking algorithms, recommendation systems, principal component analysis (PCA), among many others. In this paper, we introduce multiplication-avoiding power iteration (MAPI), which replaces the standard $\ell_2$-inner products that appear in the regular power iteration (RPI) with multiplication-free vector products which are Mercer-type kernel operations related to the $\ell_1$ norm. Precisely, for an $n\times n$ matrix, MAPI requires $n$ multiplications, while RPI needs $n^2$ multiplications per iteration. Therefore, MAPI provides a significant reduction of the number of multiplication operations, which are known to be costly in terms of energy consumption. We provide applications of MAPI to PCA-based image reconstruction as well as to graph-based ranking algorithms. When compared to RPI, MAPI not only typically converges much faster, but also provides superior performance.

Submitted 31 January, 2022; v1 submitted 22 October, 2021; originally announced October 2021.

Comments: This is the technical report for the paper "MULTIPLICATION-AVOIDING VARIANT OF POWER ITERATION WITH APPLICATIONS", which has been accepted by ICASSP 2022
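
For orientation, the regular power iteration (RPI) that the abstract above uses as its baseline can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `power_iteration`, the uniform starting vector, and the fixed iteration count are assumptions made here, and the multiplication-free kernel that defines MAPI is not specified in the abstract, so it is not reproduced.

```python
import math


def power_iteration(matrix, num_iters=100):
    """Regular power iteration: estimate the dominant eigenvalue and eigenvector.

    `matrix` is a square list of lists. The starting vector and the fixed
    number of iterations are illustrative choices, not taken from the paper.
    """
    n = len(matrix)
    vec = [1.0 / math.sqrt(n)] * n  # assumed start: uniform unit vector
    for _ in range(num_iters):
        # Matrix-vector product: n * n multiplications per iteration.
        prod = [sum(row[j] * vec[j] for j in range(n)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in prod))
        vec = [x / norm for x in prod]  # renormalize to unit length
    # Rayleigh quotient v^T A v gives the eigenvalue estimate for unit v.
    mv = [sum(row[j] * vec[j] for j in range(n)) for row in matrix]
    eigenvalue = sum(vec[i] * mv[i] for i in range(n))
    return eigenvalue, vec


value, vector = power_iteration([[2.0, 0.0], [0.0, 1.0]])
```

The inner matrix-vector product is where the $n^2$ multiplications per iteration quoted in the abstract arise for an $n\times n$ matrix.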

arXiv:2109.02920 [pdf, other]

FDA: Feature Decomposition and Aggregation for Robust Airway Segmentation

Authors: Minghui Zhang, Xin Yu, Hanxiao Zhang, Hao Zheng, Weihao Yu, Hong Pan, Xiangran Cai, Yun Gu

Abstract: 3D Convolutional Neural Networks (CNNs) have been widely adopted for airway segmentation. The performance of 3D CNNs is greatly influenced by the dataset while the public airway datasets are mainly clean CT scans with coarse annotation, thus difficult to be generalized to noisy CT scans (e.g. COVID-19 CT scans). In this work, we proposed a new dual-stream network to address the variability between the clean domain and noisy domain, which utilizes the clean CT scans and a small amount of labeled noisy CT scans for airway segmentation. We designed two different encoders to extract the transferable clean features and the unique noisy features separately, followed by two independent decoders. Further on, the transferable features are refined by the channel-wise feature recalibration and Signed Distance Map (SDM) regression. The feature recalibration module emphasizes critical features and the SDM pays more attention to the bronchi, which is beneficial to extracting the transferable topological features robust to the coarse labels. Extensive experimental results demonstrated the obvious improvement brought by our proposed method. Compared to other state-of-the-art transfer learning methods, our method accurately segmented more bronchi in the noisy CT scans.

Submitted 7 September, 2021; originally announced September 2021.

Comments: Accepted at MICCAI2021-DART

arXiv:2105.11634 [pdf, other]

Robust Principal Component Analysis Using a Novel Kernel Related with the L1-Norm

Authors: Hongyi Pan, Diaa Badawi, Erdem Koyuncu, A. Enis Cetin

Abstract: We consider a family of vector dot products that can be implemented using sign changes and addition operations only. The dot products are energy-efficient as they avoid the multiplication operation entirely. Moreover, the dot products induce the $\ell_1$-norm, thus providing robustness to impulsive noise. First, we analytically prove that the dot products yield symmetric, positive semi-definite generalized covariance matrices, thus enabling principal component analysis (PCA). Moreover, the generalized covariance matrices can be constructed in an Energy Efficient (EEF) manner due to the multiplication-free property of the underlying vector products. We present image reconstruction examples in which our EEF PCA method results in the highest peak signal-to-noise ratios compared to the ordinary $\ell_2$-PCA and the recursive $\ell_1$-PCA.

Submitted 24 May, 2021; originally announced May 2021.

Comments: 6 pages, 3 tables and one figure

arXiv:2104.07085 [pdf, other]

Fast Walsh-Hadamard Transform and Smooth-Thresholding Based Binary Layers in Deep Neural Networks

Authors: Hongyi Pan, Diaa Dabawi, Ahmet Enis Cetin

Abstract: In this paper, we propose a novel layer based on fast Walsh-Hadamard transform (WHT) and smooth-thresholding to replace $1\times 1$ convolution layers in deep neural networks. In the WHT domain, we denoise the transform domain coefficients using the new smooth-thresholding non-linearity, a smoothed version of the well-known soft-thresholding operator. We also introduce a family of multiplication-free operators from the basic $2\times 2$ Hadamard transform to implement $3\times 3$ depthwise separable convolution layers. Using these two types of layers, we replace the bottleneck layers in MobileNet-V2 to reduce the network's number of parameters with a slight loss in accuracy. For example, by replacing the final third bottleneck layers, we reduce the number of parameters from 2.270M to 540K. This reduces the accuracy from 95.21\% to 92.98\% on the CIFAR-10 dataset. Our approach significantly improves the speed of data processing. The fast Walsh-Hadamard transform has a computational complexity of $O(m\log_2 m)$. As a result, it is computationally more efficient than the $1\times1$ convolution layer. The fast Walsh-Hadamard layer processes a tensor in $\mathbb{R}^{10\times32\times32\times1024}$ about 2 times faster than a $1\times1$ convolution layer on an NVIDIA Jetson Nano computer board.

Submitted 29 October, 2021; v1 submitted 14 April, 2021; originally announced April 2021.

Comments: The paper (v1) has been accepted to CVPR 2021 BiVision Workshop. We notice the final Conv2D is also a 1x1 convolution layer so we update the result with changing the layer in v2. In v3, we update citation 37 because its authorship changes. In v4, we propose the improved version of smooth thresholding called "weighted smooth thresholding"
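
The $O(m\log_2 m)$ complexity quoted in the abstract above comes from the butterfly structure of the fast Walsh-Hadamard transform. The sketch below shows the standard unnormalized transform in natural (Hadamard) ordering; it is a generic textbook version, not the authors' layer, and the function name `fwht` is chosen here for illustration. The thresholding non-linearity and trainable parameters described in the abstract are not included.

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform (natural/Hadamard order).

    Uses only additions and subtractions. The input length must be a
    power of two; the input sequence is not modified.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    h = 1
    while h < n:  # log2(n) stages
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b  # butterfly step
        h *= 2
    return out


coeffs = fwht([1, 0, 1, 0, 0, 1, 1, 0])  # [4, 2, 0, -2, 0, 2, 0, 2]
```

Applying `fwht` twice returns the input scaled by its length, so the inverse transform is the same routine followed by a division by `len(values)`.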

arXiv:2010.00893 [pdf, other]

Weight Encode Reconstruction Network for Computed Tomography in a Semi-Case-Wise and Learning-Based Way

Authors: Hujie Pan, Xuesong Li, Min Xu

Abstract: Classic algebraic reconstruction technology (ART) for computed tomography requires pre-determined weights of the voxels for projecting pixel values. However, such weights cannot be accurately obtained due to the limitation of the physical understanding and computation resources. In this study, we propose a semi-case-wise learning-based method named Weight Encode Reconstruction Network (WERNet) to tackle the issues mentioned above. The model is trained in a self-supervised manner without the label of a voxel set. It contains two branches, including the voxel weight encoder and the voxel attention part. Using gradient normalization, we are able to co-train the encoder and voxel set in a numerically stable way. With WERNet, the reconstructed result was obtained with a cosine similarity greater than 0.999 with the ground truth. Moreover, the model shows an extraordinary capability of denoising compared to the classic ART method. In the generalization test of the model, the encoder is transferable from a voxel set with complex structure to unseen cases without a reduction in accuracy.

Submitted 2 October, 2020; originally announced October 2020.

arXiv:2009.13015 [pdf]

Cloud Removal for Remote Sensing Imagery via Spatial Attention Generative Adversarial Network

Authors: Heng Pan

Abstract: Optical remote sensing imagery has been widely used in many fields due to its high resolution and stable geometric properties. However, remote sensing imagery is inevitably affected by climate, especially clouds. Removing the cloud in the high-resolution remote sensing satellite image is an indispensable pre-processing step before analyzing it. Thanks to large-scale training data, neural networks have been successful in many image processing tasks, but the use of neural networks to remove clouds in remote sensing imagery is still relatively rare. We adopt a generative adversarial network to solve this task, introduce the spatial attention mechanism into the remote sensing imagery cloud removal task, and propose a model named spatial attention generative adversarial network (SpA GAN), which imitates the human visual mechanism, and recognizes and focuses on the cloud area with local-to-global spatial attention, thereby enhancing the information recovery of these areas and generating cloudless images with better quality...

Submitted 14 November, 2020; v1 submitted 27 September, 2020; originally announced September 2020.

arXiv:2007.12136 [pdf, other]

doi 10.1109/LWC.2020.3045457

Is Multichannel Access Useful in Timely Information Update?

Authors: Jiaxin Liang, Haoyuan Pan, Soung Chang Liew

Abstract: This paper investigates information freshness of multichannel access in information update systems. Age of information (AoI) is a fundamentally important metric to characterize information freshness, defined as the time elapsed since the generation of the last successfully received update. When multiple devices share the same wireless channel to send updates to a common receiver, an interesting question is whether dividing the whole channel into several subchannels will lead to better AoI performance. Given the same frequency band, dividing it into different numbers of subchannels leads to different transmission times and packet error rates (PER) of short update packets, thus affecting information freshness. We focus on a multichannel access system where different devices take turns to transmit with a cyclic schedule repeated over time. We first derive the average AoI by estimating the PERs of short packets. Then we examine bounded AoI, for which the instantaneous AoI is required to be below a threshold a large percentage of the time. Simulation results indicate that multichannel access can provide low average AoI and uniform bounded AoI simultaneously across different received powers. Overall, our investigations provide insights into practical designs of multichannel access systems with AoI requirements.

Submitted 23 July, 2020; originally announced July 2020.

Comments: 13 pages, 6 figures, submitted to Wireless Communication Letter

arXiv:2007.01001 [pdf, other]

PGD-UNet: A Position-Guided Deformable Network for Simultaneous Segmentation of Organs and Tumors

Authors: Ziqiang Li, Hong Pan, Ya** Zhu, A. K. Qin

Abstract: Precise segmentation of organs and tumors plays a crucial role in clinical applications. It is a challenging task due to the irregular shapes and various sizes of organs and tumors as well as the significant class imbalance between the anatomy of interest (AOI) and the background region. In addition, in most situations, tumors and normal organs overlap in medical images, but current approaches fail to delineate both tumors and organs accurately. To tackle such challenges, we propose a position-guided deformable UNet, namely PGD-UNet, which exploits the spatial deformation capabilities of deformable convolution to deal with the geometric transformation of both organs and tumors. Position information is explicitly encoded into the network to enhance the capabilities of deformation. Meanwhile, we introduce a new pooling module to preserve position information lost in the conventional max-pooling operation. Besides, due to unclear boundaries between different structures as well as the subjectivity of annotations, labels are not necessarily accurate for medical image segmentation tasks. This may cause overfitting of the trained network due to label noise. To address this issue, we formulate a novel loss function to suppress the influence of potential label noise on the training process. Our method was evaluated on two challenging segmentation tasks and achieved very promising segmentation accuracy in both tasks.

Submitted 2 July, 2020; originally announced July 2020.

Comments: Accepted by the 2020 International Joint Conference on Neural Networks (IJCNN 2020)

arXiv:1911.02241 [pdf, other]

Information Update: TDMA or FDMA?

Authors: Haoyuan Pan, Soung Chang Liew

Abstract: This paper studies information freshness in information update systems operated with TDMA and FDMA. Information freshness is characterized by a recently introduced metric, age of information (AoI), defined as the time elapsed since the generation of the last successfully received update. In an update system with multiple users sharing the same wireless channel to send updates to a common receiver, how to divide the channel among users affects information freshness. We investigate the AoI performances of two fundamental multiple access schemes, TDMA and FDMA. We first derive the time-averaged AoI by estimating the packet error rate of short update packets based on Gallager's random coding bound. For time-critical systems, we further define a new AoI metric, termed bounded AoI, which corresponds to an AoI threshold for the instantaneous AoI. Specifically, the instantaneous AoI is below the bounded AoI a large percentage of the time. We give a theoretical upper bound for bounded AoI. Our simulation results are consistent with our theoretical analysis. Although TDMA outperforms FDMA in terms of average AoI, FDMA is more robust against varying channel conditions since it gives a more stable bounded AoI across different received powers. Overall, our findings give insight to the design of practical multiple access systems with AoI requirements.

Submitted 6 November, 2019; originally announced November 2019.
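
The AoI definition in the abstract above (time elapsed since the generation of the last successfully received update) implies a sawtooth curve whose time average can be computed directly from a log of successful updates. The sketch below is only an illustration of that definition, not the paper's analysis: the function name `average_aoi`, the `(generation_time, reception_time)` tuple format, and the assumption that the receiver holds a fresh update at time 0 are all choices made here.

```python
def average_aoi(updates, horizon):
    """Time-averaged age of information over the interval [0, horizon].

    `updates` is a list of (generation_time, reception_time) pairs for
    successfully received updates. Assumption: at time 0 the receiver
    holds an update generated at time 0, so the initial age is zero.
    """
    gen, start = 0.0, 0.0  # generation time of the freshest update, and when it took effect
    area = 0.0
    for g, r in sorted(updates, key=lambda u: u[1]):
        if r > horizon:
            break
        if g <= gen:
            continue  # stale update: it does not reduce the age
        # Age grows linearly as (t - gen) on [start, r): add the trapezoid area.
        area += ((r - gen) ** 2 - (start - gen) ** 2) / 2.0
        gen, start = g, r
    # Final segment from the last effective reception up to the horizon.
    area += ((horizon - gen) ** 2 - (start - gen) ** 2) / 2.0
    return area / horizon


avg = average_aoi([(1.0, 2.0), (3.0, 4.0)], horizon=4.0)  # 1.5
```

A bounded-AoI style check in the sense of the abstract would instead ask what fraction of the interval the sawtooth spends below a threshold; that is left out here because the abstract does not give the exact definition.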

arXiv:1810.08554 [pdf, other]

doi 10.1109/BHI.2016.7455966

HeartBEAT: Heart Beat Estimation through Adaptive Tracking

Authors: Huijie Pan, Dogancan Temel, Ghassan AlRegib

Abstract: In this paper, we propose an algorithm denoted as HeartBEAT that tracks heart rate from wrist-type photoplethysmography (PPG) signals and simultaneously recorded three-axis acceleration data. HeartBEAT contains three major parts: spectrum estimation of PPG signals and acceleration data, elimination of motion artifacts in PPG signals using recursive least squares (RLS) adaptive filters, and auxiliary heuristics. We tested HeartBEAT on the 22 datasets provided in the 2015 IEEE Signal Processing Cup. The first ten datasets were recorded from subjects performing forearm and upper-arm exercises, jumping, or push-ups. The last twelve datasets were recorded from subjects running on treadmills. The experimental results were compared to the ground truth heart rate, which comes from simultaneously recorded electrocardiogram (ECG) signals. Compared to state-of-the-art algorithms, HeartBEAT not only produces comparable Pearson's correlation and mean absolute error, but also higher Spearman's rho and Kendall's tau.

Submitted 13 November, 2018; v1 submitted 19 October, 2018; originally announced October 2018.

Comments: 4 pages, 5 figures, 2 tables

Journal ref: H. Pan, D. Temel and G. AlRegib, "HeartBEAT: Heart beat estimation through adaptive tracking," 2016 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI), Las Vegas, NV, 2016, pp. 587-590

Showing 1–48 of 48 results for author: Pan, H