-
Cardiac Copilot: Automatic Probe Guidance for Echocardiography with World Model
Authors:
Haojun Jiang,
Zhenguo Sun,
Ning Jia,
Meng Li,
Yu Sun,
Shaqi Luo,
Shiji Song,
Gao Huang
Abstract:
Echocardiography is the only technique capable of real-time imaging of the heart and is vital for diagnosing the majority of cardiac diseases. However, there is a severe shortage of experienced cardiac sonographers, due to the heart's complex structure and significant operational challenges. To mitigate this situation, we present a Cardiac Copilot system capable of providing real-time probe movement guidance to assist less experienced sonographers in conducting freehand echocardiography. This system can enable non-experts, especially in primary departments and medically underserved areas, to perform cardiac ultrasound examinations, potentially improving global healthcare delivery. The core innovation lies in proposing a data-driven world model, named Cardiac Dreamer, for representing cardiac spatial structures. This world model can provide structure features of any cardiac plane around the current probe position in the latent space, serving as a precise navigation map for autonomous plane localization. We train our model with real-world ultrasound data and corresponding probe motion from 110 routine clinical scans with 151K sample pairs by three certified sonographers. Evaluations on three standard planes with 37K sample pairs demonstrate that the world model can reduce navigation errors by up to 33% and exhibit more stable performance.
Submitted 18 June, 2024;
originally announced June 2024.
-
Precise Analysis of Covariance Identifiability for Activity Detection in Grant-Free Random Access
Authors:
Shengsong Luo,
Junjie Ma,
Chongbin Xu,
Xin Wang
Abstract:
We consider the identifiability issue of maximum likelihood based activity detection in massive MIMO based grant-free random access. A prior work by Chen et al. indicates that the identifiability undergoes a phase transition for commonly-used random signatures. In this paper, we provide an analytical characterization of the boundary of the phase transition curve. Our theoretical results agree well with the numerical experiments.
Submitted 3 June, 2024;
originally announced June 2024.
-
VLST: Virtual Lung Screening Trial for Lung Cancer Detection Using Virtual Imaging Trial
Authors:
Fakrul Islam Tushar,
Liesbeth Vancoillie,
Cindy McCabe,
Amareswararao Kavuri,
Lavsen Dahal,
Brian Harrawood,
Milo Fryling,
Mojtaba Zarei,
Saman Sotoudeh-Paima,
Fong Chi Ho,
Dhrubajyoti Ghosh,
Sheng Luo,
W. Paul Segars,
Ehsan Abadi,
Kyle J. Lafata,
Ehsan Samei,
Joseph Y. Lo
Abstract:
Importance: The efficacy of lung cancer screening can be significantly impacted by the imaging modality used. This Virtual Lung Screening Trial (VLST) addresses the critical need for precision in lung cancer diagnostics and the potential for reducing unnecessary radiation exposure in clinical settings.
Objectives: To establish a virtual imaging trial (VIT) platform that accurately simulates real-world lung screening trials (LSTs) to assess the diagnostic accuracy of CT and CXR modalities.
Design, Setting, and Participants: Utilizing computational models and machine learning algorithms, we created a diverse virtual patient population. The cohort, designed to mirror real-world demographics, was assessed using virtual imaging techniques that reflect historical imaging technologies.
Main Outcomes and Measures: The primary outcome was the difference in the Area Under the Curve (AUC) for CT and CXR modalities across lesion types and sizes.
Results: The study analyzed 298 CT and 313 CXR simulated images from 313 virtual patients, with a lesion-level AUC of 0.81 (95% CI: 0.78-0.84) for CT and 0.55 (95% CI: 0.53-0.56) for CXR. At the patient level, CT demonstrated an AUC of 0.85 (95% CI: 0.80-0.89), compared to 0.53 (95% CI: 0.47-0.60) for CXR. Subgroup analyses indicated CT's superior performance in detecting homogeneous lesions (AUC of 0.97 for lesion-level) and heterogeneous lesions (AUC of 0.71 for lesion-level) as well as in identifying larger nodules (AUC of 0.98 for nodules > 8 mm).
Conclusion and Relevance: The VIT platform validated the superior diagnostic accuracy of CT over CXR, especially for smaller nodules, underscoring its potential to replicate real clinical imaging trials. These findings advocate for the integration of virtual trials in the evaluation and improvement of imaging-based diagnostic tools.
Submitted 17 April, 2024;
originally announced April 2024.
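The abstract above reports results as the Area Under the Curve (AUC). As a reminder of what that number measures, here is a minimal sketch that computes AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (the Mann-Whitney formulation). The function name `auc_from_scores` and the toy scores are illustrative assumptions; this is not the evaluation code used in the VLST study.

```python
def auc_from_scores(pos_scores, neg_scores):
    """Return the AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking. Hypothetical helper for illustration only.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy reader scores (made up): lesions present vs. lesions absent.
with_lesion = [0.9, 0.4, 0.8]
without_lesion = [0.5, 0.3]
print(auc_from_scores(with_lesion, without_lesion))
```

An AUC near 0.5, like the CXR values quoted in the abstract, means the scores rank positives above negatives only about as often as chance would.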
-
SAM-dPCR: Real-Time and High-throughput Absolute Quantification of Biological Samples Using Zero-Shot Segment Anything Model
Authors:
Yuanyuan Wei,
Shanhang Luo,
Changran Xu,
Yingqi Fu,
Qingyue Dong,
Yi Zhang,
Fuyang Qu,
Guangyao Cheng,
Yi-Ping Ho,
Ho-Pui Ho,
Wu Yuan
Abstract:
Digital PCR (dPCR) has revolutionized nucleic acid diagnostics by enabling absolute quantification of rare mutations and target sequences. However, current detection methodologies face challenges, as flow cytometers are costly and complex, while fluorescence imaging methods, relying on software or manual counting, are time-consuming and prone to errors. To address these limitations, we present SAM-dPCR, a novel self-supervised learning-based pipeline that enables real-time and high-throughput absolute quantification of biological samples. Leveraging the zero-shot SAM model, SAM-dPCR efficiently analyzes diverse microreactors with over 97.7% accuracy within a rapid processing time of 3.16 seconds. By utilizing commonly available lab fluorescence microscopes, SAM-dPCR facilitates the quantification of sample concentrations. The accuracy of SAM-dPCR is validated by the strong linear relationship observed between known and inferred sample concentrations. Additionally, SAM-dPCR demonstrates versatility through comprehensive verification using various samples and reactor morphologies. This accessible, cost-effective tool transcends the limitations of traditional detection methods or fully supervised AI models, marking the first application of SAM in nucleic acid detection or molecular diagnostics. By eliminating the need for annotated training data, SAM-dPCR holds great application potential for nucleic acid quantification in resource-limited settings.
Submitted 22 January, 2024;
originally announced March 2024.
-
RadCloud: Real-Time High-Resolution Point Cloud Generation Using Low-Cost Radars for Aerial and Ground Vehicles
Authors:
David Hunt,
Shaocheng Luo,
Amir Khazraei,
Xiao Zhang,
Spencer Hallyburton,
Tingjun Chen,
Miroslav Pajic
Abstract:
In this work, we present RadCloud, a novel real-time framework for directly obtaining higher-resolution lidar-like 2D point clouds from low-resolution radar frames on resource-constrained platforms commonly used in unmanned aerial and ground vehicles (UAVs and UGVs, respectively); such point clouds can then be used for accurate environmental mapping, navigating unknown environments, and other robotics tasks. While high-resolution sensing using radar data has been previously reported, existing methods cannot be used on most UAVs, which have limited computational power and energy; thus, existing demonstrations focus on offline radar processing. RadCloud overcomes these challenges by using a radar configuration with 1/4th of the range resolution and employing a deep learning model with 2.25x fewer parameters. Additionally, RadCloud utilizes a novel chirp-based approach that makes obtained point clouds resilient to rapid movements (e.g., aggressive turns or spins), which commonly occur during UAV flights. In real-world experiments, we demonstrate the accuracy and applicability of RadCloud on commercially available UAVs and UGVs, with off-the-shelf radar platforms on-board.
Submitted 9 March, 2024;
originally announced March 2024.
-
A Multi-Agent Security Testbed for the Analysis of Attacks and Defenses in Collaborative Sensor Fusion
Authors:
R. Spencer Hallyburton,
David Hunt,
Shaocheng Luo,
Miroslav Pajic
Abstract:
The performance and safety of autonomous vehicles (AVs) deteriorate under adverse environments and adversarial actors. The investment in multi-sensor, multi-agent (MSMA) AVs is meant to promote improved efficiency of travel and mitigate safety risks. Unfortunately, minimal investment has been made to develop security-aware MSMA sensor fusion pipelines, leaving them vulnerable to adversaries. To advance security analysis of AVs, we develop the Multi-Agent Security Testbed, MAST, in the Robot Operating System (ROS2). Our framework is scalable for general AV scenarios and is integrated with recent multi-agent datasets. We construct the first bridge between AVstack and ROS and develop automated AV pipeline builds to enable rapid AV prototyping. We tackle the challenge of deploying variable numbers of agent/adversary nodes at launch-time with dynamic topic remapping. Using this testbed, we motivate the need for security-aware AV architectures by exposing the vulnerability of centralized multi-agent fusion pipelines to (un)coordinated adversary models in case studies and Monte Carlo analysis.
Submitted 17 January, 2024;
originally announced January 2024.
-
BER Analysis of SCMA-OFDM Systems in the Presence of Carrier Frequency Offset
Authors:
Haibo Liu,
Qu Luo,
Zilong Liu,
Shan Luo,
Pei Xiao,
Rongping Lin
Abstract:
Sparse code multiple access (SCMA) building upon orthogonal frequency division multiplexing (OFDM) is a promising wireless technology for supporting massive connectivity in future machine-type communication networks. However, the sensitivity of OFDM to carrier frequency offset (CFO) poses a major challenge because it leads to orthogonality loss and incurs intercarrier interference (ICI). In this paper, we investigate the bit error rate (BER) performance of SCMA-OFDM systems in the presence of CFO over both Gaussian and multipath Rayleigh fading channels. We first model the ICI in SCMA-OFDM as Gaussian variables conditioned on a single channel realization for fading channels. The BER is then evaluated by averaging over all codeword pairs considering the fading statistics. Through simulations, we validate the accuracy of our BER analysis and reveal that there is a significant BER degradation for SCMA-OFDM systems when the normalized CFO exceeds 0.02.
Submitted 2 December, 2023;
originally announced December 2023.
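The abstract above notes that CFO destroys subcarrier orthogonality and causes intercarrier interference (ICI). The sketch below illustrates only that basic mechanism for plain OFDM on an ideal channel; it does not model SCMA codebooks, fading, or the paper's BER analysis. It assumes the CFO `eps` is normalized to the subcarrier spacing, and the function names are hypothetical.

```python
import cmath


def cfo_coefficients(n_sub, eps):
    """Response at each subcarrier offset d when one subcarrier sends a unit symbol.

    Entry d = 0 is the desired term; entries d != 0 are leakage (ICI).
    Assumes eps is the CFO normalized to the subcarrier spacing.
    """
    coeffs = []
    for d in range(n_sub):
        acc = sum(
            cmath.exp(2j * cmath.pi * n * (eps - d) / n_sub) for n in range(n_sub)
        )
        coeffs.append(acc / n_sub)
    return coeffs


def ici_power(n_sub, eps):
    """Total power leaked from one unit-power subcarrier into all the others."""
    coeffs = cfo_coefficients(n_sub, eps)
    return sum(abs(c) ** 2 for c in coeffs[1:])


for eps in (0.0, 0.02, 0.1):
    print(eps, ici_power(64, eps))
```

With `eps = 0` the leakage vanishes up to rounding error, and it grows as the offset increases, while the desired term and the leakage together always carry the full unit power.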
-
Auto-ICell: An Accessible and Cost-Effective Integrative Droplet Microfluidic System for Real-Time Single-Cell Morphological and Apoptotic Analysis
Authors:
Yuanyuan Wei,
Meiai Lin,
Shanhang Luo,
Syed Muhammad Tariq Abbasi,
Liwei Tan,
Guangyao Cheng,
Bijie Bai,
Yi-Ping Ho,
Scott Wu Yuan,
Ho-Pui Ho
Abstract:
The Auto-ICell system, a novel and cost-effective integrated droplet microfluidic system, is introduced for real-time analysis of single-cell morphology and apoptosis. This system integrates a 3D-printed microfluidic chip with image analysis algorithms, enabling the generation of uniform droplet reactors and immediate image analysis. The system employs a color-based image analysis algorithm in the bright field for droplet content analysis. Meanwhile, in the fluorescence field, cell apoptosis is quantitatively measured through a combination of deep-learning-enabled multiple fluorescent channel analysis and a live/dead cell stain kit. Breast cancer cells are encapsulated within uniform droplets, with diameters ranging from 70 μm to 240 μm, generated at a high throughput of 1,500 droplets per minute. Real-time image analysis results are displayed within 2 seconds on a custom graphical user interface (GUI). The system provides an automatic calculation of the distribution and ratio of encapsulated dyes in the bright field, and in the fluorescent field, cell blebbing and cell circularity are observed and quantified, respectively. The Auto-ICell system is non-invasive and provides online detection, offering a robust, time-efficient, user-friendly, and cost-effective solution for single-cell analysis. It significantly enhances the detection throughput of droplet single-cell analysis by reducing setup costs and improving operational performance. This study highlights the potential of the Auto-ICell system in advancing biological research and personalized disease treatment, with promising applications in cell culture, biochemical microreactors, drug carriers, cell-based assays, synthetic biology, and point-of-care diagnostics.
Submitted 6 November, 2023;
originally announced November 2023.
-
Learning Separable Hidden Unit Contributions for Speaker-Adaptive Lip-Reading
Authors:
Songtao Luo,
Shuang Yang,
Shiguang Shan,
Xilin Chen
Abstract:
In this paper, we propose a novel method for speaker adaptation in lip reading, motivated by two observations. Firstly, a speaker's own characteristics can always be portrayed well by a few of his/her facial images or even a single image with shallow networks, while the fine-grained dynamic features associated with speech content expressed by the talking face always need deep sequential networks to represent accurately. Therefore, we treat the shallow and deep layers differently for speaker-adaptive lip reading. Secondly, we observe that a speaker's unique characteristics (e.g., prominent oral cavity and mandible) have varied effects on lip reading performance for different words and pronunciations, necessitating adaptive enhancement or suppression of the features for robust lip reading. Based on these two observations, we propose to take advantage of the speaker's own characteristics to automatically learn separable hidden unit contributions with different targets for shallow layers and deep layers, respectively. For shallow layers, where features related to the speaker's characteristics are stronger than the speech-content-related features, we introduce speaker-adaptive features that learn to enhance the speech content features. For deep layers, where both the speaker's features and the speech content features are expressed well, we introduce speaker-adaptive features that learn to suppress speech-content-irrelevant noise for robust lip reading. Our approach consistently outperforms existing methods, as confirmed by comprehensive analysis and comparison across different settings. Besides the evaluation on the popular LRW-ID and GRID datasets, we also release a new dataset for evaluation, CAS-VSR-S68h, to further assess the performance in an extreme setting where just a few speakers are available but the speech content covers a large and diversified range.
Submitted 30 April, 2024; v1 submitted 8 October, 2023;
originally announced October 2023.
-
Jade: A Differentiable Physics Engine for Articulated Rigid Bodies with Intersection-Free Frictional Contact
Authors:
Gang Yang,
Siyuan Luo,
Lin Shao
Abstract:
We present Jade, a differentiable physics engine for articulated rigid bodies. Jade models contacts as the Linear Complementarity Problem (LCP). Compared to existing differentiable simulations, Jade offers features including intersection-free collision simulation and stable LCP solutions for multiple frictional contacts. We use continuous collision detection to detect the time of impact and adopt the backtracking strategy to prevent intersection between bodies with complex geometry shapes. We derive the gradient calculation to ensure the whole simulation process is differentiable under the backtracking mechanism. We modify the popular Dantzig algorithm to get valid solutions under multiple frictional contacts. We conduct extensive experiments to demonstrate the effectiveness of our differentiable physics simulation over a variety of contact-rich tasks.
Submitted 9 September, 2023;
originally announced September 2023.
-
Devignet: High-Resolution Vignetting Removal via a Dual Aggregated Fusion Transformer With Adaptive Channel Expansion
Authors:
Shenghong Luo,
Xuhang Chen,
Weiwen Chen,
Zinuo Li,
Shuqiang Wang,
Chi-Man Pun
Abstract:
Vignetting commonly occurs as a degradation in images resulting from factors such as lens design, improper lens hood usage, and limitations in camera sensors. This degradation affects image details and color accuracy and presents challenges in computational photography. Existing vignetting removal algorithms predominantly rely on ideal physics assumptions and hand-crafted parameters, resulting in the ineffective removal of irregular vignetting and suboptimal results. Moreover, the substantial lack of real-world vignetting datasets hinders the objective and comprehensive evaluation of vignetting removal. To address these challenges, we present Vigset, a pioneering dataset for vignetting removal. Vigset includes 983 pairs of both vignetting and vignetting-free high-resolution ($5340\times3697$) real-world images under various conditions. In addition, we introduce DeVigNet, a novel frequency-aware Transformer architecture designed for vignetting removal. Through the Laplacian Pyramid decomposition, we propose the Dual Aggregated Fusion Transformer to handle global features and remove vignetting in the low-frequency domain. Additionally, we propose the Adaptive Channel Expansion Module to enhance details in the high-frequency domain. The experiments demonstrate that the proposed model outperforms existing state-of-the-art methods. The code, models, and dataset are available at https://github.com/CXH-Research/DeVigNet.
Submitted 20 December, 2023; v1 submitted 25 August, 2023;
originally announced August 2023.
-
Crucial Feature Capture and Discrimination for Limited Training Data SAR ATR
Authors:
Chenwei Wang,
Siyi Luo,
Jifang Pei,
Yulin Huang,
Yin Zhang,
Jianyu Yang
Abstract:
Although deep learning-based methods have achieved excellent performance on SAR ATR, the difficulty of acquiring and labeling large numbers of SAR images causes these methods, which originally performed well, to perform poorly. This may be because most of them take the whole target image as input, whereas research has found that, under limited training data, a deep learning model cannot capture the discriminative image regions in the whole image and instead focuses on regions that are useless or even harmful for recognition. Therefore, the results are not satisfactory. In this paper, we design a SAR ATR framework for limited training samples, which mainly consists of two branches and two modules: a global assisted branch and a local enhanced branch, and a feature capture module and a feature discrimination module. In every training process, the global assisted branch first finishes the initial recognition based on the whole image. Based on the initial recognition results, the feature capture module automatically searches for and locks onto the image regions that are crucial for correct recognition, which we name the golden key of the image. Then the local enhanced branch extracts the local features from the captured crucial image regions. Finally, the overall features and local features are input into the classifier and dynamically weighted using learnable voting parameters to collaboratively complete the final recognition under limited training samples. The model soundness experiments demonstrate the effectiveness of our method through the improvement of feature distribution and recognition probability. The experimental results and comparisons on MSTAR and OPENSAR show that our method has achieved superior recognition performance.
Submitted 20 August, 2023;
originally announced August 2023.
-
An Entropy-Awareness Meta-Learning Method for SAR Open-Set ATR
Authors:
Chenwei Wang,
Siyi Luo,
Jifang Pei,
Xiaoyu Liu,
Yulin Huang,
Yin Zhang,
Jianyu Yang
Abstract:
Existing synthetic aperture radar automatic target recognition (SAR ATR) methods have been effective for the classification of seen target classes. However, it is more meaningful and challenging to distinguish the unseen target classes, i.e., the open set recognition (OSR) problem, which is an urgent problem for practical SAR ATR. The key to solving OSR is to effectively establish the exclusiveness of the feature distribution of known classes. In this letter, we propose an entropy-awareness meta-learning method that improves the exclusiveness of the feature distribution of known classes, which means our method is effective not only for classifying the seen classes but also for handling unseen classes when they are encountered. Through meta-learning tasks, the proposed method learns to construct a feature space of the dynamic-assigned known classes. This feature space is required by the tasks to reject all other classes not belonging to the known classes. At the same time, the proposed entropy-awareness loss helps the model to enhance the feature space with effective and robust discrimination between the known and unknown classes. Therefore, our method can construct a dynamic feature space with discrimination between the known and unknown classes to simultaneously classify the dynamic-assigned known classes and reject the unknown classes. Experiments conducted on the moving and stationary target acquisition and recognition (MSTAR) dataset have shown the effectiveness of our method for SAR OSR.
Submitted 20 August, 2023;
originally announced August 2023.
-
SAR Ship Target Recognition via Selective Feature Discrimination and Multifeature Center Classifier
Authors:
Chenwei Wang,
Siyi Luo,
Jifang Pei,
Yulin Huang,
Yin Zhang,
Jianyu Yang
Abstract:
Maritime surveillance is not only necessary for every country, such as in maritime safeguarding and fishing controls, but also plays an essential role in international fields, such as in rescue support and illegal immigration control. Most of the existing automatic target recognition (ATR) methods directly send the extracted whole features of SAR ships into one classifier. The classifiers of most methods only assign one feature center to each class. However, the characteristics of SAR ship images, namely large inner-class variance and small interclass difference, lead to the whole features containing useless partial features and to a single feature center for each class failing under large inner-class variance. We propose a SAR ship target recognition method via selective feature discrimination and a multifeature center classifier. The selective feature discrimination automatically finds the similar partial features from the most similar interclass image pairs and the dissimilar partial features from the most dissimilar inner-class image pairs. It then provides a loss to enhance these partial features with more interclass separability. Motivated by divide and conquer, the multifeature center classifier assigns multiple learnable feature centers for each ship class. In this way, the multifeature centers divide the large inner-class variance into several smaller variances, which are then conquered by combining all feature centers of one ship class. Finally, the probability distribution over all feature centers is considered comprehensively to achieve an accurate recognition of SAR ship images. The ablation experiments and experimental results on OpenSARShip and FUSAR-Ship datasets show that our method has achieved superior recognition performance under decreasing training SAR ship samples.
Submitted 8 November, 2023; v1 submitted 20 August, 2023;
originally announced August 2023.
-
SAR Ship Target Recognition Via Multi-Scale Feature Attention and Adaptive-Weighed Classifier
Authors:
Chenwei Wang,
Jifang Pei,
Siyi Luo,
Weibo Huo,
Yulin Huang,
Yin Zhang,
Jianyu Yang
Abstract:
Maritime surveillance is indispensable for civilian fields, including national maritime safeguarding, channel monitoring, and so on, in which synthetic aperture radar (SAR) ship target recognition is a crucial research field. The core problem in realizing accurate SAR ship target recognition is the large inner-class variance and inter-class overlap of SAR ship features, which limit the recognition performance. Most existing methods plainly extract multi-scale features of the network and utilize each feature scale equally in the classification stage. However, the shallow multi-scale features are not discriminative enough, and each scale feature is not equally effective for recognition. These factors limit the recognition performance. Therefore, we propose a SAR ship recognition method via multi-scale feature attention and an adaptive-weighted classifier to enhance features in each scale and adaptively choose the effective feature scale for accurate recognition. We first construct an in-network feature pyramid to extract multi-scale features from SAR ship images. Then, the multi-scale feature attention can extract and enhance the principal components from the multi-scale features with more inner-class compactness and inter-class separability. Finally, the adaptive-weighted classifier chooses the effective feature scales in the feature pyramid to achieve the final precise recognition. Through experiments and comparisons on the OpenSARShip dataset, the proposed method is validated to achieve state-of-the-art performance for SAR ship recognition.
Submitted 20 August, 2023;
originally announced August 2023.
-
SAR ATR Method with Limited Training Data via an Embedded Feature Augmenter and Dynamic Hierarchical-Feature Refiner
Authors:
Chenwei Wang,
Siyi Luo,
Yulin Huang,
Jifang Pei,
Yin Zhang,
Jianyu Yang
Abstract:
Without sufficient data, the quantity of information available for supervised training is constrained, as obtaining sufficient synthetic aperture radar (SAR) training data in practice is frequently challenging. Therefore, current SAR automatic target recognition (ATR) algorithms perform poorly with limited training data availability, resulting in a critical need to increase SAR ATR performance. In this study, a new method to improve SAR ATR when training data are limited is proposed. First, an embedded feature augmenter is designed to enhance the extracted virtual features located far away from the class center. Based on the relative distribution of the features, the algorithm pulls the corresponding virtual features with different strengths toward the corresponding class center. The designed augmenter increases the amount of information available for supervised training and improves the separability of the extracted features. Second, a dynamic hierarchical-feature refiner is proposed to capture the discriminative local features of the samples. Through dynamically generated kernels, the proposed refiner integrates the discriminative local features of different dimensions into the global features, further enhancing the inner-class compactness and inter-class separability of the extracted features. The proposed method not only increases the amount of information available for supervised training but also extracts the discriminative features from the samples, resulting in superior ATR performance in problems with limited SAR training data. Experimental results on the moving and stationary target acquisition and recognition (MSTAR), OpenSARShip, and FUSAR-Ship benchmark datasets demonstrate the robustness and outstanding ATR performance of the proposed method in response to limited SAR training data.
Submitted 1 September, 2023; v1 submitted 20 August, 2023;
originally announced August 2023.
-
Causal SAR ATR with Limited Data via Dual Invariance
Authors:
Chenwei Wang,
You Qin,
Li Li,
Siyi Luo,
Yulin Huang,
Jifang Pei,
Yin Zhang,
Jianyu Yang
Abstract:
Synthetic aperture radar automatic target recognition (SAR ATR) with limited data has recently been a hot research topic to enhance weak generalization. Despite many excellent methods being proposed, a fundamental theory is lacking to explain what problem the limited SAR data causes, leading to weak generalization of ATR. In this paper, we establish a causal ATR model demonstrating that noise $N$, which could be blocked with ample SAR data, becomes a confounder with limited data for recognition. As a result, it has a detrimental causal effect damaging the efficacy of feature $X$ extracted from SAR images, leading to weak generalization of SAR ATR with limited data. The effect of $N$ on the feature can be estimated and eliminated by using backdoor adjustment to pursue the direct causality between $X$ and the predicted class $Y$. However, with SAR images it is difficult to precisely estimate and eliminate the effect of $N$ on $X$. The limited SAR data scarcely powers the majority of existing optimization losses based on empirical risk minimization (ERM), thus making it difficult to effectively eliminate $N$'s effect. To tackle the difficult estimation and elimination of $N$'s effect, we propose a dual invariance comprising the inner-class invariant proxy and the noise-invariance loss. Motivated by tackling change with invariance, the inner-class invariant proxy facilitates precise estimation of $N$'s effect on $X$ by obtaining accurate invariant features for each class with the limited data. The noise-invariance loss transitions the ERM's data quantity necessity into a need for noise environment annotations, effectively eliminating $N$'s effect on $X$ by cleverly applying the previous estimate of $N$ as the noise environment annotations. Experiments on three benchmark datasets indicate that the proposed method achieves superior performance.
Submitted 10 November, 2023; v1 submitted 18 August, 2023;
originally announced August 2023.
-
Unveiling Causalities in SAR ATR: A Causal Interventional Approach for Limited Data
Authors:
Chenwei Wang,
Xin Chen,
You Qin,
Siyi Luo,
Yulin Huang,
Jifang Pei,
Jianyu Yang
Abstract:
Synthetic aperture radar automatic target recognition (SAR ATR) methods fall short with limited training data. In this letter, we propose a causal interventional ATR method (CIATR) to formulate the problem of limited SAR data, which helps us uncover the ever-elusive causalities among the key factors in ATR, and thus pursue the desired causal effect without changing the imaging conditions. A structural causal model (SCM) is constructed using causal inference to help understand how imaging conditions act as a confounder introducing spurious correlation when SAR data is limited. This spurious correlation among SAR images and the predicted classes can be fundamentally tackled with the conventional backdoor adjustments. An effective implementation of backdoor adjustments is proposed by first using data augmentation with spatial-frequency domain hybrid transformation to estimate the potential effect of varying imaging conditions on SAR images. Then, a feature discrimination approach with hybrid similarity measurement is introduced to measure and mitigate the structural and vector angle impacts of varying imaging conditions on the extracted features from SAR images. Thus, our CIATR can pursue the true causality between SAR images and the corresponding classes even with limited SAR data. Experiments and comparisons conducted on the moving and stationary target acquisition and recognition (MSTAR) and OpenSARShip datasets have shown the effectiveness of our method with limited SAR data.
Submitted 18 August, 2023;
originally announced August 2023.
-
A Message Passing Detection based Affine Frequency Division Multiplexing Communication System
Authors:
Lifan Wu,
Shan Luo,
Dongxiao Song,
Fan Yang,
Rong** Lin
Abstract:
The next generation of wireless communication technology is anticipated to address the communication reliability challenges encountered in high-speed mobile communication scenarios. An Orthogonal Time Frequency Space (OTFS) system has been introduced as a solution that effectively mitigates these issues. However, OTFS is associated with relatively high pilot overhead and multiuser multiplexing overhead. In response to these concerns within the OTFS framework, a novel modulation technology known as Affine Frequency Division Multiplexing (AFDM) which is based on the discrete affine Fourier transform has emerged. AFDM effectively resolves the challenges by achieving full diversity through parameter adjustments aligned with the channel's delay-Doppler profile. Consequently, AFDM is capable of achieving performance levels comparable to OTFS. As the research on AFDM detection is currently limited, we present a low-complexity yet efficient message passing (MP) algorithm. This algorithm handles joint interference cancellation and detection while capitalizing on the inherent sparsity of the channel. Based on simulation results, the MP detection algorithm outperforms Minimum Mean Square Error (MMSE) and Maximal Ratio Combining (MRC) detection techniques.
Submitted 30 August, 2023; v1 submitted 29 July, 2023;
originally announced July 2023.
-
Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models
Authors:
Simian Luo,
Chuanhao Yan,
Chenxu Hu,
Hang Zhao
Abstract:
The Video-to-Audio (V2A) model has recently gained attention for its practical application in generating audio directly from silent videos, particularly in video/film production. However, previous methods in V2A have limited generation quality in terms of temporal synchronization and audio-visual relevance. We present Diff-Foley, a synchronized Video-to-Audio synthesis method with a latent diffusion model (LDM) that generates high-quality audio with improved synchronization and audio-visual relevance. We adopt contrastive audio-visual pretraining (CAVP) to learn more temporally and semantically aligned features, then train an LDM with CAVP-aligned visual features on spectrogram latent space. The CAVP-aligned features enable the LDM to capture the subtler audio-visual correlation via a cross-attention module. We further significantly improve sample quality with 'double guidance'. Diff-Foley achieves state-of-the-art V2A performance on a current large-scale V2A dataset. Furthermore, we demonstrate Diff-Foley's practical applicability and generalization capabilities via downstream finetuning. Project Page: see https://diff-foley.github.io/
Submitted 29 June, 2023;
originally announced June 2023.
-
SAR ATR under Limited Training Data Via MobileNetV3
Authors:
Chenwei Wang,
Siyi Luo,
Lin Liu,
Yin Zhang,
Jifang Pei,
Yulin Huang,
Jianyu Yang
Abstract:
In recent years, deep learning has been widely used to solve the bottleneck problem of synthetic aperture radar (SAR) automatic target recognition (ATR). However, most current methods rely heavily on a large number of training samples and have many parameters, which leads to failure under limited training samples. In practical applications, the SAR ATR method needs not only superior performance under limited training data but also real-time performance. Therefore, we try to use a lightweight network for SAR ATR under limited training samples, which has fewer parameters, less computational effort, and shorter inference time than normal networks. At the same time, the lightweight network combines the advantages of existing lightweight networks and uses a combination of MnasNet and NetAdapt algorithms to find the optimal neural network architecture for a given problem. Through experiments and comparisons on the moving and stationary target acquisition and recognition (MSTAR) dataset, the lightweight network is validated to have excellent recognition performance for SAR ATR on limited training samples and to be computationally light, reflecting the great potential of this network structure for practical applications.
Submitted 10 August, 2023; v1 submitted 27 June, 2023;
originally announced June 2023.
-
Deep Learning-Aided Perturbation Model-Based Fiber Nonlinearity Compensation
Authors:
Shenghang Luo,
Sunish Kumar Orappanpara Soman,
Lutz Lampe,
Jeebak Mitra
Abstract:
Fiber nonlinearity effects cap achievable rates and ranges in long-haul optical fiber communication links. Conventional nonlinearity compensation methods, such as perturbation theory-based nonlinearity compensation (PB-NLC), attempt to compensate for the nonlinearity by approximating analytical solutions to the signal propagation over optical fibers. However, their practical usability is limited by model mismatch and the immense computational complexity associated with the analytical computation of perturbation triplets and the nonlinearity distortion field. Recently, machine learning techniques have been used to optimise parameters of PB-based approaches, which traditionally have been determined analytically from physical models. It has been claimed in the literature that the learned PB-NLC approaches have improved performance and/or reduced computational complexity over their non-learned counterparts. In this paper, we first revisit the claimed benefits of the learned PB-NLC approaches by carefully carrying out a comprehensive performance-complexity analysis utilizing state-of-the-art complexity reduction methods. Interestingly, our results show that least squares-based PB-NLC with clustering quantization has the best performance-complexity trade-off among the learned PB-NLC approaches. Second, we advance the state-of-the-art of learned PB-NLC by proposing and designing a fully learned structure. We apply a bi-directional recurrent neural network for learning perturbation triplets that are similar to those obtained from the analytical computation and are used as input features for the neural network to estimate the nonlinearity distortion field. Finally, we demonstrate through numerical simulations that our proposed fully learned approach achieves an improved performance-complexity trade-off compared to the existing learned and non-learned PB-NLC techniques.
Submitted 15 June, 2023; v1 submitted 19 November, 2022;
originally announced November 2022.
-
Rate-Distortion Modeling for Bit Rate Constrained Point Cloud Compression
Authors:
Pan Gao,
Shengzhou Luo,
Manoranjan Paul
Abstract:
As one of the main representation formats of the 3D real world and well suited for virtual reality and augmented reality applications, point clouds have gained a lot of popularity. In order to reduce the huge amount of data, a considerable amount of research on point cloud compression has been done. However, given a target bit rate, how to properly choose the color and geometry quantization parameters for compressing point clouds is still an open issue. In this paper, we propose a rate-distortion model based quantization parameter selection scheme for bit rate constrained point cloud compression. Firstly, to overcome the measurement uncertainty in evaluating the distortion of the point clouds, we propose a unified model to combine the geometry distortion and color distortion. In this model, we take into account the correlation between geometry and color variables of point clouds and derive a dimensionless quantity to represent the overall quality degradation. Then, we derive the relationships of overall distortion and bit rate with the quantization parameters. Finally, we formulate the bit rate constrained point cloud compression as a constrained minimization problem using the derived polynomial models and deduce the solution via an iterative numerical method. Experimental results show that the proposed algorithm can achieve optimal decoded point cloud quality at various target bit rates, and substantially outperform the video-rate-distortion model based point cloud compression scheme.
Submitted 19 November, 2022;
originally announced November 2022.
-
Learning for Perturbation-Based Fiber Nonlinearity Compensation
Authors:
Shenghang Luo,
Sunish Kumar Orappanpara Soman,
Lutz Lampe,
Jeebak Mitra,
Chuandong Li
Abstract:
Several machine learning inspired methods for perturbation-based fiber nonlinearity compensation (PB-NLC) have been presented in recent literature. We critically revisit the claimed benefits of those over non-learned methods. Numerical results suggest that learned linear processing of perturbation triplets of PB-NLC is preferable over feedforward neural-network solutions.
Submitted 7 October, 2022;
originally announced October 2022.
-
Universal Segmentation of 33 Anatomies
Authors:
Pengbo Liu,
Yang Deng,
Ce Wang,
Yuan Hui,
Qian Li,
Jun Li,
Shiwei Luo,
Mengke Sun,
Quan Quan,
Shuxin Yang,
You Hao,
Honghu Xiao,
Chunpeng Zhao,
Xinbao Wu,
S. Kevin Zhou
Abstract:
In this paper, we present an approach for learning a single model that universally segments 33 anatomical structures, including vertebrae, pelvic bones, and abdominal organs. Our model building has to address the following challenges. Firstly, while it is ideal to learn such a model from a large-scale, fully-annotated dataset, it is practically hard to curate such a dataset. Thus, we resort to learning from a union of multiple datasets, with each dataset containing images that are partially labeled. Secondly, along the line of partial labelling, we contribute an open-source, large-scale vertebra segmentation dataset for the benefit of the spine analysis community, CTSpine1K, boasting over 1,000 3D volumes and over 11K annotated vertebrae. Thirdly, in a 3D medical image segmentation task, due to the limitation of GPU memory, we always train a model using cropped patches as inputs instead of a whole 3D volume, which limits the amount of contextual information to be learned. To this end, we propose a cross-patch transformer module to fuse more information in adjacent patches, which enlarges the aggregated receptive field for improved segmentation performance. This is especially important for segmenting, say, the elongated spine. Based on 7 partially labeled datasets that collectively contain about 2,800 3D volumes, we successfully learn such a universal model. Finally, we evaluate the universal model on multiple open-source datasets, proving that our model has a good generalization performance and can potentially serve as a solid foundation for downstream tasks.
Submitted 3 March, 2022;
originally announced March 2022.
-
Evolutionary Multi-Objective Reinforcement Learning Based Trajectory Control and Task Offloading in UAV-Assisted Mobile Edge Computing
Authors:
Fuhong Song,
Huanlai Xing,
Xinhan Wang,
Shouxi Luo,
Penglin Dai,
Zhiwen Xiao,
Bowen Zhao
Abstract:
This paper studies the trajectory control and task offloading (TCTO) problem in an unmanned aerial vehicle (UAV)-assisted mobile edge computing system, where a UAV flies along a planned trajectory to collect computation tasks from smart devices (SDs). We consider a scenario in which SDs are not directly connected to the base station (BS) and the UAV has two roles to play: MEC server or wireless relay. The UAV makes task offloading decisions online, in which the collected tasks can be executed locally on the UAV or offloaded to the BS for remote processing. The TCTO problem involves multi-objective optimization as its objectives are to minimize the task delay and the UAV's energy consumption, and maximize the number of tasks collected by the UAV, simultaneously. This problem is challenging because the three objectives conflict with each other. The existing reinforcement learning (RL) algorithms, either single-objective RLs or single-policy multi-objective RLs, cannot address the problem well since they cannot output multiple policies for various preferences (i.e. weights) across objectives in a single run. This paper adapts the evolutionary multi-objective RL (EMORL), a multi-policy multi-objective RL, to the TCTO problem. This algorithm can output multiple optimal policies in just one run, each optimizing a certain preference. The simulation results demonstrate that the proposed algorithm can obtain more excellent nondominated policies by striking a balance between the three objectives regarding policy quality, compared with two evolutionary and two multi-policy RL algorithms.
Submitted 24 February, 2022;
originally announced February 2022.
-
Systematic dispersion compensation for spectral domain optical coherence tomography using time-frequency analysis and iterative optimization for iridocorneal angle imaging
Authors:
Shangbang Luo,
Guy Holland,
Eric Mikula,
Samantha Bradford,
Reza Khazaeinezhad,
James V Jester,
Tibor Juhasz
Abstract:
Dispersion is a common phenomenon in optics due to the frequency dependence of the refractive index in polychromatic light. This issue, if left untreated in optical coherence tomography (OCT) imaging, leads to signal broadening of the coherence length and deterioration of the axial resolution. We report a new numeric method for the systematic dispersion compensation in a spectral-domain (SD) OCT for imaging the iridocorneal angle of human cadaver eyes. The dispersion compensation for our OCT system is calculated by an automated iterative process that minimizes the wavenumber-dependent variance of the ridge extracted from the energy distribution of a mirror's spectral interferogram using Short-Time Fourier Transform (STFT) Time-Frequency Analysis (TFA). An average axial resolution of 2.7 µm in air was achieved at a range of depths up to 2 mm. Compensated OCT images of the iridocorneal angle in human cadaver eyes were much clearer than non-compensated images. We demonstrate the feasibility, effectiveness, and robustness of the proposed method for dispersion compensation in an SD-OCT by evaluating both the mirror and human cadaver eye measurements. We also verified that our imaging system is able to visualize the iridocorneal angle details, such as trabecular meshwork (TM), Schlemm's canal (SC), and collector channels (CCs), which are important ocular outflow structures and play a crucial role in glaucoma management.
Submitted 30 December, 2021;
originally announced December 2021.
-
Perturbation Theory-Aided Learned Digital Back-Propagation Scheme for Optical Fiber Nonlinearity Compensation
Authors:
Xiang Lin,
Shenghang Luo,
Sunish Kumar Orappanpara Soman,
Octavia A. Dobre,
Lutz Lampe,
Deyuan Chang,
Chuandong Li
Abstract:
Derived from the regular perturbation treatment of the nonlinear Schrödinger equation, a machine learning-based scheme to mitigate the intra-channel optical fiber nonlinearity is proposed. Referred to as the perturbation theory-aided (PA) learned digital back-propagation (LDBP), the proposed scheme constructs a deep neural network (DNN) in a way similar to the split-step Fourier method: linear and nonlinear operations alternate. Inspired by the perturbation analysis, the intra-channel cross-phase modulation term is conveniently represented by matrix operations in the DNN. The introduction of this term in each nonlinear operation considerably improves the performance and enables the flexibility of PA-LDBP by adjusting the numbers of spans per step. The proposed scheme is evaluated by numerical simulations of a single carrier optical fiber communication system operating at 32 Gbaud with 64-quadrature amplitude modulation and 20 × 80 km transmission distance. The results show that the proposed scheme achieves approximately 3.5 dB, 1.8 dB, 1.4 dB, and 0.5 dB performance gain in terms of $Q^2$ factor over the linear compensation, when the numbers of spans per step are 1, 2, 4, and 10, respectively. Two methods are proposed to reduce the complexity of PA-LDBP, i.e., pruning the number of perturbation coefficients and chromatic dispersion compensation in the frequency domain for multi-span per step cases. Investigation of the performance and complexity suggests that PA-LDBP attains improved performance gains with reduced complexity when compared to LDBP in the cases of 4 and 10 spans per step.
Submitted 11 October, 2021;
originally announced October 2021.
-
CTSpine1K: A Large-Scale Dataset for Spinal Vertebrae Segmentation in Computed Tomography
Authors:
Yang Deng,
Ce Wang,
Yuan Hui,
Qian Li,
Jun Li,
Shiwei Luo,
Mengke Sun,
Quan Quan,
Shuxin Yang,
You Hao,
Pengbo Liu,
Honghu Xiao,
Chunpeng Zhao,
Xinbao Wu,
S. Kevin Zhou
Abstract:
Spine-related diseases have high morbidity and cause a huge burden of social cost. Spine imaging is an essential tool for noninvasively visualizing and assessing spinal pathology. Segmenting vertebrae in computed tomography (CT) images is the basis of quantitative medical image analysis for clinical diagnosis and surgery planning of spine diseases. Current publicly available annotated datasets on spinal vertebrae are small in size. Due to the lack of a large-scale annotated spine image dataset, the mainstream deep learning-based segmentation methods, which are data-driven, are heavily restricted. In this paper, we introduce a large-scale spine CT dataset, called CTSpine1K, curated from multiple sources for vertebra segmentation, which contains 1,005 CT volumes with over 11,100 labeled vertebrae belonging to different spinal conditions. Based on this dataset, we conduct several spinal vertebrae segmentation experiments to set the first benchmark. We believe that this large-scale dataset will facilitate further research in many spine-related image analysis tasks, including but not limited to vertebrae segmentation, labeling, 3D spine reconstruction from biplanar radiographs, image super-resolution, and enhancement.
Submitted 5 July, 2021; v1 submitted 31 May, 2021;
originally announced May 2021.
-
Deep Neural Network Assisted Second-Order Perturbation-Based Nonlinearity Compensation
Authors:
O. S. Sunish Kumar,
Lutz Lampe,
Shenghang Luo,
Mrinmoy Jana,
Jeebak Mitra,
Chuandong Li
Abstract:
We propose a fiber nonlinearity post-compensation technique using the DNN and the second-order perturbation theory. We achieve 1 dB Q-factor improvement for a 32 Gbaud PDM-64-QAM at 1200 km compared to the linear dispersion compensation.
Submitted 27 June, 2021; v1 submitted 19 May, 2021;
originally announced May 2021.
-
A lightweight deep learning based cloud detection method for Sentinel-2A imagery fusing multi-scale spectral and spatial features
Authors:
Jun Li,
Zhaocong Wu,
Zhongwen Hu,
Canliang Jian,
Shaojie Luo,
Lichao Mou,
Xiao Xiang Zhu,
Matthieu Molinier
Abstract:
Clouds are a very important factor in the availability of optical remote sensing images. Recently, deep learning-based cloud detection methods have surpassed classical methods based on rules and physical models of clouds. However, most of these deep models are very large, which limits their applicability and explainability, while other models do not make use of the full spectral information in multi-spectral images such as Sentinel-2. In this paper, we propose a lightweight network for cloud detection, fusing multi-scale spectral and spatial features (CD-FM3SF) and tailored for processing all spectral bands in Sentinel-2A images. The proposed method consists of an encoder and a decoder. In the encoder, three input branches are designed to handle spectral bands at their native resolution and extract multi-scale spectral features. Three novel components are designed: a mixed depth-wise separable convolution (MDSC) and a shared and dilated residual block (SDRB) to extract multi-scale spatial features, and a concatenation and sum (CS) operation to fuse multi-scale spectral and spatial features with little calculation and no additional parameters. The decoder of CD-FM3SF outputs three cloud masks at the same resolution as input bands to enhance the supervision information of small, middle and large clouds. To validate the performance of the proposed method, we manually labeled 36 Sentinel-2A scenes evenly distributed over mainland China. The experiment results demonstrate that CD-FM3SF outperforms traditional cloud detection methods and state-of-the-art deep learning-based methods in both accuracy and speed.
Submitted 29 April, 2021;
originally announced May 2021.
-
Generative-Adversarial-Networks-based Ghost Recognition
Authors:
Yuchen He,
Yibing Chen,
Sheng Luo,
Hui Chen,
Jianxing Li,
Zhuo Xu
Abstract:
Nowadays, target recognition techniques play an important role in many fields. However, current methods based on target image information suffer from the influence of image quality and the time cost of image reconstruction. In this paper, we propose a novel imaging-free target recognition method combining ghost imaging (GI) and generative adversarial networks (GAN). Based on the mechanism of GI, a random speckle sequence is employed to illuminate the target, and a bucket detector without resolution is utilized to receive the echo signal. The bucket signal sequence formed after continuous detections is constructed into a bucket signal array, which is regarded as the sample for the GAN. Then, a conditional GAN is used to map the bucket signal array to the target category. In practical application, the speckle sequence from the training step is employed to illuminate the target, and the bucket signal array is input to the GAN for recognition. The proposed method can alleviate the problems of conventional recognition methods that are based on target image information, and provides a degree of robustness to turbulence. Extensive experiments show that the proposed method achieves promising performance.
Submitted 6 September, 2021; v1 submitted 25 March, 2021;
originally announced March 2021.
-
Speech Recognition by Simply Fine-tuning BERT
Authors:
Wen-Chin Huang,
Chia-Hua Wu,
Shang-Bao Luo,
Kuan-Yu Chen,
Hsin-Min Wang,
Tomoki Toda
Abstract:
We propose a simple method for automatic speech recognition (ASR) by fine-tuning BERT, which is a language model (LM) trained on large-scale unlabeled text data and can generate rich contextual representations. Our assumption is that given a history context sequence, a powerful LM can narrow the range of possible choices and the speech signal can be used as a simple clue. Hence, compared with conventional ASR systems that train a powerful acoustic model (AM) from scratch, we believe that speech recognition is possible by simply fine-tuning a BERT model. As an initial study, we demonstrate the effectiveness of the proposed idea on the AISHELL dataset and show that stacking a very simple AM on top of BERT can yield reasonable performance.
Submitted 30 January, 2021;
originally announced February 2021.
-
f2IMU-R: Pedestrian Navigation by Low-cost Foot-Mounted Dual IMUs and Inter-foot Ranging
Authors:
Maoran Zhu,
Yuanxin Wu,
Shitu Luo
Abstract:
Foot-mounted inertial sensors have become popular in many indoor or GPS-denied applications, including but not limited to medical monitoring, gait analysis, and soldier and first responder positioning. However, foot-mounted inertial navigation relies largely on the aid of the Zero Velocity Update (ZUPT) and has encountered inherent problems such as heading drift. This paper implements a pedestrian navigation system based on dual foot-mounted low-cost inertial measurement units (IMU) and inter-foot ultrasonic ranging. An observability analysis of the system is performed to investigate the roles of the ZUPT measurement and the foot-to-foot ranging measurement in improving the state estimability. A Kalman-based estimation algorithm is mechanized in the Earth frame, rather than in the common local-level frame, which is found to be effective in suppressing the linearization error in Kalman filtering. An ellipsoid constraint in the Earth frame is also proposed to further restrict the height drift. Simulation and real field experiments show that the proposed method has better robustness and positioning accuracy (about 0.1-0.2% of travelled distance) than traditional pedestrian navigation schemes.
Submitted 7 December, 2020;
originally announced December 2020.
-
Radiologist-level Performance by Using Deep Learning for Segmentation of Breast Cancers on MRI Scans
Authors:
Lukas Hirsch,
Yu Huang,
Shaojun Luo,
Carolina Rossi Saccarelli,
Roberto Lo Gullo,
Isaac Daimiel Naranjo,
Almir G. V. Bitencourt,
Natsuko Onishi,
Eun Sook Ko,
Doris Leithner,
Daly Avendano,
Sarah Eskreis-Winkler,
Mary Hughes,
Danny F. Martinez,
Katja Pinker,
Krishna Juluru,
Amin E. El-Rowmeim,
Pierre Elnajjar,
Elizabeth A. Morris,
Hernan A. Makse,
Lucas C Parra,
Elizabeth J. Sutton
Abstract:
Purpose: To develop a deep network architecture that would achieve fully automated radiologist-level segmentation of cancers at breast MRI.
Materials and Methods: In this retrospective study, 38229 examinations (composed of 64063 individual breast scans from 14475 patients) were performed in female patients (age range, 12-94 years; mean age, 52 years +/- 10 [standard deviation]) who presented between 2002 and 2014 at a single clinical site. A total of 2555 breast cancers were selected that had been segmented on two-dimensional (2D) images by radiologists, as well as 60108 benign breasts that served as examples of noncancerous tissue; all these were used for model training. For testing, an additional 250 breast cancers were segmented independently on 2D images by four radiologists. Authors selected among several three-dimensional (3D) deep convolutional neural network architectures, input modalities, and harmonization methods. The outcome measure was the Dice score for 2D segmentation, which was compared between the network and radiologists by using the Wilcoxon signed rank test and the two one-sided test procedure.
Results: The highest-performing network on the training set was a 3D U-Net with dynamic contrast-enhanced MRI as input and with intensity normalized for each examination. In the test set, the median Dice score of this network was 0.77 (interquartile range, 0.26). The performance of the network was equivalent to that of the radiologists (two one-sided test procedures with radiologist performance of 0.69-0.84 as equivalence bounds, P <= .001 for both; n = 250).
Conclusion: When trained on a sufficiently large dataset, the developed 3D U-Net performed as well as fellowship-trained radiologists in detailed 2D segmentation of breast cancers at routine clinical MRI.
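The outcome measure named in this abstract, the Dice score, can be sketched for binary masks as follows. This is a generic illustration, not the study's evaluation code: the flat 0/1 mask representation, the function name `dice_score`, and the convention that two empty masks score 1.0 are assumptions.

```python
def dice_score(pred, truth):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total

# Hypothetical 6-pixel masks: network output vs. radiologist outline.
pred = [0, 1, 1, 1, 0, 0]
truth = [0, 1, 1, 0, 1, 0]
print(round(dice_score(pred, truth), 3))  # 2*2 / (3+3) -> 0.667
```

A score of 1.0 means the two masks coincide exactly and 0.0 means they share no foreground pixels, so the reported median of 0.77 sits between those extremes.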
Submitted 12 April, 2022; v1 submitted 21 September, 2020;
originally announced September 2020.
-
Structured Sparsity Modeling for Improved Multivariate Statistical Analysis based Fault Isolation
Authors:
Wei Chen,
Jiusun Zeng,
Xiaobin Xu,
Shihua Luo,
Chuanhou Gao
Abstract:
In order to improve the fault diagnosis capability of multivariate statistical methods, this article introduces a fault isolation framework based on structured sparsity modeling. The developed method relies on the reconstruction based contribution analysis and the process structure information can be incorporated into the reconstruction objective function in the form of structured sparsity regularization terms. The structured sparsity terms allow selection of fault variables over structures like blocks or networks of process variables, hence more accurate fault isolation can be achieved. Four structured sparsity terms corresponding to different kinds of process information are considered, namely, partially known sparse support, block sparsity, clustered sparsity and tree-structured sparsity. The optimization problems involving the structured sparsity terms can be solved using the Alternating Direction Method of Multipliers (ADMM) algorithm, which is fast and efficient. Through a simulation example and an application study to a coal-fired power plant, it is verified that the proposed method can better isolate faulty variables by incorporating process structure information.
Submitted 21 December, 2020; v1 submitted 5 September, 2020;
originally announced September 2020.
-
Efficient Pig Counting in Crowds with Keypoints Tracking and Spatial-aware Temporal Response Filtering
Authors:
Guang Chen,
Shiwen Shen,
Longyin Wen,
Si Luo,
Liefeng Bo
Abstract:
Pig counting is a crucial task in large-scale pig farming and is usually performed visually by humans, a process that is time-consuming and error-prone. Few studies in the literature have developed automated pig counting methods. Existing methods focus only on counting pigs in a single image, and their accuracy is challenged by several factors, including pig movement, occlusion and overlapping. In particular, the field of view of a single image is very limited and cannot meet the requirements of pig counting in large pig grouping houses. To that end, we present a real-time automated system for counting pigs in crowds using only one monocular fisheye camera with an inspection robot. Our system produces accurate results that surpass human counting. Our pipeline begins with a novel bottom-up pig detection algorithm to avoid false negatives due to overlapping, occlusion and deformation of pigs. A deep convolutional neural network (CNN) is designed to detect keypoints of pig body parts and associate the keypoints to identify individual pigs. After that, an efficient online tracking method is used to associate pigs across video frames. Finally, a novel spatial-aware temporal response filtering (STRF) method is proposed to predict pig counts, which effectively suppresses false positives caused by pig or camera movements or tracking failures. The whole pipeline has been deployed on an edge computing device, where it demonstrated its effectiveness.
Submitted 26 May, 2020;
originally announced May 2020.
-
STDPG: A Spatio-Temporal Deterministic Policy Gradient Agent for Dynamic Routing in SDN
Authors:
Juan Chen,
Zhiwen Xiao,
Huanlai Xing,
Penglin Dai,
Shouxi Luo,
Muhammad Azhar Iqbal
Abstract:
Dynamic routing in software-defined networking (SDN) can be viewed as a centralized decision-making problem. Most existing deep reinforcement learning (DRL) agents can address it, thanks to the deep neural network (DNN) they incorporate. However, a fully-connected feed-forward neural network (FFNN) is usually adopted, in which the spatial correlation and temporal variation of traffic flows are ignored. This drawback usually leads to significantly high computational complexity due to the large number of training parameters. To overcome this problem, we propose a novel model-free framework for dynamic routing in SDN, which is referred to as the spatio-temporal deterministic policy gradient (STDPG) agent. Both the actor and critic networks are based on an identical DNN structure, in which a combination of a convolutional neural network (CNN) and a long short-term memory network (LSTM) with a temporal attention mechanism, CNN-LSTM-TAM, is devised. By efficiently exploiting spatial and temporal features, CNN-LSTM-TAM helps the STDPG agent learn better from the experience transitions. Furthermore, we employ the prioritized experience replay (PER) method to accelerate the convergence of model training. The experimental results show that STDPG can automatically adapt to the current network environment and achieve robust convergence. Compared with a number of state-of-the-art DRL agents, STDPG achieves better routing solutions in terms of the average end-to-end delay.
Submitted 21 April, 2020;
originally announced April 2020.
-
Convex Shape Representation with Binary Labels for Image Segmentation: Models and Fast Algorithms
Authors:
Shousheng Luo,
Xue-Cheng Tai,
Yang Wang
Abstract:
We present a novel and effective binary representation for convex shapes. We show the equivalence between the shape convexity and some properties of the associated indicator function. The proposed method has two advantages. Firstly, the representation is based on a simple inequality constraint on the binary function rather than the definition of convex shapes, which allows us to obtain efficient algorithms for various applications with convexity prior. Secondly, this method is independent of the dimension of the concerned shape. In order to show the effectiveness of the proposed representation approach, we incorporate it with a probability based model for object segmentation with convexity prior. Efficient algorithms are given to solve the proposed models using Lagrange multiplier methods and linear approximations. Various experiments are given to show the superiority of the proposed methods.
Submitted 21 February, 2020;
originally announced February 2020.
-
Data-driven Method for 3D Axis-symmetric Object Reconstruction from Single Cone-beam Projection Data
Authors:
Shousheng Luo,
Ruyue Meng,
Suhua Wei,
Jianfeng Cai,
Xuecheng Tai,
Yang Wang
Abstract:
In this paper we consider 3D axis-symmetric (AS) object reconstruction from single cone-beam x-ray projection data. Traditional x-ray CT fails to capture the fleeting state of a material because of the long time needed to acquire data at all angles. Therefore, AS objects are devised to investigate the instantaneous deformation of material under pulsed changes of environment, because a single projection is enough to reconstruct their inner structure. Previous reconstruction methods work layer by layer and ignore the longitudinal tilt of x-ray paths. We propose a regularization method using an adaptive tight frame to reconstruct the whole 3D AS object structure simultaneously. An alternating direction method is adopted to solve the proposed model. More importantly, a numerical algorithm is developed to compute the imaging matrix. Experiments on simulation data verify the effectiveness of our method.
Submitted 16 December, 2019;
originally announced December 2019.
-
Semi-supervised Learning Approach to Generate Neuroimaging Modalities with Adversarial Training
Authors:
Harrison Nguyen,
Simon Luo,
Fabio Ramos
Abstract:
Magnetic Resonance Imaging (MRI) of the brain can come in the form of different modalities, such as T1-weighted and Fluid Attenuated Inversion Recovery (FLAIR), which have been used to investigate a wide range of neurological disorders. Current state-of-the-art models for brain tissue segmentation and disease classification require multiple modalities for training and inference. However, acquiring all of these modalities is expensive, time-consuming and inconvenient, and the required modalities are often not available. As a result, these datasets contain large amounts of unpaired data, where examples in the dataset do not contain all modalities. On the other hand, there is a smaller fraction of examples that contain all modalities (paired data), and furthermore each modality is high dimensional when compared to the number of datapoints. In this work, we develop a method to address these issues with semi-supervised learning in translating between two neuroimaging modalities. Our proposed model, Semi-Supervised Adversarial CycleGAN (SSA-CGAN), uses an adversarial loss to learn from unpaired data points, a cycle loss to enforce consistent reconstructions of the mappings, and another adversarial loss to take advantage of paired data points. Our experiments demonstrate that our proposed framework produces an improvement in reconstruction error and reduced variance for the pairwise translation of multiple modalities and is more robust to thermal noise when compared to existing methods.
Submitted 9 December, 2019;
originally announced December 2019.
-
I2C Management Based on IPbus
Authors:
Shiyu Luo,
Junfeng Yang,
Kezhu Song,
Hongwei Yu,
Tengfei Chen,
Tianbo Xu,
Cheng Tang
Abstract:
CBM (Compressed Baryonic Matter) is mainly used to study the QCD phase diagram of strong interactions in the high and moderate temperature region. Before the next-generation GBTx-based CBM DAQ system is built, the DPB (Data Processing Board) layer is used for data readout and data pre-processing, where a general FPGA FMC carrier board named AFCK is used. This paper mainly describes the management of the Inter-Integrated Circuit (I2C) devices on AFCK and the FMCs it carries via IPbus, an FPGA-based slow control bus used in the CBM DAQ system. On AFCK, the IPbus connection depends on the correct initialization of a set of I2C devices, including the I2C-bus multiplexer (choosing the correct I2C bus), the clock crosspoint switch (providing the 125 MHz clock needed by 1000BASE-X/SGMII), and the serial EEPROM with an EUI-48 address (providing the AFCK MAC address). An independent initialization module can execute an I2C command sequence stored in a ROM, through which the FPGA can write to and read from the I2C devices without IPbus, so that the related I2C devices are correctly initialized and the necessary preparation for the IPbus start-up is fulfilled. After the initialization, a Wishbone I2C master core is used as an IPbus slave and all other I2C devices can be configured directly via IPbus. The entire design has been fully tested in the CBM DPB design.
Submitted 26 June, 2018;
originally announced June 2018.
-
A multi-channel DAQ system based on FPGA for long-distance transmission in nuclear physics experiments
Authors:
Hongwei Yu,
Kezhu Song,
Junfeng Yang,
Kehan Li,
Tengfei Chen,
Shiyu Luo,
Cheng Tang,
Han Yu
Abstract:
With the development of electronic science and technology, electronic data acquisition (DAQ) systems are more and more widely applied in nuclear physics experiments. Workstations are often utilized by researchers for data storage, data display, data processing and data analysis. Nevertheless, the workstations are ordinarily separated from the detectors in nuclear physics experiments by several kilometers or even tens of kilometers. Thus a DAQ system that can transmit data over long distances is in demand. In this paper, we designed a DAQ system suitable for high-speed, high-precision sampling and remote data transfer. An 8-channel, 24-bit simultaneous-sampling analog-to-digital converter (ADC) named AD7779 was utilized for high-speed and high-precision sampling; its maximum operating speed runs up to 16 kilo samples per second (kSPS). The ADC is responsible for collecting signals from the detectors, which are sent to a Field Programmable Gate Array (FPGA) for processing and long-distance transmission to the workstation through optical fiber. As the central processing unit of the DAQ system, the FPGA provides powerful computing capability and has enough flexibility. The most prominent features of the system are real-time mass data transfer based on a streaming transmission mode, highly reliable data transmission based on error detection and correction, and high-speed, high-precision data acquisition. The results of our tests show that the system is able to transmit data stably at a bandwidth of 1 Gbps.
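The figures quoted in this abstract allow a rough estimate of the raw data rate produced by one ADC. The sketch below is back-of-the-envelope arithmetic only; it assumes the 16 kSPS figure applies to each of the 8 channels, ignores any framing, header or error-correction overhead, and takes 1 Gbps as 10^9 bit/s. The constant and function names are hypothetical.

```python
CHANNELS = 8                      # simultaneous-sampling channels
BITS_PER_SAMPLE = 24              # ADC resolution
SAMPLES_PER_SECOND = 16_000       # assumed per-channel rate (16 kSPS)
LINK_BITS_PER_SECOND = 1_000_000_000  # 1 Gbps link, taken as 10^9 bit/s

def raw_payload_bps(channels, bits_per_sample, samples_per_second):
    """Raw sample payload in bit/s, without any protocol overhead."""
    return channels * bits_per_sample * samples_per_second

payload = raw_payload_bps(CHANNELS, BITS_PER_SAMPLE, SAMPLES_PER_SECOND)
print(payload)                          # 3072000 bit/s, i.e. 3.072 Mbit/s
print(payload / LINK_BITS_PER_SECOND)   # fraction of the link used by one ADC
```

Under these assumptions a single ADC uses well under 1% of the tested link bandwidth, which suggests the link has ample headroom for error-correction overhead or additional converters.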
Submitted 23 June, 2018;
originally announced June 2018.