-
SCKansformer: Fine-Grained Classification of Bone Marrow Cells via Kansformer Backbone and Hierarchical Attention Mechanisms
Authors:
Yifei Chen,
Zhu Zhu,
Shenghao Zhu,
Linwei Qiu,
Binfeng Zou,
Fan Jia,
Yunpeng Zhu,
Chenyan Zhang,
Zhaojie Fang,
Feiwei Qin,
** Fan,
Changmiao Wang,
Yu Gao,
Gang Yu
Abstract:
The incidence and mortality rates of malignant tumors, such as acute leukemia, have risen significantly. Clinically, hospitals rely on cytological examination of peripheral blood and bone marrow smears to diagnose malignant tumors, with accurate blood cell counting being crucial. Existing automated methods face challenges such as low feature expression capability, poor interpretability, and redund…
▽ More
The incidence and mortality rates of malignant tumors, such as acute leukemia, have risen significantly. Clinically, hospitals rely on cytological examination of peripheral blood and bone marrow smears to diagnose malignant tumors, with accurate blood cell counting being crucial. Existing automated methods face challenges such as low feature expression capability, poor interpretability, and redundant feature extraction when processing high-dimensional microimage data. We propose a novel fine-grained classification model, SCKansformer, for bone marrow blood cells, which addresses these challenges and enhances classification accuracy and efficiency. The model integrates the Kansformer Encoder, SCConv Encoder, and Global-Local Attention Encoder. The Kansformer Encoder replaces the traditional MLP layer with the KAN, improving nonlinear feature representation and interpretability. The SCConv Encoder, with its Spatial and Channel Reconstruction Units, enhances feature representation and reduces redundancy. The Global-Local Attention Encoder combines Multi-head Self-Attention with a Local Part module to capture both global and local features. We validated our model using the Bone Marrow Blood Cell Fine-Grained Classification Dataset (BMCD-FGCD), comprising over 10,000 samples and nearly 40 classifications, developed with a partner hospital. Comparative experiments on our private dataset, as well as the publicly available PBC and ALL-IDB datasets, demonstrate that SCKansformer outperforms both typical and advanced microcell classification methods across all datasets. Our source code and private BMCD-FGCD dataset are available at https://github.com/JustlfC03/SCKansformer.
△ Less
Submitted 14 June, 2024;
originally announced June 2024.
-
Rough Set improved Therapy-Based Metaverse Assisting System
Authors:
** Cao,
Yanhui Jiang,
Chang Yu,
Feiwei Qin,
Zekun Jiang
Abstract:
Chronic neck and shoulder pain (CNSP) is a major global public health issue. Traditional treatments like physiotherapy and rehabilitation have drawbacks, including high costs, low precision, and user discomfort. This paper presents an interactive system based on Cognitive Therapy Theory (CBT) for CNSP treatment. The system includes a pain detection module using EMG and IMU to monitor pain and opti…
▽ More
Chronic neck and shoulder pain (CNSP) is a major global public health issue. Traditional treatments like physiotherapy and rehabilitation have drawbacks, including high costs, low precision, and user discomfort. This paper presents an interactive system based on Cognitive Therapy Theory (CBT) for CNSP treatment. The system includes a pain detection module using EMG and IMU to monitor pain and optimize data with Rough Set theory, and a cognitive therapy module that processes this data further for CBT-based interventions, including massage and heating therapy. An experimental plan is outlined to evaluate the system's effectiveness and performance. The goal is to create an accessible device for CNSP therapy. Additionally, the paper explores the system's application in a metaverse environment, enhancing treatment immersion and personalization. The metaverse platform simulates treatment environments and responds to real-time patient data, allowing for continuous monitoring and adjustment of treatment plans.
△ Less
Submitted 10 June, 2024; v1 submitted 6 June, 2024;
originally announced June 2024.
-
Infrared Image Super-Resolution via Lightweight Information Split Network
Authors:
Shijie Liu,
Kang Yan,
Feiwei Qin,
Changmiao Wang,
Ruiquan Ge,
Kai Zhang,
Jie Huang,
Yong Peng,
** Cao
Abstract:
Single image super-resolution (SR) is an established pixel-level vision task aimed at reconstructing a high-resolution image from its degraded low-resolution counterpart. Despite the notable advancements achieved by leveraging deep neural networks for SR, most existing deep learning architectures feature an extensive number of layers, leading to high computational complexity and substantial memory…
▽ More
Single image super-resolution (SR) is an established pixel-level vision task aimed at reconstructing a high-resolution image from its degraded low-resolution counterpart. Despite the notable advancements achieved by leveraging deep neural networks for SR, most existing deep learning architectures feature an extensive number of layers, leading to high computational complexity and substantial memory demands. These issues become particularly pronounced in the context of infrared image SR, where infrared devices often have stringent storage and computational constraints. To mitigate these challenges, we introduce a novel, efficient, and precise single infrared image SR model, termed the Lightweight Information Split Network (LISN). The LISN comprises four main components: shallow feature extraction, deep feature extraction, dense feature fusion, and high-resolution infrared image reconstruction. A key innovation within this model is the introduction of the Lightweight Information Split Block (LISB) for deep feature extraction. The LISB employs a sequential process to extract hierarchical features, which are then aggregated based on the relevance of the features under consideration. By integrating channel splitting and shift operations, the LISB successfully strikes an optimal balance between enhanced SR performance and a lightweight framework. Comprehensive experimental evaluations reveal that the proposed LISN achieves superior performance over contemporary state-of-the-art methods in terms of both SR quality and model complexity, affirming its efficacy for practical deployment in resource-constrained infrared imaging applications.
△ Less
Submitted 27 May, 2024; v1 submitted 17 May, 2024;
originally announced May 2024.
-
VQ-NeRV: A Vector Quantized Neural Representation for Videos
Authors:
Yunjie Xu,
Xiang Feng,
Feiwei Qin,
Ruiquan Ge,
Yong Peng,
Changmiao Wang
Abstract:
Implicit neural representations (INR) excel in encoding videos within neural networks, showcasing promise in computer vision tasks like video compression and denoising. INR-based approaches reconstruct video frames from content-agnostic embeddings, which hampers their efficacy in video frame regression and restricts their generalization ability for video interpolation. To address these deficiencie…
▽ More
Implicit neural representations (INR) excel in encoding videos within neural networks, showcasing promise in computer vision tasks like video compression and denoising. INR-based approaches reconstruct video frames from content-agnostic embeddings, which hampers their efficacy in video frame regression and restricts their generalization ability for video interpolation. To address these deficiencies, Hybrid Neural Representation for Videos (HNeRV) was introduced with content-adaptive embeddings. Nevertheless, HNeRV's compression ratios remain relatively low, attributable to an oversight in leveraging the network's shallow features and inter-frame residual information. In this work, we introduce an advanced U-shaped architecture, Vector Quantized-NeRV (VQ-NeRV), which integrates a novel component--the VQ-NeRV Block. This block incorporates a codebook mechanism to discretize the network's shallow residual features and inter-frame residual information effectively. This approach proves particularly advantageous in video compression, as it results in smaller size compared to quantized features. Furthermore, we introduce an original codebook optimization technique, termed shallow codebook optimization, designed to refine the utility and efficiency of the codebook. The experimental evaluations indicate that VQ-NeRV outperforms HNeRV on video regression tasks, delivering superior reconstruction quality (with an increase of 1-2 dB in Peak Signal-to-Noise Ratio (PSNR)), better bit per pixel (bpp) efficiency, and improved video inpainting outcomes.
△ Less
Submitted 18 March, 2024;
originally announced March 2024.
-
PE-MVCNet: Multi-view and Cross-modal Fusion Network for Pulmonary Embolism Prediction
Authors:
Zhaoxin Guo,
Zhipeng Wang,
Ruiquan Ge,
Jianxun Yu,
Feiwei Qin,
Yuan Tian,
Yuqing Peng,
Yonghong Li,
Changmiao Wang
Abstract:
The early detection of a pulmonary embolism (PE) is critical for enhancing patient survival rates. Both image-based and non-image-based features are of utmost importance in medical classification tasks. In a clinical setting, physicians tend to rely on the contextual information provided by Electronic Medical Records (EMR) to interpret medical imaging. However, very few models effectively integrat…
▽ More
The early detection of a pulmonary embolism (PE) is critical for enhancing patient survival rates. Both image-based and non-image-based features are of utmost importance in medical classification tasks. In a clinical setting, physicians tend to rely on the contextual information provided by Electronic Medical Records (EMR) to interpret medical imaging. However, very few models effectively integrate clinical information with imaging data. To address this shortcoming, we suggest a multimodal fusion methodology, termed PE-MVCNet, which capitalizes on Computed Tomography Pulmonary Angiography imaging and EMR data. This method comprises the Image-only module with an integrated multi-view block, the EMR-only module, and the Cross-modal Attention Fusion (CMAF) module. These modules cooperate to extract comprehensive features that subsequently generate predictions for PE. We conducted experiments using the publicly accessible Stanford University Medical Center dataset, achieving an AUROC of 94.1%, an accuracy rate of 90.2%, and an F1 score of 90.6%. Our proposed model outperforms existing methodologies, corroborating that our multimodal fusion model excels compared to models that use a single data modality. Our source code is available at https://github.com/LeavingStarW/PE-MVCNET.
△ Less
Submitted 17 April, 2024; v1 submitted 26 February, 2024;
originally announced February 2024.
-
SRNDiff: Short-term Rainfall Nowcasting with Condition Diffusion Model
Authors:
Xudong Ling,
Chaorong Li,
Fengqing Qin,
Peng Yang,
Yuanyuan Huang
Abstract:
Diffusion models are widely used in image generation because they can generate high-quality and realistic samples. This is in contrast to generative adversarial networks (GANs) and variational autoencoders (VAEs), which have some limitations in terms of image quality.We introduce the diffusion model to the precipitation forecasting task and propose a short-term precipitation nowcasting with condit…
▽ More
Diffusion models are widely used in image generation because they can generate high-quality and realistic samples. This is in contrast to generative adversarial networks (GANs) and variational autoencoders (VAEs), which have some limitations in terms of image quality.We introduce the diffusion model to the precipitation forecasting task and propose a short-term precipitation nowcasting with condition diffusion model based on historical observational data, which is referred to as SRNDiff. By incorporating an additional conditional decoder module in the denoising process, SRNDiff achieves end-to-end conditional rainfall prediction. SRNDiff is composed of two networks: a denoising network and a conditional Encoder network. The conditional network is composed of multiple independent UNet networks. These networks extract conditional feature maps at different resolutions, providing accurate conditional information that guides the diffusion model for conditional generation.SRNDiff surpasses GANs in terms of prediction accuracy, although it requires more computational resources.The SRNDiff model exhibits higher stability and efficiency during training than GANs-based approaches, and generates high-quality precipitation distribution samples that better reflect future actual precipitation conditions. This fully validates the advantages and potential of diffusion models in precipitation forecasting, providing new insights for enhancing rainfall prediction.
△ Less
Submitted 21 February, 2024;
originally announced February 2024.
-
Two-stage Rainfall-Forecasting Diffusion Model
Authors:
XuDong Ling,
ChaoRong Li,
FengQing Qin,
LiHong Zhu,
Yuanyuan Huang
Abstract:
Deep neural networks have made great achievements in rainfall prediction.However, the current forecasting methods have certain limitations, such as with blurry generated images and incorrect spatial positions. To overcome these challenges, we propose a Two-stage Rainfall-Forecasting Diffusion Model (TRDM) aimed at improving the accuracy of long-term rainfall forecasts and addressing the imbalance…
▽ More
Deep neural networks have made great achievements in rainfall prediction.However, the current forecasting methods have certain limitations, such as with blurry generated images and incorrect spatial positions. To overcome these challenges, we propose a Two-stage Rainfall-Forecasting Diffusion Model (TRDM) aimed at improving the accuracy of long-term rainfall forecasts and addressing the imbalance in performance between temporal and spatial modeling. TRDM is a two-stage method for rainfall prediction tasks. The task of the first stage is to capture robust temporal information while preserving spatial information under low-resolution conditions. The task of the second stage is to reconstruct the low-resolution images generated in the first stage into high-resolution images. We demonstrate state-of-the-art results on the MRMS and Swedish radar datasets. Our project is open source and available on GitHub at: \href{https://github.com/clearlyzerolxd/TRDM}{https://github.com/clearlyzerolxd/TRDM}.
△ Less
Submitted 20 February, 2024;
originally announced February 2024.
-
Semi-supervised Medical Image Segmentation Method Based on Cross-pseudo Labeling Leveraging Strong and Weak Data Augmentation Strategies
Authors:
Yifei Chen,
Chenyan Zhang,
Yifan Ke,
Yiyu Huang,
Xuezhou Dai,
Feiwei Qin,
Yongquan Zhang,
Xiaodong Zhang,
Changmiao Wang
Abstract:
Traditional supervised learning methods have historically encountered certain constraints in medical image segmentation due to the challenging collection process, high labeling cost, low signal-to-noise ratio, and complex features characterizing biomedical images. This paper proposes a semi-supervised model, DFCPS, which innovatively incorporates the Fixmatch concept. This significantly enhances t…
▽ More
Traditional supervised learning methods have historically encountered certain constraints in medical image segmentation due to the challenging collection process, high labeling cost, low signal-to-noise ratio, and complex features characterizing biomedical images. This paper proposes a semi-supervised model, DFCPS, which innovatively incorporates the Fixmatch concept. This significantly enhances the model's performance and generalizability through data augmentation processing, employing varied strategies for unlabeled data. Concurrently, the model design gives appropriate emphasis to the generation, filtration, and refinement processes of pseudo-labels. The novel concept of cross-pseudo-supervision is introduced, integrating consistency learning with self-training. This enables the model to fully leverage pseudo-labels from multiple perspectives, thereby enhancing training diversity. The DFCPS model is compared with both baseline and advanced models using the publicly accessible Kvasir-SEG dataset. Across all four subdivisions containing different proportions of unlabeled data, our model consistently exhibits superior performance. Our source code is available at https://github.com/JustlfC03/DFCPS.
△ Less
Submitted 17 February, 2024;
originally announced February 2024.
-
LKFormer: Large Kernel Transformer for Infrared Image Super-Resolution
Authors:
Feiwei Qin,
Kang Yan,
Changmiao Wang,
Ruiquan Ge,
Yong Peng,
Kai Zhang
Abstract:
Given the broad application of infrared technology across diverse fields, there is an increasing emphasis on investigating super-resolution techniques for infrared images within the realm of deep learning. Despite the impressive results of current Transformer-based methods in image super-resolution tasks, their reliance on the self-attentive mechanism intrinsic to the Transformer architecture resu…
▽ More
Given the broad application of infrared technology across diverse fields, there is an increasing emphasis on investigating super-resolution techniques for infrared images within the realm of deep learning. Despite the impressive results of current Transformer-based methods in image super-resolution tasks, their reliance on the self-attentive mechanism intrinsic to the Transformer architecture results in images being treated as one-dimensional sequences, thereby neglecting their inherent two-dimensional structure. Moreover, infrared images exhibit a uniform pixel distribution and a limited gradient range, posing challenges for the model to capture effective feature information. Consequently, we suggest a potent Transformer model, termed Large Kernel Transformer (LKFormer), to address this issue. Specifically, we have designed a Large Kernel Residual Attention (LKRA) module with linear complexity. This mainly employs depth-wise convolution with large kernels to execute non-local feature modeling, thereby substituting the standard self-attentive layer. Additionally, we have devised a novel feed-forward network structure called Gated-Pixel Feed-Forward Network (GPFN) to augment the LKFormer's capacity to manage the information flow within the network. Comprehensive experimental results reveal that our method surpasses the most advanced techniques available, using fewer parameters and yielding considerably superior performance.The source code will be available at https://github.com/sad192/large-kernel-Transformer.
△ Less
Submitted 24 January, 2024; v1 submitted 22 January, 2024;
originally announced January 2024.
-
Accurate Leukocyte Detection Based on Deformable-DETR and Multi-Level Feature Fusion for Aiding Diagnosis of Blood Diseases
Authors:
Yifei Chen,
Chenyan Zhang,
Ben Chen,
Yiyu Huang,
Yifei Sun,
Changmiao Wang,
Xianjun Fu,
Yuxing Dai,
Feiwei Qin,
Yong Peng,
Yu Gao
Abstract:
In standard hospital blood tests, the traditional process requires doctors to manually isolate leukocytes from microscopic images of patients' blood using microscopes. These isolated leukocytes are then categorized via automatic leukocyte classifiers to determine the proportion and volume of different types of leukocytes present in the blood samples, aiding disease diagnosis. This methodology is n…
▽ More
In standard hospital blood tests, the traditional process requires doctors to manually isolate leukocytes from microscopic images of patients' blood using microscopes. These isolated leukocytes are then categorized via automatic leukocyte classifiers to determine the proportion and volume of different types of leukocytes present in the blood samples, aiding disease diagnosis. This methodology is not only time-consuming and labor-intensive, but it also has a high propensity for errors due to factors such as image quality and environmental conditions, which could potentially lead to incorrect subsequent classifications and misdiagnosis. To address these issues, this paper proposes an innovative method of leukocyte detection: the Multi-level Feature Fusion and Deformable Self-attention DETR (MFDS-DETR). To tackle the issue of leukocyte scale disparity, we designed the High-level Screening-feature Fusion Pyramid (HS-FPN), enabling multi-level fusion. This model uses high-level features as weights to filter low-level feature information via a channel attention module and then merges the screened information with the high-level features, thus enhancing the model's feature expression capability. Further, we address the issue of leukocyte feature scarcity by incorporating a multi-scale deformable self-attention module in the encoder and using the self-attention and cross-deformable attention mechanisms in the decoder, which aids in the extraction of the global features of the leukocyte feature maps. The effectiveness, superiority, and generalizability of the proposed MFDS-DETR method are confirmed through comparisons with other cutting-edge leukocyte detection models using the private WBCDD, public LISC and BCCD datasets. Our source code and private WBCCD dataset are available at https://github.com/JustlfC03/MFDS-DETR.
△ Less
Submitted 10 January, 2024; v1 submitted 1 January, 2024;
originally announced January 2024.
-
SCUNet++: Swin-UNet and CNN Bottleneck Hybrid Architecture with Multi-Fusion Dense Skip Connection for Pulmonary Embolism CT Image Segmentation
Authors:
Yifei Chen,
Binfeng Zou,
Zhaoxin Guo,
Yiyu Huang,
Yifan Huang,
Feiwei Qin,
Qinhai Li,
Changmiao Wang
Abstract:
Pulmonary embolism (PE) is a prevalent lung disease that can lead to right ventricular hypertrophy and failure in severe cases, ranking second in severity only to myocardial infarction and sudden death. Pulmonary artery CT angiography (CTPA) is a widely used diagnostic method for PE. However, PE detection presents challenges in clinical practice due to limitations in imaging technology. CTPA can p…
▽ More
Pulmonary embolism (PE) is a prevalent lung disease that can lead to right ventricular hypertrophy and failure in severe cases, ranking second in severity only to myocardial infarction and sudden death. Pulmonary artery CT angiography (CTPA) is a widely used diagnostic method for PE. However, PE detection presents challenges in clinical practice due to limitations in imaging technology. CTPA can produce noises similar to PE, making confirmation of its presence time-consuming and prone to overdiagnosis. Nevertheless, the traditional segmentation method of PE can not fully consider the hierarchical structure of features, local and global spatial features of PE CT images. In this paper, we propose an automatic PE segmentation method called SCUNet++ (Swin Conv UNet++). This method incorporates multiple fusion dense skip connections between the encoder and decoder, utilizing the Swin Transformer as the encoder. And fuses features of different scales in the decoder subnetwork to compensate for spatial information loss caused by the inevitable downsampling in Swin-UNet or other state-of-the-art methods, effectively solving the above problem. We provide a theoretical analysis of this method in detail and validate it on publicly available PE CT image datasets FUMPE and CAD-PE. The experimental results indicate that our proposed method achieved a Dice similarity coefficient (DSC) of 83.47% and a Hausdorff distance 95th percentile (HD95) of 3.83 on the FUMPE dataset, as well as a DSC of 83.42% and an HD95 of 5.10 on the CAD-PE dataset. These findings demonstrate that our method exhibits strong performance in PE segmentation tasks, potentially enhancing the accuracy of automatic segmentation of PE and providing a powerful diagnostic tool for clinical physicians. Our source code and new FUMPE dataset are available at https://github.com/JustlfC03/SCUNet-plusplus.
△ Less
Submitted 2 January, 2024; v1 submitted 22 December, 2023;
originally announced December 2023.
-
ZS-SRT: An Efficient Zero-Shot Super-Resolution Training Method for Neural Radiance Fields
Authors:
Xiang Feng,
Yongbo He,
Yubo Wang,
Chengkai Wang,
Zhenzhong Kuang,
Jiajun Ding,
Feiwei Qin,
Jun Yu,
Jian** Fan
Abstract:
Neural Radiance Fields (NeRF) have achieved great success in the task of synthesizing novel views that preserve the same resolution as the training views. However, it is challenging for NeRF to synthesize high-quality high-resolution novel views with low-resolution training data. To solve this problem, we propose a zero-shot super-resolution training framework for NeRF. This framework aims to guid…
▽ More
Neural Radiance Fields (NeRF) have achieved great success in the task of synthesizing novel views that preserve the same resolution as the training views. However, it is challenging for NeRF to synthesize high-quality high-resolution novel views with low-resolution training data. To solve this problem, we propose a zero-shot super-resolution training framework for NeRF. This framework aims to guide the NeRF model to synthesize high-resolution novel views via single-scene internal learning rather than requiring any external high-resolution training data. Our approach consists of two stages. First, we learn a scene-specific degradation map** by performing internal learning on a pretrained low-resolution coarse NeRF. Second, we optimize a super-resolution fine NeRF by conducting inverse rendering with our map** function so as to backpropagate the gradients from low-resolution 2D space into the super-resolution 3D sampling space. Then, we further introduce a temporal ensemble strategy in the inference phase to compensate for the scene estimation errors. Our method is featured on two points: (1) it does not consume high-resolution views or additional scene data to train super-resolution NeRF; (2) it can speed up the training process by adopting a coarse-to-fine strategy. By conducting extensive experiments on public datasets, we have qualitatively and quantitatively demonstrated the effectiveness of our method.
△ Less
Submitted 19 December, 2023;
originally announced December 2023.
-
AnyOKP: One-Shot and Instance-Aware Object Keypoint Extraction with Pretrained ViT
Authors:
Fangbo Qin,
Taogang Hou,
Shan Lin,
Kaiyuan Wang,
Michael C. Yip,
Shan Yu
Abstract:
Towards flexible object-centric visual perception, we propose a one-shot instance-aware object keypoint (OKP) extraction approach, AnyOKP, which leverages the powerful representation ability of pretrained vision transformer (ViT), and can obtain keypoints on multiple object instances of arbitrary category after learning from a support image. An off-the-shelf petrained ViT is directly deployed for…
▽ More
Towards flexible object-centric visual perception, we propose a one-shot instance-aware object keypoint (OKP) extraction approach, AnyOKP, which leverages the powerful representation ability of pretrained vision transformer (ViT), and can obtain keypoints on multiple object instances of arbitrary category after learning from a support image. An off-the-shelf petrained ViT is directly deployed for generalizable and transferable feature extraction, which is followed by training-free feature enhancement. The best-prototype pairs (BPPs) are searched for in support and query images based on appearance similarity, to yield instance-unaware candidate keypoints.Then, the entire graph with all candidate keypoints as vertices are divided to sub-graphs according to the feature distributions on the graph edges. Finally, each sub-graph represents an object instance. AnyOKP is evaluated on real object images collected with the cameras of a robot arm, a mobile robot, and a surgical robot, which not only demonstrates the cross-category flexibility and instance awareness, but also show remarkable robustness to domain shift and viewpoint change.
△ Less
Submitted 15 September, 2023;
originally announced September 2023.
-
Data Complexity: A New Perspective for Analyzing the Difficulty of Defect Prediction Tasks
Authors:
Xiaohui Wan,
Zheng Zheng,
Fangyun Qin,
Xuhui Lu
Abstract:
Defect prediction is crucial for software quality assurance and has been extensively researched over recent decades. However, prior studies rarely focus on data complexity in defect prediction tasks, and even less on understanding the difficulties of these tasks from the perspective of data complexity. In this paper, we conduct an empirical study to estimate the hardness of over 33,000 instances,…
▽ More
Defect prediction is crucial for software quality assurance and has been extensively researched over recent decades. However, prior studies rarely focus on data complexity in defect prediction tasks, and even less on understanding the difficulties of these tasks from the perspective of data complexity. In this paper, we conduct an empirical study to estimate the hardness of over 33,000 instances, employing a set of measures to characterize the inherent difficulty of instances and the characteristics of defect datasets. Our findings indicate that: (1) instance hardness in both classes displays a right-skewed distribution, with the defective class exhibiting a more scattered distribution; (2) class overlap is the primary factor influencing instance hardness and can be characterized through feature, structural, instance, and multiresolution overlap; (3) no universal preprocessing technique is applicable to all datasets, and it may not consistently reduce data complexity, fortunately, dataset complexity measures can help identify suitable techniques for specific datasets; (4) integrating data complexity information into the learning process can enhance an algorithm's learning capacity. In summary, this empirical study highlights the crucial role of data complexity in defect prediction tasks, and provides a novel perspective for advancing research in defect prediction techniques.
△ Less
Submitted 5 May, 2023;
originally announced May 2023.
-
SA4U: Practical Static Analysis for Unit Type Error Detection
Authors:
Max Taylor,
Johnathon Aurand,
Feng Qin,
Xiaorui Wang,
Brandon Henry,
Xiangyu Zhang
Abstract:
Unit type errors, where values with physical unit types (e.g., meters, hours) are used incorrectly in a computation, are common in today's unmanned aerial system (UAS) firmware. Recent studies show that unit type errors represent over 10% of bugs in UAS firmware. Moreover, the consequences of unit type errors are severe. Over 30% of unit type errors cause UAS crashes. This paper proposes SA4U: a p…
▽ More
Unit type errors, where values with physical unit types (e.g., meters, hours) are used incorrectly in a computation, are common in today's unmanned aerial system (UAS) firmware. Recent studies show that unit type errors represent over 10% of bugs in UAS firmware. Moreover, the consequences of unit type errors are severe. Over 30% of unit type errors cause UAS crashes. This paper proposes SA4U: a practical system for detecting unit type errors in real-world UAS firmware. SA4U requires no modifications to firmware or developer annotations. It deduces the unit types of program variables by analyzing simulation traces and protocol definitions. SA4U uses the deduced unit types to identify when unit type errors occur. SA4U is effective: it identified 14 previously undetected bugs in two popular open-source firmware (ArduPilot & PX4.)
△ Less
Submitted 17 October, 2022;
originally announced October 2022.
-
MVSTER: Epipolar Transformer for Efficient Multi-View Stereo
Authors:
Xiaofeng Wang,
Zheng Zhu,
Fangbo Qin,
Yun Ye,
Guan Huang,
Xu Chi,
Yijia He,
Xingang Wang
Abstract:
Learning-based Multi-View Stereo (MVS) methods warp source images into the reference camera frustum to form 3D volumes, which are fused as a cost volume to be regularized by subsequent networks. The fusing step plays a vital role in bridging 2D semantics and 3D spatial associations. However, previous methods utilize extra networks to learn 2D information as fusing cues, underusing 3D spatial corre…
▽ More
Learning-based Multi-View Stereo (MVS) methods warp source images into the reference camera frustum to form 3D volumes, which are fused as a cost volume to be regularized by subsequent networks. The fusing step plays a vital role in bridging 2D semantics and 3D spatial associations. However, previous methods utilize extra networks to learn 2D information as fusing cues, underusing 3D spatial correlations and bringing additional computation costs. Therefore, we present MVSTER, which leverages the proposed epipolar Transformer to learn both 2D semantics and 3D spatial associations efficiently. Specifically, the epipolar Transformer utilizes a detachable monocular depth estimator to enhance 2D semantics and uses cross-attention to construct data-dependent 3D associations along epipolar line. Additionally, MVSTER is built in a cascade structure, where entropy-regularized optimal transport is leveraged to propagate finer depth estimations in each stage. Extensive experiments show MVSTER achieves state-of-the-art reconstruction performance with significantly higher efficiency: Compared with MVSNet and CasMVSNet, our MVSTER achieves 34% and 14% relative improvements on the DTU benchmark, with 80% and 51% relative reductions in running time. MVSTER also ranks first on Tanks&Temples-Advanced among all published works. Code is released at https://github.com/JeffWang987.
△ Less
Submitted 15 April, 2022;
originally announced April 2022.
-
Practical Issues and Challenges in CSI-based Integrated Sensing and Communication
Authors:
Daqing Zhang,
Dan Wu,
Kai Niu,
Xuanzhi Wang,
Fusang Zhang,
Jian Yao,
Dajie Jiang,
Fei Qin
Abstract:
Next-generation mobile communication network (i.e., 6G) has been envisioned to go beyond classical communication functionality and provide integrated sensing and communication (ISAC) capability to enable more emerging applications, such as smart cities, connected vehicles, AIoT and health care/elder care. Among all the ISAC proposals, the most practical and promising approach is to empower existin…
▽ More
Next-generation mobile communication network (i.e., 6G) has been envisioned to go beyond classical communication functionality and provide integrated sensing and communication (ISAC) capability to enable more emerging applications, such as smart cities, connected vehicles, AIoT and health care/elder care. Among all the ISAC proposals, the most practical and promising approach is to empower existing wireless network (e.g., WiFi, 4G/5G) with the augmented ability to sense the surrounding human and environment, and evolve wireless communication networks into intelligent communication and sensing network (e.g., 6G). In this paper, based on our experience on CSI-based wireless sensing with WiFi/4G/5G signals, we intend to identify ten major practical and theoretical problems that hinder real deployment of ISAC applications, and provide possible solutions to those critical challenges. Hopefully, this work will inspire further research to evolve existing WiFi/4G/5G networks into next-generation intelligent wireless network (i.e., 6G).
△ Less
Submitted 17 March, 2022;
originally announced April 2022.
-
CEKD:Cross Ensemble Knowledge Distillation for Augmented Fine-grained Data
Authors:
Ke Zhang,
** Fan,
Shaoli Huang,
Yongliang Qiao,
Xiaofeng Yu,
Feiwei Qin
Abstract:
Data augmentation has been proved effective in training deep models. Existing data augmentation methods tackle the fine-grained problem by blending image pairs and fusing corresponding labels according to the statistics of mixed pixels, which produces additional noise harmful to the performance of networks. Motivated by this, we present a simple yet effective cross ensemble knowledge distillation…
▽ More
Data augmentation has been proved effective in training deep models. Existing data augmentation methods tackle the fine-grained problem by blending image pairs and fusing corresponding labels according to the statistics of mixed pixels, which produces additional noise harmful to the performance of networks. Motivated by this, we present a simple yet effective cross ensemble knowledge distillation (CEKD) model for fine-grained feature learning. We innovatively propose a cross distillation module to provide additional supervision to alleviate the noise problem, and propose a collaborative ensemble module to overcome the target conflict problem. The proposed model can be trained in an end-to-end manner, and only requires image-level label supervision. Extensive experiments on widely used fine-grained benchmarks demonstrate the effectiveness of our proposed model. Specifically, with the backbone of ResNet-101, CEKD obtains the accuracy of 89.59%, 95.96% and 94.56% in three datasets respectively, outperforming state-of-the-art API-Net by 0.99%, 1.06% and 1.16%.
△ Less
Submitted 12 March, 2022;
originally announced March 2022.
-
Improving Early Sepsis Prediction with Multi Modal Learning
Authors:
Fred Qin,
Vivek Madan,
Ujjwal Ratan,
Zohar Karnin,
Vishaal Kapoor,
Parminder Bhatia,
Taha Kass-Hout
Abstract:
Sepsis is a life-threatening disease with high morbidity, mortality and healthcare costs. The early prediction and administration of antibiotics and intravenous fluids is considered crucial for the treatment of sepsis and can save potentially millions of lives and billions in health care costs. Professional clinical care practitioners have proposed clinical criterion which aid in early detection o…
▽ More
Sepsis is a life-threatening disease with high morbidity, mortality and healthcare costs. The early prediction and administration of antibiotics and intravenous fluids is considered crucial for the treatment of sepsis and can save potentially millions of lives and billions in health care costs. Professional clinical care practitioners have proposed clinical criterion which aid in early detection of sepsis; however, performance of these criterion is often limited. Clinical text provides essential information to estimate the severity of the sepsis in addition to structured clinical data. In this study, we explore how clinical text can complement structured data towards early sepsis prediction task. In this paper, we propose multi modal model which incorporates both structured data in the form of patient measurements as well as textual notes on the patient. We employ state-of-the-art NLP models such as BERT and a highly specialized NLP model in Amazon Comprehend Medical to represent the text. On the MIMIC-III dataset containing records of ICU admissions, we show that by using these notes, one achieves an improvement of 6.07 points in a standard utility score for Sepsis prediction and 2.89% in AUROC score. Our methods significantly outperforms a clinical criteria suggested by experts, qSOFA, as well as the winning model of the PhysioNet Computing in Cardiology Challenge for predicting Sepsis.
△ Less
Submitted 23 July, 2021;
originally announced July 2021.
-
Deep Learning Based OFDM Channel Estimation Using Frequency-Time Division and Attention Mechanism
Authors:
Ang Yang,
Peng Sun,
Tamrakar Rakesh,
Bule Sun,
Fei Qin
Abstract:
In this paper, we propose a frequency-time division network (FreqTimeNet) to improve the performance of deep learning (DL) based OFDM channel estimation. This FreqTimeNet is designed based on the orthogonality between the frequency domain and the time domain. In FreqTimeNet, the input is processed by parallel frequency blocks and parallel time blocks sequentially. By introducing the attention mech…
▽ More
In this paper, we propose a frequency-time division network (FreqTimeNet) to improve the performance of deep learning (DL) based OFDM channel estimation. This FreqTimeNet is designed based on the orthogonality between the frequency domain and the time domain. In FreqTimeNet, the input is processed by parallel frequency blocks and parallel time blocks sequentially. By introducing the attention mechanism using the SNR information, an attention based FreqTimeNet (AttenFreqTimeNet) is proposed. Using 3rd Generation Partnership Project (3GPP) channel models, the mean square error (MSE) performance of FreqTimeNet and AttenFreqTimeNet under different scenarios is evaluated. A method for constructing mixed training data is proposed, which could address the generalization problem in DL. It is observed that AttenFreqTimeNet outperforms FreqTimeNet, and FreqTimeNet outperforms other DL networks with reasonable complexity.
△ Less
Submitted 30 September, 2021; v1 submitted 15 July, 2021;
originally announced July 2021.
-
Avis: In-Situ Model Checking for Unmanned Aerial Vehicles
Authors:
Max Taylor,
Haicheng Chen,
Feng Qin,
Christopher Stewart
Abstract:
Control firmware in unmanned aerial vehicles (UAVs) uses sensors to model and manage flight operations, from takeoff to landing to flying between waypoints. However, sensors can fail at any time during a flight. If control firmware mishandles sensor failures, UAVs can crash, fly away, or suffer other unsafe conditions. In-situ model checking finds sensor failures that could lead to unsafe conditio…
▽ More
Control firmware in unmanned aerial vehicles (UAVs) uses sensors to model and manage flight operations, from takeoff to landing to flying between waypoints. However, sensors can fail at any time during a flight. If control firmware mishandles sensor failures, UAVs can crash, fly away, or suffer other unsafe conditions. In-situ model checking finds sensor failures that could lead to unsafe conditions by systematically failing sensors. However, the type of sensor failure and its timing within a flight affect its manifestation, creating a large search space. We propose Avis, an in-situ model checker to quickly uncover UAV sensor failures that lead to unsafe conditions. Widely used control firmware already support operating modes. Avis injects sensor failures as the control firmware transitions between modes - a key execution point where mishandled software exceptions can trigger unsafe conditions. We implemented Avis and applied it to ArduPilot and PX4. Avis found unsafe conditions 2.4X faster than Bayesian Fault Injection, the leading, state-of-the-art approach. Within the current code base of ArduPilot and PX4, Avis discovered 10 previously unknown software bugs that lead to unsafe conditions. Additionally, we reinserted 5 known bugs that caused serious, unsafe conditions and Avis correctly reported all of them.
△ Less
Submitted 28 June, 2021;
originally announced June 2021.
-
Recurrent Neural Network from Adder's Perspective: Carry-lookahead RNN
Authors:
Haowei Jiang,
Feiwei Qin,
** Cao,
Yong Peng,
Yanli Shao
Abstract:
The recurrent network architecture is a widely used model in sequence modeling, but its serial dependency hinders the computation parallelization, which makes the operation inefficient. The same problem was encountered in serial adder at the early stage of digital electronics. In this paper, we discuss the similarities between recurrent neural network (RNN) and serial adder. Inspired by carry-look…
▽ More
The recurrent network architecture is a widely used model in sequence modeling, but its serial dependency hinders the computation parallelization, which makes the operation inefficient. The same problem was encountered in serial adder at the early stage of digital electronics. In this paper, we discuss the similarities between recurrent neural network (RNN) and serial adder. Inspired by carry-lookahead adder, we introduce carry-lookahead module to RNN, which makes it possible for RNN to run in parallel. Then, we design the method of parallel RNN computation, and finally Carry-lookahead RNN (CL-RNN) is proposed. CL-RNN takes advantages in parallelism and flexible receptive field. Through a comprehensive set of tests, we verify that CL-RNN can perform better than existing typical RNNs in sequence modeling tasks which are specially designed for RNNs.
△ Less
Submitted 24 August, 2021; v1 submitted 22 June, 2021;
originally announced June 2021.
-
ELSD: Efficient Line Segment Detector and Descriptor
Authors:
Haotian Zhang,
Yicheng Luo,
Fangbo Qin,
Yijia He,
Xiao Liu
Abstract:
We present the novel Efficient Line Segment Detector and Descriptor (ELSD) to simultaneously detect line segments and extract their descriptors in an image. Unlike the traditional pipelines that conduct detection and description separately, ELSD utilizes a shared feature extractor for both detection and description, to provide the essential line features to the higher-level tasks like SLAM and ima…
▽ More
We present the novel Efficient Line Segment Detector and Descriptor (ELSD) to simultaneously detect line segments and extract their descriptors in an image. Unlike the traditional pipelines that conduct detection and description separately, ELSD utilizes a shared feature extractor for both detection and description, to provide the essential line features to the higher-level tasks like SLAM and image matching in real time. First, we design the one-stage compact model, and propose to use the mid-point, angle and length as the minimal representation of line segment, which also guarantees the center-symmetry. The non-centerness suppression is proposed to filter out the fragmented line segments caused by lines' intersections. The fine offset prediction is designed to refine the mid-point localization. Second, the line descriptor branch is integrated with the detector branch, and the two branches are jointly trained in an end-to-end manner. In the experiments, the proposed ELSD achieves the state-of-the-art performance on the Wireframe dataset and YorkUrban dataset, in both accuracy and efficiency. The line description ability of ELSD also outperforms the previous works on the line matching task.
△ Less
Submitted 29 April, 2021;
originally announced April 2021.
-
Sequential Random Network for Fine-grained Image Classification
Authors:
Chaorong Li,
Malu Zhang,
Wei Huang,
Fengqing Qin,
An** Zeng,
Yuanyuan Huang
Abstract:
Deep Convolutional Neural Network (DCNN) and Transformer have achieved remarkable successes in image recognition. However, their performance in fine-grained image recognition is still difficult to meet the requirements of actual needs. This paper proposes a Sequence Random Network (SRN) to enhance the performance of DCNN. The output of DCNN is one-dimensional features. This one-dimensional feature…
▽ More
Deep Convolutional Neural Network (DCNN) and Transformer have achieved remarkable successes in image recognition. However, their performance in fine-grained image recognition is still difficult to meet the requirements of actual needs. This paper proposes a Sequence Random Network (SRN) to enhance the performance of DCNN. The output of DCNN is one-dimensional features. This one-dimensional feature abstractly represents image information, but it does not express well the detailed information of image. To address this issue, we use the proposed SRN which composed of BiLSTM and several Tanh-Dropout blocks (called BiLSTM-TDN), to further process DCNN one-dimensional features for highlighting the detail information of image. After the feature transform by BiLSTM-TDN, the recognition performance has been greatly improved. We conducted the experiments on six fine-grained image datasets. Except for FGVC-Aircraft, the accuracies of the proposed methods on the other datasets exceeded 99%. Experimental results show that BiLSTM-TDN is far superior to the existing state-of-the-art methods. In addition to DCNN, BiLSTM-TDN can also be extended to other models, such as Transformer.
△ Less
Submitted 7 June, 2021; v1 submitted 12 March, 2021;
originally announced March 2021.
-
Inertial based Integration with Transformed INS Mechanization in Earth Frame
Authors:
Lubin Chang,
**gbo Di,
Fangjun Qin
Abstract:
This paper proposes to use a newly-derived transformed inertial navigation system (INS) mechanization to fuse INS with other complementary navigation systems. Through formulating the attitude, velocity and position as one group state of group of double direct spatial isometries SE2(3), the transformed INS mechanization has proven to be group affine, which means that the corresponding vector error…
▽ More
This paper proposes to use a newly-derived transformed inertial navigation system (INS) mechanization to fuse INS with other complementary navigation systems. Through formulating the attitude, velocity and position as one group state of group of double direct spatial isometries SE2(3), the transformed INS mechanization has proven to be group affine, which means that the corresponding vector error state model will be trajectory-independent. In order to make use of the transformed INS mechanization in inertial based integration, both the right and left vector error state models are derived. The INS/GPS and INS/Odometer integration are investigated as two representatives of inertial based integration. Some application aspects of the derived error state models in the two applications are presented, which include how to select the error state model, initialization of the SE2(3) based error state covariance and feedback correction corresponding to the error state definitions. Extensive Monte Carlo simulations and land vehicle experiments are conducted to evaluate the performance of the derived error state models. It is shown that the most striking superiority of using the derived error state models is their ability to handle the large initial attitude misalignments, which is just the result of log-linearity property of the derived error state models. Therefore, the derived error state models can be used in the so-called attitude alignment for the two applications. Moreover, the derived right error state-space model is also very preferred for long-endurance INS/Odometer integration due to the filtering consistency caused by its less dependence on the global state estimate.
△ Less
Submitted 17 March, 2021; v1 submitted 3 March, 2021;
originally announced March 2021.
-
Strapdown Inertial Navigation System Initial Alignment based on Group of Double Direct Spatial Isometries
Authors:
Lubin Chang,
Fangjun Qin,
Jiangning Xu
Abstract:
The task of strapdown inertial navigation system (SINS) initial alignment is to calculate the attitude transformation matrix from body frame to navigation frame. In this paper, such attitude transformation matrix is divided into two parts through introducing the initial inertially fixed navigation frame as inertial frame. The attitude changes of the navigation frame corresponding to the defined in…
▽ More
The task of strapdown inertial navigation system (SINS) initial alignment is to calculate the attitude transformation matrix from body frame to navigation frame. In this paper, such attitude transformation matrix is divided into two parts through introducing the initial inertially fixed navigation frame as inertial frame. The attitude changes of the navigation frame corresponding to the defined inertial frame can be exactly calculated with known velocity and position provided by GNSS. The attitude from body frame to the defined inertial frame is estimated based on the SINS mechanization in inertial frame. The attitude, velocity and position in inertial frame are formulated together as element of the group of double direct spatial isometries.It is proven that the group state model in inertial frame satisfies a particular "group affine" property and the corresponding error model satisfies a "log linear" autonomous differential equation on the Lie algebra. Based on such striking property, the attitude from body frame to the defined inertial frame can be estimated based on the linear error model with even extreme large misalignments. Two different error state vectors are extracted based on right and left matrix multiplications and the detailed linear state space models are derived based on the right and left errors for SINS mechanization in inertial frame. With the derived linear state space models, the explicit initial alignment procedures have been presented. Extensive simulation and field tests indicate that the initial alignment based on the left error model can perform quite well within a wide range of initial attitude errors, although the used filter is still a type of linear Kalman filter. This method is promising in practical products abandoning the traditional coarse alignment stage.
△ Less
Submitted 25 February, 2021;
originally announced February 2021.
-
Multi-frame Feature Aggregation for Real-time Instrument Segmentation in Endoscopic Video
Authors:
Shan Lin,
Fangbo Qin,
Haonan Peng,
Randall A. Bly,
Kris S. Moe,
Blake Hannaford
Abstract:
Deep learning-based methods have achieved promising results on surgical instrument segmentation. However, the high computation cost may limit the application of deep models to time-sensitive tasks such as online surgical video analysis for robotic-assisted surgery. Moreover, current methods may still suffer from challenging conditions in surgical images such as various lighting conditions and the…
▽ More
Deep learning-based methods have achieved promising results on surgical instrument segmentation. However, the high computation cost may limit the application of deep models to time-sensitive tasks such as online surgical video analysis for robotic-assisted surgery. Moreover, current methods may still suffer from challenging conditions in surgical images such as various lighting conditions and the presence of blood. We propose a novel Multi-frame Feature Aggregation (MFFA) module to aggregate video frame features temporally and spatially in a recurrent mode. By distributing the computation load of deep feature extraction over sequential frames, we can use a lightweight encoder to reduce the computation costs at each time step. Moreover, public surgical videos usually are not labeled frame by frame, so we develop a method that can randomly synthesize a surgical frame sequence from a single labeled frame to assist network training. We demonstrate that our approach achieves superior performance to corresponding deeper segmentation models on two public surgery datasets.
△ Less
Submitted 25 July, 2021; v1 submitted 17 November, 2020;
originally announced November 2020.
-
Contour Primitive of Interest Extraction Network Based on One-Shot Learning for Object-Agnostic Vision Measurement
Authors:
Fangbo Qin,
Jie Qin,
Siyu Huang,
De Xu
Abstract:
Image contour based vision measurement is widely applied in robot manipulation and industrial automation. It is appealing to realize object-agnostic vision system, which can be conveniently reused for various types of objects. We propose the contour primitive of interest extraction network (CPieNet) based on the one-shot learning framework. First, CPieNet is featured by that its contour primitive…
▽ More
Image contour based vision measurement is widely applied in robot manipulation and industrial automation. It is appealing to realize object-agnostic vision system, which can be conveniently reused for various types of objects. We propose the contour primitive of interest extraction network (CPieNet) based on the one-shot learning framework. First, CPieNet is featured by that its contour primitive of interest (CPI) output, a designated regular contour part lying on a specified object, provides the essential geometric information for vision measurement. Second, CPieNet has the one-shot learning ability, utilizing a support sample to assist the perception of the novel object. To realize lower-cost training, we generate support-query sample pairs from unpaired online public images, which cover a wide range of object categories. To obtain single-pixel wide contour for precise measurement, the Gabor-filters based non-maximum suppression is designed to thin the raw contour. For the novel CPI extraction task, we built the Object Contour Primitives dataset using online public images, and the Robotic Object Contour Measurement dataset using a camera mounted on a robot. The effectiveness of the proposed methods is validated by a series of experiments.
△ Less
Submitted 24 March, 2021; v1 submitted 7 October, 2020;
originally announced October 2020.
-
TP-LSD: Tri-Points Based Line Segment Detector
Authors:
Siyu Huang,
Fangbo Qin,
Pengfei Xiong,
Ning Ding,
Yijia He,
Xiao Liu
Abstract:
This paper proposes a novel deep convolutional model, Tri-Points Based Line Segment Detector (TP-LSD), to detect line segments in an image at real-time speed. The previous related methods typically use the two-step strategy, relying on either heuristic post-process or extra classifier. To realize one-step detection with a faster and more compact model, we introduce the tri-points representation, c…
▽ More
This paper proposes a novel deep convolutional model, Tri-Points Based Line Segment Detector (TP-LSD), to detect line segments in an image at real-time speed. The previous related methods typically use the two-step strategy, relying on either heuristic post-process or extra classifier. To realize one-step detection with a faster and more compact model, we introduce the tri-points representation, converting the line segment detection to the end-to-end prediction of a root-point and two endpoints for each line segment. TP-LSD has two branches: tri-points extraction branch and line segmentation branch. The former predicts the heat map of root-points and the two displacement maps of endpoints. The latter segments the pixels on straight lines out from background. Moreover, the line segmentation map is reused in the first branch as structural prior. We propose an additional novel evaluation metric and evaluate our method on Wireframe and YorkUrban datasets, demonstrating not only the competitive accuracy compared to the most recent methods, but also the real-time run speed up to 78 FPS with the $320\times 320$ input.
△ Less
Submitted 11 September, 2020;
originally announced September 2020.
-
LC-GAN: Image-to-image Translation Based on Generative Adversarial Network for Endoscopic Images
Authors:
Shan Lin,
Fangbo Qin,
Yangming Li,
Randall A. Bly,
Kris S. Moe,
Blake Hannaford
Abstract:
Intelligent vision is appealing in computer-assisted and robotic surgeries. Vision-based analysis with deep learning usually requires large labeled datasets, but manual data labeling is expensive and time-consuming in medical problems. We investigate a novel cross-domain strategy to reduce the need for manual data labeling by proposing an image-to-image translation model live-cadaver GAN (LC-GAN)…
▽ More
Intelligent vision is appealing in computer-assisted and robotic surgeries. Vision-based analysis with deep learning usually requires large labeled datasets, but manual data labeling is expensive and time-consuming in medical problems. We investigate a novel cross-domain strategy to reduce the need for manual data labeling by proposing an image-to-image translation model live-cadaver GAN (LC-GAN) based on generative adversarial networks (GANs). We consider a situation when a labeled cadaveric surgery dataset is available while the task is instrument segmentation on an unlabeled live surgery dataset. We train LC-GAN to learn the map**s between the cadaveric and live images. For live image segmentation, we first translate the live images to fake-cadaveric images with LC-GAN and then perform segmentation on the fake-cadaveric images with models trained on the real cadaveric dataset. The proposed method fully makes use of the labeled cadaveric dataset for live image segmentation without the need to label the live dataset. LC-GAN has two generators with different architectures that leverage the deep feature representation learned from the cadaveric image based segmentation task. Moreover, we propose the structural similarity loss and segmentation consistency loss to improve the semantic consistency during translation. Our model achieves better image-to-image translation and leads to improved segmentation performance in the proposed cross-domain segmentation task.
△ Less
Submitted 13 August, 2020; v1 submitted 10 March, 2020;
originally announced March 2020.
-
Towards Better Surgical Instrument Segmentation in Endoscopic Vision: Multi-Angle Feature Aggregation and Contour Supervision
Authors:
Fangbo Qin,
Shan Lin,
Yangming Li,
Randall A. Bly,
Kris S. Moe,
Blake Hannaford
Abstract:
Accurate and real-time surgical instrument segmentation is important in the endoscopic vision of robot-assisted surgery, and significant challenges are posed by frequent instrument-tissue contacts and continuous change of observation perspective. For these challenging tasks more and more deep neural networks (DNN) models are designed in recent years. We are motivated to propose a general embeddabl…
▽ More
Accurate and real-time surgical instrument segmentation is important in the endoscopic vision of robot-assisted surgery, and significant challenges are posed by frequent instrument-tissue contacts and continuous change of observation perspective. For these challenging tasks more and more deep neural networks (DNN) models are designed in recent years. We are motivated to propose a general embeddable approach to improve these current DNN segmentation models without increasing the model parameter number. Firstly, observing the limited rotation-invariance performance of DNN, we proposed the Multi-Angle Feature Aggregation (MAFA) method, leveraging active image rotation to gain richer visual cues and make the prediction more robust to instrument orientation changes. Secondly, in the end-to-end training stage, the auxiliary contour supervision is utilized to guide the model to learn the boundary awareness, so that the contour shape of segmentation mask is more precise. The proposed method is validated with ablation experiments on the novel Sinus-Surgery datasets collected from surgeons' operations, and is compared to the existing methods on a public dataset collected with a da Vinci Xi Robot.
△ Less
Submitted 10 August, 2020; v1 submitted 25 February, 2020;
originally announced February 2020.
-
Aesthetic Quality Assessment for Group photograph
Authors:
Yaoting Wang,
Yongzhen Ke,
Kai Wang,
Cuijiao Zhang,
Fan Qin
Abstract:
Image aesthetic quality assessment has got much attention in recent years, but not many works have been done on a specific genre of photos: Group photograph. In this work, we designed a set of high-level features based on the experience and principles of group photography: Opened-eye, Gaze, Smile, Occluded faces, Face Orientation, Facial blur, Character center. Then we combined them and 83 generic…
▽ More
Image aesthetic quality assessment has got much attention in recent years, but not many works have been done on a specific genre of photos: Group photograph. In this work, we designed a set of high-level features based on the experience and principles of group photography: Opened-eye, Gaze, Smile, Occluded faces, Face Orientation, Facial blur, Character center. Then we combined them and 83 generic aesthetic features to build two aesthetic assessment models. We also constructed a large dataset of group photographs - GPD- annotated with the aesthetic score. The experimental result shows that our features perform well for categorizing professional photos and snapshots and predicting the distinction of multiple group photographs of diverse human states under the same scene.
△ Less
Submitted 3 February, 2020;
originally announced February 2020.
-
Linear Span Network for Object Skeleton Detection
Authors:
Chang Liu,
Wei Ke,
Fei Qin,
Qixiang Ye
Abstract:
Robust object skeleton detection requires to explore rich representative visual features and effective feature fusion strategies. In this paper, we first re-visit the implementation of HED, the essential principle of which can be ideally described with a linear reconstruction model. Hinted by this, we formalize a Linear Span framework, and propose Linear Span Network (LSN) modified by Linear Span…
▽ More
Robust object skeleton detection requires to explore rich representative visual features and effective feature fusion strategies. In this paper, we first re-visit the implementation of HED, the essential principle of which can be ideally described with a linear reconstruction model. Hinted by this, we formalize a Linear Span framework, and propose Linear Span Network (LSN) modified by Linear Span Units (LSUs), which minimize the reconstruction error of convolutional network. LSN further utilizes subspace linear span beside the feature linear span to increase the independence of convolutional features and the efficiency of feature integration, which enlarges the capability of fitting complex ground-truth. As a result, LSN can effectively suppress the cluttered backgrounds and reconstruct object skeletons. Experimental results validate the state-of-the-art performance of the proposed LSN.
△ Less
Submitted 25 July, 2018;
originally announced July 2018.
-
Dynamic Analytical Initialization Method for Spacecraft Attitude Estimators
Authors:
Lubin Chang,
Fangjun Qin
Abstract:
This paper proposes a dynamic analytical initialization method for spacecraft attitude estimators. In the proposed method, the desired attitude matrix is decomposed into two parts: one is the constant attitude matrix at the very start and the other encodes the attitude changes of the body frame from its initial state. The latter one can be calculated recursively using the gyroscope outputs and the…
▽ More
This paper proposes a dynamic analytical initialization method for spacecraft attitude estimators. In the proposed method, the desired attitude matrix is decomposed into two parts: one is the constant attitude matrix at the very start and the other encodes the attitude changes of the body frame from its initial state. The latter one can be calculated recursively using the gyroscope outputs and the constant attitude matrix can be determined using constructed vector observations at different time. Compared with traditional initialization methods, the proposed method does not necessitate the spacecraft being static or more than two non-collinear vector observations at the same time. Therefore, the proposed method can promote increased spacecraft autonomy by autonomous initialization of attitude estimators. The effectiveness and prospect of the proposed method in spacecraft attitude estimation applications have been validated through numerical simulations.
△ Less
Submitted 21 June, 2017;
originally announced June 2017.
-
A scalable convolutional neural network for task-specified scenarios via knowledge distillation
Authors:
Mengnan Shi,
Fei Qin,
Qixiang Ye,
Zhenjun Han,
Jianbin Jiao
Abstract:
In this paper, we explore the redundancy in convolutional neural network, which scales with the complexity of vision tasks. Considering that many front-end visual systems are interested in only a limited range of visual targets, the removing of task-specified network redundancy can promote a wide range of potential applications. We propose a task-specified knowledge distillation algorithm to deriv…
▽ More
In this paper, we explore the redundancy in convolutional neural network, which scales with the complexity of vision tasks. Considering that many front-end visual systems are interested in only a limited range of visual targets, the removing of task-specified network redundancy can promote a wide range of potential applications. We propose a task-specified knowledge distillation algorithm to derive a simplified model with pre-set computation cost and minimized accuracy loss, which suits the resource constraint front-end systems well. Experiments on the MNIST and CIFAR10 datasets demonstrate the feasibility of the proposed approach as well as the existence of task-specified redundancy.
△ Less
Submitted 6 January, 2017; v1 submitted 19 September, 2016;
originally announced September 2016.