Search | arXiv e-print repository

Convolution-based Probability Gradient Loss for Semantic Segmentation

Abstract: In this paper, we introduce a novel Convolution-based Probability Gradient (CPG) loss for semantic segmentation. It employs convolution kernels similar to the Sobel operator, capable of computing the gradient of pixel intensity in an image. This enables the computation of gradients for both ground-truth and predicted category-wise probabilities. It enhances network performance by maximizing the similarity between these two probability gradients. Moreover, to specifically enhance accuracy near the object's boundary, we extract the object boundary based on the ground-truth probability gradient and exclusively apply the CPG loss to pixels belonging to boundaries. CPG loss proves to be highly convenient and effective. It establishes pixel relationships through convolution, calculating errors from a distinct dimension compared to pixel-wise loss functions such as cross-entropy loss. We conduct qualitative and quantitative analyses to evaluate the impact of the CPG loss on three well-established networks (DeepLabv3-Resnet50, HRNetV2-OCR, and LRASPP_MobileNet_V3_Large) across three standard segmentation datasets (Cityscapes, COCO-Stuff, ADE20K). Our extensive experimental results consistently and significantly demonstrate that the CPG loss enhances the mean Intersection over Union.

Submitted 9 April, 2024; originally announced April 2024.

Comments: 12 pages, 7 figures

arXiv:2010.04609 [pdf, other]

Causal Feature Selection with Dimension Reduction for Interpretable Text Classification

Authors: Guohou Shan, James Foulds, Shimei Pan

Abstract: Text features that are correlated with class labels, but do not directly cause them, are sometimes useful for prediction, but they may not be insightful. As an alternative to traditional correlation-based feature selection, causal inference could reveal more principled, meaningful relationships between text features and labels. To help researchers gain insight into text data, e.g. for social science applications, in this paper we investigate a class of matching-based causal inference methods for text feature selection. Features used in document classification are often high dimensional; however, existing causal feature selection methods use Propensity Score Matching (PSM), which is known to be less effective in high-dimensional spaces. We propose a new causal feature selection framework that combines dimension reduction with causal inference to improve text feature selection. Experiments on both synthetic and real-world data demonstrate the promise of our methods in improving classification and enhancing interpretability.

Submitted 9 October, 2020; originally announced October 2020.

Comments: 11 pages, 3 pages

ACM Class: I.2.7

arXiv:2009.09940 [pdf, other]

CNNPruner: Pruning Convolutional Neural Networks with Visual Analytics

Authors: Guan Li, Junpeng Wang, Han-Wei Shen, Kaixin Chen, Guihua Shan, Zhonghua Lu

Abstract: Convolutional neural networks (CNNs) have demonstrated extraordinarily good performance in many computer vision tasks. The increasing size of CNN models, however, prevents them from being widely deployed to devices with limited computational resources, e.g., mobile/embedded devices. The emerging topic of model pruning strives to address this problem by removing less important neurons and fine-tuning the pruned networks to minimize the accuracy loss. Nevertheless, existing automated pruning solutions often rely on a numerical threshold of the pruning criteria, lacking the flexibility to optimally balance the trade-off between model size and accuracy. Moreover, the complicated interplay between the stages of neuron pruning and model fine-tuning makes this process opaque and therefore difficult to optimize. In this paper, we address these challenges through a visual analytics approach, named CNNPruner. It considers the importance of convolutional filters through both instability and sensitivity, and allows users to interactively create pruning plans according to a desired goal on model size or accuracy. Also, CNNPruner integrates state-of-the-art filter visualization techniques to help users understand the roles that different filters play and refine their pruning plans. Through comprehensive case studies on CNNs with real-world sizes, we validate the effectiveness of CNNPruner.

Submitted 7 September, 2020; originally announced September 2020.

Comments: 10 pages, 15 figures, Accepted for presentation at IEEE VIS 2020

arXiv:2005.07567 [pdf]

Accelerating drug repurposing for COVID-19 via modeling drug mechanism of action with large scale gene-expression profiles

Authors: Lu Han, G. C. Shan, B. F. Chu, H. Y. Wang, Z. J. Wang, S. Q. Gao, W. X. Zhou

Abstract: The novel coronavirus disease, named COVID-19, emerged in China in December 2019 and has rapidly spread around the world. It is clearly urgent to fight COVID-19 at a global scale. The development of methods for identifying drug uses based on phenotypic data can improve the efficiency of drug development. However, there are still many difficulties in identifying drug applications based on cell image data. This work reports a state-of-the-art machine learning method to identify drug uses based on the cell image features of 1024 drugs generated in the LINCS program. Because the multi-dimensional image features are affected by non-experimental factors, the features of similar drugs vary greatly, and the current number of samples is not sufficient for deep learning and similar learning-based optimization methods. As a consequence, this study uses the supervised ITML algorithm to transform the drug features. The results show that the ITML-transformed features are more conducive to the recognition of drug functions. The analysis of the feature transformation shows that different features play important roles in identifying different drug functions. For the current COVID-19, Chloroquine and Hydroxychloroquine achieve antiviral effects by inhibiting endocytosis, etc., and were classified into the same community. Clomiphene, in the same community, inhibited the entry of Ebola virus, indicating a similar MoA that could be reflected by cell images.

Submitted 5 October, 2021; v1 submitted 15 May, 2020; originally announced May 2020.

Comments: 22 pages, 4 figures. Cognitive Neurodynamics (2021)

arXiv:2003.00817 [pdf]

Recognizing Handwritten Mathematical Expressions as LaTex Sequences Using a Multiscale Robust Neural Network

Authors: Hongyu Wang, Guangcun Shan

Abstract: In this paper, a robust multiscale neural network is proposed to recognize handwritten mathematical expressions and output LaTeX sequences. The network can effectively and correctly focus on the region relevant to each output step, which has a positive effect on analyzing the two-dimensional structure of handwritten mathematical expressions and identifying different mathematical symbols in a long expression. With the addition of visualization, the model's recognition process is shown in detail. In addition, our model achieved 49.459% and 46.062% ExpRate on the public CROHME 2014 and CROHME 2016 datasets. The results suggest that the present model has better robustness, fewer errors, and higher accuracy.

Submitted 26 February, 2020; originally announced March 2020.

Comments: 6 figures, 5 tables, 20 pages

arXiv:1907.09320 [pdf]

An Efficient Target Detection and Recognition Method in Aerial Remote-sensing Images Based on Multiangle Regions-of-Interest

Authors: Guangcun Shan, Hongyu Wang, Wei Liang, Congcong Liu, Qizi Ma, Quan Quan

Abstract: Recently, deep learning technology has been extensively used in the field of image recognition. However, its main application is the recognition and detection of ordinary pictures and common scenes. It is challenging to effectively and expediently analyze remote-sensing images obtained by the image acquisition systems on unmanned aerial vehicles (UAVs), which includes the identification of the target and calculation of its position. Aerial remote-sensing images have different shooting angles and methods compared with ordinary pictures or images, which makes remote-sensing images play an irreplaceable role in some areas. In this study, a new target detection and recognition method in remote-sensing images is proposed based on a deep convolutional neural network (CNN) for the provision of multilevel information of images in combination with a region proposal network used to generate multiangle regions-of-interest. The proposed method generated results that were much more accurate and precise than those obtained with traditional methods. This demonstrates that the proposed model has tremendous potential for application in remote-sensing image recognition.

Submitted 7 June, 2022; v1 submitted 22 July, 2019; originally announced July 2019.

Comments: 5 pages, 3 figures

arXiv:1906.06496 [pdf, other]

doi 10.1007/s11704-021-0173-7

Accelerating temporal action proposal generation via high performance computing

Authors: Tian Wang, Shiye Lei, Youyou Jiang, Choi Chang, Hichem Snoussi, Guangcun Shan

Abstract: Temporal action recognition always depends on temporal action proposal generation to hypothesize actions. Algorithms usually need to process very long video sequences and output the starting and ending times of each potential action in each video, and therefore suffer from high computation cost. To address this, based on the boundary sensitive network, we propose a new temporal convolution network called Multipath Temporal ConvNet (MTN), which consists of two parts, i.e. Multipath DenseNet and SE-ConvNet. In this work, a novel high-performance ring parallel architecture based on the Message Passing Interface (MPI), a reliable communication protocol, is further introduced into temporal action proposal generation in order to meet the requirements of large memory occupation and a large number of videos. Remarkably, the total data transmission is reduced by adding a connection between multiple computing loads in the newly developed architecture. It is found that, compared to the traditional Parameter Server architecture, our parallel architecture has higher efficiency on the temporal action detection task with multiple GPUs, which makes it suitable for temporal action proposal generation, especially for large datasets of millions of videos. We conduct experiments on ActivityNet-1.3 and THUMOS14, where our method outperforms other state-of-the-art temporal action detection methods with high recall and high temporal precision. In addition, a time metric is further proposed here to evaluate the speed performance in the distributed training process.

Submitted 24 April, 2020; v1 submitted 15 June, 2019; originally announced June 2019.

Comments: 11 pages, 12 figures

Journal ref: Frontiers of Computer Science volume 16, Article number: 164317 (2022)

arXiv:1903.11891 [pdf]

AED-Net: An Abnormal Event Detection Network

Authors: Tian Wang, Zichen Miao, Yuxin Chen, Yi Zhou, Guangcun Shan, Hichem Snoussi

Abstract: Detecting anomalies in crowded scenes has been a challenging problem for quite a long time. In this paper, a self-supervised framework, abnormal event detection network (AED-Net), which is composed of PCAnet and kernel principal component analysis (kPCA), is proposed to address this problem. Using surveillance video sequences of different scenes as raw data, PCAnet is trained to extract high-level semantics of the crowd's situation. Next, kPCA, a one-class classifier, is trained to determine the anomaly of the scene. In contrast to some prevailing deep learning methods, the framework is completely self-supervised because it utilizes only video sequences in a normal situation. Experiments of global and local abnormal event detection are carried out on UMN and UCSD datasets, and competitive results with higher EER and AUC compared to other state-of-the-art methods are observed. Furthermore, by adding a local response normalization (LRN) layer, we propose an improvement to the original AED-Net. According to the experiments, it performs better by improving the framework's generalization capacity.

Submitted 28 March, 2019; originally announced March 2019.

Comments: 14 pages, 7 figures

Journal ref: Engineering, 2019

arXiv:1902.05376 [pdf]

doi 10.1007/s11432-018-9824-9

Robust Encoder-Decoder Learning Framework towards Offline Handwritten Mathematical Expression Recognition Based on Multi-Scale Deep Neural Network

Authors: Guangcun Shan, Hongyu Wang, Wei Liang

Abstract: Offline handwritten mathematical expression recognition is a challenging task, because recognizing handwritten mathematical expressions mainly involves two problems. On one hand, it is how to correctly recognize different mathematical symbols. On the other hand, it is how to correctly recognize the two-dimensional structure existing in mathematical expressions. Inspired by recent work in deep learning, a new neural network model that combines a Multi-Scale convolutional neural network (CNN) with an Attention recurrent neural network (RNN) is proposed to identify two-dimensional handwritten mathematical expressions as one-dimensional LaTeX sequences. As a result, the model proposed in the present work has achieved a WER of 25.715% and an ExpRate of 28.216%.

Submitted 28 May, 2020; v1 submitted 7 February, 2019; originally announced February 2019.

Comments: 11 pages, 16 figures

Journal ref: Sci China Inf Sci, 2021, 64(3): 139101, doi: 10.1007/s11432-018-9824-9

arXiv:1902.03377 [pdf]

doi 10.1109/CAC.2018.8623687

Region based Ensemble Learning Network for Fine-grained Classification

Authors: Weikuang Li, Tian Wang, Chuanyun Wang, Guangcun Shan, Mengyi Zhang, Hichem Snoussi

Abstract: As an important research topic in computer vision, fine-grained classification, which aims to recognize subordinate-level categories, has attracted significant attention. We propose a novel region based ensemble learning network for fine-grained classification. Our approach contains a detection module and a module for classification. The detection module is based on the Faster R-CNN framework to locate the semantic regions of the object. The classification module uses an ensemble learning method, which trains a set of sub-classifiers for different semantic regions and combines them together to get a stronger classifier. In the evaluation, we implement experiments on the CUB-2011 dataset, and the results show that our method is efficient for fine-grained classification. We also extend our approach to remote scene recognition and evaluate it on the NWPU-RESISC45 dataset.

Submitted 9 February, 2019; originally announced February 2019.

Comments: 6 pages, 3 figures, 2018 Chinese Automation Congress (CAC)

arXiv:1902.03365 [pdf]

doi 10.1109/CAC.2018.8623424

HE-SLAM: a Stereo SLAM System Based on Histogram Equalization and ORB Features

Authors: Yinghong Fang, Guangcun Shan, Xin Li, Wenliang Liu, Tian Wang, Hichem Snoussi

Abstract: In real-life environments, due to the sudden appearance of windows, lights, and objects blocking the light source, a visual SLAM system can easily capture low-contrast images caused by over-exposure or over-darkness. In such cases, the direct method of estimating camera motion based on pixel luminance information is infeasible, and it is often difficult to find enough valid feature points without image processing. This paper proposes HE-SLAM, a new method combining histogram equalization and ORB feature extraction, which can be robust in more scenes, especially in stages with low-contrast images. Because HE-SLAM uses histogram equalization to improve the contrast of images, it can extract enough valid feature points in low-contrast images for subsequent feature matching, keyframe selection, bundle adjustment, and loop closure detection. The proposed HE-SLAM has been tested on popular datasets (such as KITTI and EuRoC), and the real-time performance and robustness of the system are demonstrated by comparing system runtime and the root mean square error (RMSE) of absolute trajectory error (ATE) with state-of-the-art methods like ORB-SLAM2.

Submitted 8 February, 2019; originally announced February 2019.

Comments: 7 pages, 2 figures, 2018 Chinese Automation Congress (CAC)

arXiv:1412.7780 [pdf]

doi 10.1007/s12650-014-0206-5

Interactive Visual Exploration of Halos in Large Scale Cosmology Simulation

Authors: Guihua Shan, Mao** Xie, FengAn Li, Yang Gao, Xuebin Chi

Abstract: The halo is one of the most important basic elements in cosmology simulation, merging from small clumps into ever larger objects. The processes of the birth and merging of halos play a fundamental role in studying the evolution of large scale cosmological structures. In this paper, a visual analysis system is developed to interactively identify and explore the evolution histories of thousands of halos. In this system, an intelligent structure-aware selection method in a What You See Is What You Get manner is designed to efficiently define the region of interest in 3D space with 2D hand-drawn lasso input. Then the exact information of halos within this 3D region is identified by data mining in the merger tree files. To avoid visual clutter, all the halos are projected into 2D space with an MDS method. Through the linked view of the 3D view and 2D graph, users can interactively explore these halos, including the tracing path and evolution history tree.

Submitted 24 December, 2014; originally announced December 2014.

Comments: 9 pages, 14 figures

Journal ref: J. Visualization 17(3):145-156(2014)

Showing 1–12 of 12 results for author: Shan, G