Search | arXiv e-print repository

arXiv:2404.00726 [pdf, other]

MugenNet: A Novel Combined Convolution Neural Network and Transformer Network with its Application for Colonic Polyp Image Segmentation

Authors: Chen Peng, Zhiqin Qian, Kunyu Wang, Qi Luo, Zhuming Bi, Wenjun Zhang

Abstract: Biomedical image segmentation is a very important part of disease diagnosis. The term "colonic polyps" refers to polypoid lesions that occur on the surface of the colonic mucosa within the intestinal lumen. In clinical practice, early detection of polyps is conducted through colonoscopy examinations and biomedical image processing. Therefore, accurate polyp image segmentation is of great significance in colonoscopy examinations. Convolutional Neural Network (CNN) is a common automatic segmentation method, but its main disadvantage is the long training time. Transformer utilizes a self-attention mechanism, which essentially assigns different importance weights to each piece of information, thus achieving high computational efficiency during segmentation. However, a potential drawback is the risk of information loss. In the study reported in this paper, based on the well-known hybridization principle, we proposed a method to combine CNN and Transformer to retain the strengths of both, and we applied this method to build a system called MugenNet for colonic polyp image segmentation. We conducted a comprehensive experiment to compare MugenNet with other CNN models on five publicly available datasets. The ablation experiment on MugenNet was conducted as well. The experimental results show that MugenNet achieves significantly higher processing speed and accuracy compared with CNN alone. The general implication of our work is a method to optimally combine two complementary methods of machine learning.

Submitted 31 March, 2024; originally announced April 2024.

arXiv:2402.13798 [pdf, other]

AFPR-CIM: An Analog-Domain Floating-Point RRAM-based Compute-In-Memory Architecture with Dynamic Range Adaptive FP-ADC

Authors: Haobo Liu, Zhengyang Qian, Wei Wu, Hongwei Ren, Zhiwei Liu, Leibin Ni

Abstract: Power consumption has become the major concern in neural network accelerators for edge devices. The novel non-volatile-memory (NVM) based computing-in-memory (CIM) architecture has shown great potential for better energy efficiency. However, most of the recent NVM-CIM solutions mainly focus on fixed-point calculation and are not applicable to floating-point (FP) processing. In this paper, we propose an analog-domain floating-point CIM architecture (AFPR-CIM) based on resistive random-access memory (RRAM). A novel adaptive dynamic-range FP-ADC is designed to convert the analog computation results into FP codes. Output current with high dynamic range is converted to a normalized voltage range for readout, to prevent precision loss at low power consumption. Moreover, a novel FP-DAC is also implemented which reconstructs FP digital codes into analog values to perform analog computation. The proposed AFPR-CIM architecture enables neural network acceleration with FP8 (E2M5) activation for better accuracy and energy efficiency. Evaluation results show that AFPR-CIM can achieve 19.89 TFLOPS/W energy efficiency and 1474.56 GOPS throughput. Compared to traditional FP8 accelerator, digital FP-CIM, and analog INT8-CIM, this work achieves 4.135x, 5.376x, and 2.841x energy efficiency enhancement, respectively.

Submitted 21 February, 2024; originally announced February 2024.

Comments: Accepted by DATE 2024

arXiv:2309.07428 [pdf, other]

Physical Invisible Backdoor Based on Camera Imaging

Authors: Yusheng Guo, Nan Zhong, Zhenxing Qian, Xinpeng Zhang

Abstract: Backdoor attack aims to compromise a model, which returns an adversary-wanted output when a specific trigger pattern appears yet behaves normally for clean inputs. Current backdoor attacks require changing pixels of clean images, which results in poor stealthiness of attacks and increases the difficulty of the physical implementation. This paper proposes a novel physical invisible backdoor based on camera imaging without changing natural image pixels. Specifically, a compromised model returns a target label for images taken by a particular camera, while it returns correct results for other images. To implement and evaluate the proposed backdoor, we take shots of different objects from multiple angles using multiple smartphones to build a new dataset of 21,500 images. Conventional backdoor attacks work ineffectively with some classical models, such as ResNet18, over the above-mentioned dataset. Therefore, we propose a three-step training strategy to mount the backdoor attack. First, we design and train a camera identification model with the phone IDs to extract the camera fingerprint feature. Subsequently, we elaborate a special network architecture, which is easily compromised by our backdoor attack, by leveraging the attributes of the CFA interpolation algorithm and combining it with the feature extraction block in the camera identification model. Finally, we transfer the backdoor from the elaborated special network architecture to the classical architecture model via teacher-student distillation learning. Since the trigger of our method is related to the specific phone, our attack works effectively in the physical world. Experiment results demonstrate the feasibility of our proposed approach and robustness against various backdoor defenses.

Submitted 14 September, 2023; originally announced September 2023.

arXiv:2307.16418 [pdf, other]

DRAW: Defending Camera-shooted RAW against Image Manipulation

Authors: Xiaoxiao Hu, Qichao Ying, Zhenxing Qian, Sheng Li, Xinpeng Zhang

Abstract: RAW files are the initial measurement of scene radiance widely used in most cameras, and the ubiquitously-used RGB images are converted from RAW data through Image Signal Processing (ISP) pipelines. Nowadays, digital images are at risk of being nefariously manipulated. Inspired by the fact that innate immunity is the first line of body defense, we propose DRAW, a novel scheme of defending images against manipulation by protecting their sources, i.e., camera-shooted RAWs. Specifically, we design a lightweight Multi-frequency Partial Fusion Network (MPF-Net) friendly to devices with limited computing resources by frequency learning and partial feature fusion. It introduces invisible watermarks as a protective signal into the RAW data. The protection capability can not only be transferred into the rendered RGB images regardless of the applied ISP pipeline, but is also resilient to post-processing operations such as blurring or compression. Once the image is manipulated, we can accurately identify the forged areas with a localization network. Extensive experiments on several famous RAW datasets, e.g., RAISE, FiveK and SIDD, indicate the effectiveness of our method. We hope that this technique can be used in future cameras as an option for image protection, which could effectively restrict image manipulation at the source.

Submitted 31 July, 2023; originally announced July 2023.

Comments: To appear in ICCV 2023. The leading two authors contribute equally

arXiv:2305.17928 [pdf, ps, other]

Computation Offloading for Edge Computing in RIS-Assisted Symbiotic Radio Systems

Authors: Bin Li, Zhen Qian, Lei Liu, Yuan Wu, Dapeng Lan, Celimuge Wu

Abstract: In this paper, we investigate the coordination process of sensing and computation offloading in a reconfigurable intelligent surface (RIS)-aided base station (BS)-centric symbiotic radio (SR) system. Specifically, the Internet-of-Things (IoT) devices first sense data from the environment and then process the data locally or offload the data to the BS for remote computing, while RISs are leveraged to enhance the quality of blocked channels and also act as IoT devices to transmit their sensed data. To explore the mechanism of cooperative sensing and computation offloading in this system, we aim at maximizing the total completed sensed bits of all users and RISs by jointly optimizing the time allocation parameter, the passive beamforming at each RIS, the transmit beamforming at the BS, and the energy partition parameters for all users subject to the size of sensed data, energy supply and given time cycle. The formulated nonconvex problem is tightly coupled by the time allocation parameter and involves mathematical expectations, so it cannot be solved directly. We use Monte Carlo and fractional programming methods to transform the nonconvex objective function and then propose an alternating optimization-based algorithm to find an approximate solution with guaranteed convergence. Numerical results show that the RIS-aided SR system outperforms other benchmarks in sensing. Furthermore, with the aid of RIS, the channel and system performance can be significantly improved.

Submitted 29 May, 2023; originally announced May 2023.

Comments: 13 pages, 7 figures

arXiv:2304.06493 [pdf]

Fault diagnosis for PV arrays considering dust impact based on transformed graphical feature of characteristic curves and convolutional neural network with CBAM modules

Authors: Jiaqi Qu, Lu Wei, Qiang Sun, Hamidreza Zareipour, Zheng Qian

Abstract: Various faults can occur during the operation of PV arrays, and both the dust-affected operating conditions and various diode configurations make the faults more complicated. However, current methods for fault diagnosis based on I-V characteristic curves only utilize partial feature information and often rely on calibrating the field characteristic curves to standard test conditions (STC). It is difficult to apply them in practice and to accurately identify multiple complex faults with similarities in different blocking diode configurations of PV arrays under the influence of dust. Therefore, a novel fault diagnosis method for PV arrays considering dust impact is proposed. In the preprocessing stage, the Isc-Voc normalized Gramian angular difference field (GADF) method is presented, which normalizes and transforms the resampled PV array characteristic curves from the field, including I-V and P-V, to obtain the transformed graphical feature matrices. Then, in the fault diagnosis stage, a convolutional neural network (CNN) model with convolutional block attention modules (CBAM) is designed to extract fault differentiation information from the transformed graphical matrices containing full feature information and to classify faults. Different graphical feature transformation methods are compared through simulation cases, and different CNN-based classification methods are also analyzed. The results indicate that the developed method for PV arrays with different blocking diode configurations under various operating conditions has high fault diagnosis accuracy and reliability.

Submitted 24 March, 2023; originally announced April 2023.

arXiv:2301.01420 [pdf]

Improved CNN Prediction Based Reversible Data Hiding

Authors: Yingqiang Qiu, Wanli Peng, Xiaodan Lin, Huanqiang Zeng, Zhenxing Qian

Abstract: This letter proposes an improved CNN predictor (ICNNP) for reversible data hiding (RDH) in images, which consists of a feature extraction module, a pixel prediction module, and a complexity prediction module. By predicting the complexity of each pixel with the ICNNP during the embedding process, the proposed method can achieve better performance than the CNN predictor-based method. Specifically, an input image is first split into two different sub-images, i.e., the "Dot" image and the "Cross" image. Each sub-image is then used to predict the other. Then, the prediction errors of pixels are sorted by the predicted pixel complexities. In light of this, some sorted prediction errors with less complexity are selected to be efficiently used for low-distortion data embedding with a traditional histogram shifting scheme. Experimental results demonstrate that the proposed method can achieve better embedding performance than that of the CNN predictor with the same histogram shifting strategy.

Submitted 3 January, 2023; originally announced January 2023.

arXiv:2210.15902 [pdf, other]

Learning to Immunize Images for Tamper Localization and Self-Recovery

Authors: Qichao Ying, Hang Zhou, Zhenxing Qian, Sheng Li, Xinpeng Zhang

Abstract: Digital images are vulnerable to nefarious tampering attacks such as content addition or removal that severely alter the original meaning. This is somewhat like a person without protection who is open to various kinds of viruses. Image immunization (Imuge) is a technology of protecting the images by introducing trivial perturbation, so that the protected images are immune to the viruses in that the tampered contents can be auto-recovered. This paper presents Imuge+, an enhanced scheme for image immunization. By observing the invertible relationship between image immunization and the corresponding self-recovery, we employ an invertible neural network to jointly learn image immunization and recovery in the forward and backward pass, respectively. We also introduce an efficient attack layer that involves both malicious tampering and benign image post-processing, where a novel distillation-based JPEG simulator is proposed for improved JPEG robustness. Our method achieves promising results in real-world tests where experiments show accurate tamper localization as well as high-fidelity content recovery. Additionally, we show superior performance on tamper localization compared to state-of-the-art schemes based on passive forensics.

Submitted 28 October, 2022; originally announced October 2022.

Comments: Under Review. Extended version of our ACMMM 2021 paper

arXiv:2207.11430 [pdf, other]

Rate-Splitting Multiple Access in Multi-cell Dense Networks: A Stochastic Geometry Approach

Authors: Qiao Zhu, Zhihong Qian, Bruno Clerckx, Xue Wang

Abstract: In this paper, the potential benefits of applying the Rate-Splitting Multiple Access (RSMA) in multi-cell dense networks are explored. Using tools of stochastic geometry, the sum-rate of RSMA-enhanced multi-cell dense networks is evaluated mathematically based on a Moment Generating Function (MGF) based framework to prove that RSMA is a general and powerful strategy for multi-antenna downlink systems. Further elaboration of the systematic performance metrics is undertaken by developing analytical expressions for area spectral efficiency and sum-rate in the RSMA-enhanced multi-cell dense networks. Based on the tractable expressions, we then offer an optimization framework for energy efficiency in terms of the number of antennas. Additionally, simulation results are shown to verify the accuracy of our analytical results and provide some useful insights into system design. Analytically, it has been shown that: 1) the sum-rate of RSMA-enhanced multi-cell dense networks is significantly influenced by the power splitting ratio, and there is a unique value that maximizes the sum-rate; 2) the RSMA-enhanced multi-cell dense networks transmission scheme has superior sum-rate performance compared with Non-Orthogonal Multiple Access (NOMA) and Space-Division Multiple Access (SDMA) in a wide range of power splitting ratio; 3) by increasing the number of antennas and BS density in an RSMA-enhanced multi-cell dense network, the area spectral efficiency can be substantially enhanced; 4) as for energy efficiency, there exists an optimal antenna number for maximizing this performance metric.

Submitted 23 July, 2022; originally announced July 2022.

arXiv:2206.02405 [pdf, other]

Image Protection for Robust Cropping Localization and Recovery

Authors: Qichao Ying, Hang Zhou, Xiaoxiao Hu, Zhenxing Qian, Sheng Li, Xinpeng Zhang

Abstract: Existing image cropping detection schemes ignore that recovering the cropped-out contents can unveil the purpose of the cropping attack. This paper presents CLR-Net, a novel image protection scheme addressing the combined challenge of image Cropping Localization and Recovery. We first protect the original image by introducing imperceptible perturbations. Then, typical image post-processing attacks are simulated to erode the protected image. On the recipient's side, we predict the cropping mask and recover the original image. Besides, we propose a novel Fine-Grained generative JPEG simulator (FG-JPEG) as well as a feature alignment network to improve the real-world robustness. Comprehensive experiments prove that the quality of the recovered image and the accuracy of crop localization are both satisfactory.

Submitted 14 March, 2023; v1 submitted 6 June, 2022; originally announced June 2022.

Comments: Accepted by IEEE ICME 2023

arXiv:2112.14420 [pdf, other]

Invertible Image Dataset Protection

Authors: Kejiang Chen, Xianhan Zeng, Qichao Ying, Sheng Li, Zhenxing Qian, Xinpeng Zhang

Abstract: Deep learning has achieved enormous success in various industrial applications. Companies do not want their valuable data to be stolen by malicious employees to train pirated models. Nor do they wish the data to be analyzed by competitors after it is used online. We propose a novel solution for dataset protection in this scenario by robustly and reversibly transforming the images into adversarial images. We develop a reversible adversarial example generator (RAEG) that introduces slight changes to the images to fool traditional classification models. Even though malicious attackers train pirated models based on the defended versions of the protected images, RAEG can significantly weaken the functionality of these models. Meanwhile, the reversibility of RAEG ensures the performance of authorized models. Extensive experiments demonstrate that RAEG can better protect the data with slight distortion against adversarial defense than previous methods.

Submitted 29 December, 2021; originally announced December 2021.

Comments: Submitted to ICME 2022. Authors are from University of Science and Technology of China, Fudan University, China. A potential extended version of this work is under way

arXiv:2103.15061 [pdf, other]

Invertible Image Signal Processing

Authors: Yazhou Xing, Zian Qian, Qifeng Chen

Abstract: Unprocessed RAW data is a highly valuable image format for image editing and computer vision. However, since the file size of RAW data is huge, most users can only get access to processed and compressed sRGB images. To bridge this gap, we design an Invertible Image Signal Processing (InvISP) pipeline, which not only enables rendering visually appealing sRGB images but also allows recovering nearly perfect RAW data. Due to our framework's inherent reversibility, we can reconstruct realistic RAW data instead of synthesizing RAW data from sRGB images without any memory overhead. We also integrate a differentiable JPEG compression simulator that empowers our framework to reconstruct RAW data from JPEG images. Extensive quantitative and qualitative experiments on two DSLR cameras demonstrate that our method obtains much higher quality in both rendered sRGB images and reconstructed RAW data than alternative methods.

Submitted 5 April, 2021; v1 submitted 28 March, 2021; originally announced March 2021.

Comments: Accepted to CVPR2021. Code available at: https://github.com/yzxing87/Invertible-ISP

arXiv:2103.13578 [pdf, other]

Test-Time Training for Deformable Multi-Scale Image Registration

Authors: Wentao Zhu, Yufang Huang, Daguang Xu, Zhen Qian, Wei Fan, Xiaohui Xie

Abstract: Registration is a fundamental task in medical robotics and is often a crucial step for many downstream tasks such as motion analysis, intra-operative tracking and image segmentation. Popular registration methods such as ANTs and NiftyReg optimize objective functions for each pair of images from scratch, which is time-consuming for 3D and sequential images with complex deformations. Recently, deep learning-based registration approaches such as VoxelMorph have been emerging and achieve competitive performance. In this work, we construct a test-time training for deep deformable image registration to improve the generalization ability of conventional learning-based registration models. We design multi-scale deep networks to consecutively model the residual deformations, which is effective for high variational deformations. Extensive experiments validate the effectiveness of multi-scale deep registration with test-time training based on Dice coefficient for image segmentation and mean square error (MSE), normalized local cross-correlation (NLCC) for tissue dense tracking tasks. Two videos are in https://www.youtube.com/watch?v=NvLrCaqCiAE and https://www.youtube.com/watch?v=pEA6ZmtTNuQ

Submitted 24 March, 2021; originally announced March 2021.

Comments: ICRA 2021; 8 pages, 4 figures, 2 big tables

Journal ref: ICRA 2021

arXiv:2010.15605 [pdf]

Manifold learning-based feature extraction for structural defect reconstruction

Authors: Qi Li, Dianzi Liu, Zhenghua Qian

Abstract: Data-driven quantitative defect reconstruction using ultrasonic guided waves has recently demonstrated great potential in the area of non-destructive testing. In this paper, we develop an efficient deep learning-based defect reconstruction framework, called NetInv, which recasts the inverse guided wave scattering problem as a data-driven supervised learning process that realizes a mapping between reflection coefficients in the wavenumber domain and defect profiles in the spatial domain. The superiority of the proposed NetInv over conventional reconstruction methods for defect reconstruction has been demonstrated by several examples. Results show that NetInv is able to achieve higher-quality defect profiles with remarkable efficiency and provides valuable insight into the development of effective data-driven structural health monitoring and defect reconstruction using machine learning.

Submitted 26 October, 2020; originally announced October 2020.

Comments: 7 pages, 4 figures. arXiv admin note: substantial text overlap with arXiv:2009.06276

ACM Class: J.2

arXiv:2005.14330 [pdf, other]

Bipartite Distance for Shape-Aware Landmark Detection in Spinal X-Ray Images

Authors: Abdullah-Al-Zubaer Imran, Chao Huang, Hui Tang, Wei Fan, Kenneth M. C. Cheung, Michael To, Zhen Qian, Demetri Terzopoulos

Abstract: Scoliosis is a congenital disease that causes lateral curvature in the spine. Its assessment relies on the identification and localization of vertebrae in spinal X-ray images, conventionally via tedious and time-consuming manual radiographic procedures that are prone to subjectivity and observational variability. Reliability can be improved through the automatic detection and localization of spinal landmarks. To guide a CNN in the learning of spinal shape while detecting landmarks in X-ray images, we propose a novel loss based on a bipartite distance (BPD) measure, and show that it consistently improves landmark detection performance.

Submitted 28 May, 2020; originally announced May 2020.

Comments: Presented at Med-NeurIPS 2019

arXiv:2004.13587 [pdf, other]

Do We Need Fully Connected Output Layers in Convolutional Networks?

Authors: Zhongchao Qian, Tyler L. Hayes, Kushal Kafle, Christopher Kanan

Abstract: Traditionally, deep convolutional neural networks consist of a series of convolutional and pooling layers followed by one or more fully connected (FC) layers to perform the final classification. While this design has been successful, for datasets with a large number of categories, the fully connected layers often account for a large percentage of the network's parameters. For applications with memory constraints, such as mobile devices and embedded platforms, this is not ideal. Recently, a family of architectures that involve replacing the learned fully connected output layer with a fixed layer has been proposed as a way to achieve better efficiency. In this paper we examine this idea further and demonstrate that fixed classifiers offer no additional benefit compared to simply removing the output layer along with its parameters. We further demonstrate that the typical approach of having a fully connected final output layer is inefficient in terms of parameter count. We are able to achieve comparable performance to a traditionally learned fully connected classification output layer on the ImageNet-1K, CIFAR-100, Stanford Cars-196, and Oxford Flowers-102 datasets, while not having a fully connected output layer at all.

Submitted 28 April, 2020; v1 submitted 28 April, 2020; originally announced April 2020.

arXiv:2004.06887 [pdf, other]

Analysis of Scoliosis From Spinal X-Ray Images

Authors: Abdullah-Al-Zubaer Imran, Chao Huang, Hui Tang, Wei Fan, Kenneth M. C. Cheung, Michael To, Zhen Qian, Demetri Terzopoulos

Abstract: Scoliosis is a congenital disease in which the spine is deformed from its normal shape. Measurement of scoliosis requires labeling and identification of vertebrae in the spine. Spine radiographs are the most cost-effective and accessible modality for imaging the spine. Reliable and accurate vertebrae segmentation in spine radiographs is crucial in image-guided spinal assessment, disease diagnosis, and treatment planning. Conventional assessments rely on tedious and time-consuming manual measurement, which is subject to inter-observer variability. A fully automatic method that can accurately identify and segment the associated vertebrae is unavailable in the literature. Leveraging a carefully-adjusted U-Net model with progressive side outputs, we propose an end-to-end segmentation model that provides a fully automatic and reliable segmentation of the vertebrae associated with scoliosis measurement. Our experimental results from a set of anterior-posterior spine X-Ray images indicate that our model, which achieves an average Dice score of 0.993, promises to be an effective tool in the identification and labeling of spinal vertebrae, eventually helping doctors in the reliable estimation of scoliosis. Moreover, estimation of Cobb angles from the segmented vertebrae further demonstrates the effectiveness of our model.

Submitted 15 April, 2020; originally announced April 2020.

Comments: 6 pages, 6 figures, 3 tables

arXiv:2004.06718 [pdf, other]

Line Art Correlation Matching Feature Transfer Network for Automatic Animation Colorization

Authors: Zhang Qian, Wang Bo, Wen Wei, Li Hai, Liu Jun Hui

Abstract: Automatic animation line art colorization is a challenging computer vision problem, since the information of the line art is highly sparse and abstracted and there exists a strict requirement for the color and style consistency between frames. Recently, many Generative Adversarial Network (GAN)-based image-to-image translation methods for single line art colorization have emerged. They can generate perceptually appealing results conditioned on line art images. However, these methods cannot be adopted for animation colorization because they do not consider in-between frame consistency. Existing methods simply input the previous colored frame as a reference to color the next line art, which misleads the colorization due to the spatial misalignment of the previous colored frame and the next line art, especially at positions where apparent changes happen. To address these challenges, we design a correlation matching feature transfer model (called CMFT) to align the colored reference feature in a learnable way and integrate the model into a U-Net-based generator in a coarse-to-fine manner. This enables the generator to transfer the layer-wise synchronized features from the deep semantic code to the content progressively. Extensive evaluation shows that the CMFT model can effectively improve the in-between consistency and the quality of colored frames, especially when the motion is intense and diverse.

Submitted 10 November, 2020; v1 submitted 14 April, 2020; originally announced April 2020.

Comments: 8 pages, 6 figures

arXiv:1906.07357 [pdf, other]

Neural Multi-Scale Self-Supervised Registration for Echocardiogram Dense Tracking

Authors: Wentao Zhu, Yufang Huang, Mani A Vannan, Shizhen Liu, Daguang Xu, Wei Fan, Zhen Qian, Xiaohui Xie

Abstract: Echocardiography has become routinely used in the diagnosis of cardiomyopathy and abnormal cardiac blood flow. However, manually measuring myocardial motion and cardiac blood flow from echocardiogram is time-consuming and error-prone. Computer algorithms that can automatically track and quantify myocardial motion and cardiac blood flow are highly sought after, but have not been very successful due to noise and high variability of echocardiography. In this work, we propose a neural multi-scale self-supervised registration (NMSR) method for automated myocardial and cardiac blood flow dense tracking. NMSR incorporates two novel components: 1) utilizing a deep neural net to parameterize the velocity field between two image frames, and 2) optimizing the parameters of the neural net in a sequential multi-scale fashion to account for large variations within the velocity field. Experiments demonstrate that NMSR yields significantly better registration accuracy than state-of-the-art methods, such as advanced normalization tools (ANTs) and VoxelMorph, for both myocardial and cardiac blood flow dense tracking. Our approach promises to provide a fully automated method for fast and accurate analyses of echocardiograms.

Submitted 17 June, 2019; originally announced June 2019.

Comments: Blood tracking video: https://youtu.be/pEA6ZmtTNuQ Muscle tracking video: https://youtu.be/NvLrCaqCiAE

arXiv:1901.10068 [pdf, other]

doi 10.1016/j.trc.2017.12.015

Statistical inference of probabilistic origin-destination demand using day-to-day traffic data

Authors: Wei Ma, Zhen Qian

Abstract: Recent transportation network studies on uncertainty and reliability call for modeling the probabilistic O-D demand and probabilistic network flow. Making the best use of day-to-day traffic data collected over many years, this paper develops a novel theoretical framework for estimating the mean and variance/covariance matrix of O-D demand considering the day-to-day variation induced by travelers' independent route choices. It also estimates the probability distributions of link/path flow and their travel cost, where the variance stems from three sources: O-D demand, route choice and unknown errors. The framework estimates the O-D demand mean and variance/covariance matrix iteratively, a procedure known in statistics as iterative generalized least squares (IGLS). Lasso regularization is employed to obtain a sparse covariance matrix for better interpretation and computational efficiency. Though the probabilistic O-D estimation (ODE) works with a much larger solution space than the deterministic ODE, we show that its estimator for O-D demand mean is no worse than the best possible estimator by an error that reduces with the increase in sample size. The probabilistic ODE is examined on two small networks and two real-world large-scale networks. The solution converges quickly under the IGLS framework. In all those experiments, the results of the probabilistic ODE are compelling, satisfactory and computationally plausible. Lasso regularization on the covariance matrix estimation tends to underestimate most variance/covariance entries. A proper Lasso penalty ensures a good trade-off between bias and variance of the estimation.

Submitted 28 January, 2019; originally announced January 2019.

Journal ref: Transportation Research Part C: Emerging Technologies 88 (2018): 227-256

Showing 1–20 of 20 results for author: Qian, Z