Search | arXiv e-print repository

arXiv:2405.16248 [pdf]

Combining Radiomics and Machine Learning Approaches for Objective ASD Diagnosis: Verifying White Matter Associations with ASD

Authors: Junlin Song, Yuzhuo Chen, Yuan Yao, Zetong Chen, Renhao Guo, Lida Yang, Xinyi Sui, Qihang Wang, Xijiao Li, Aihua Cao, Wei Li

Abstract: Autism Spectrum Disorder is a condition characterized by atypical brain development leading to impairments in social skills, communication abilities, repetitive behaviors, and sensory processing. There have been many studies combining brain MRI images with machine learning algorithms to achieve objective diagnosis of autism, but the correlation between white matter and autism has not been fully utilized. To address this gap, we develop a computer-aided diagnostic model focusing on white matter regions in brain MRI by employing radiomics and machine learning methods. This study introduced a MultiUNet model for segmenting white matter, leveraging the UNet architecture and utilizing manually segmented MRI images as the training data. Subsequently, we extracted white matter features using the Pyradiomics toolkit and applied different machine learning models such as Support Vector Machine, Random Forest, Logistic Regression, and K-Nearest Neighbors to predict autism. The prediction sets all exceeded 80% accuracy. Additionally, we employed Convolutional Neural Network to analyze segmented white matter images, achieving a prediction accuracy of 86.84%. Notably, Support Vector Machine demonstrated the highest prediction accuracy at 89.47%. These findings not only underscore the efficacy of the models but also establish a link between white matter abnormalities and autism.
Our study contributes to a comprehensive evaluation of various diagnostic models for autism and introduces a computer-aided diagnostic algorithm for early and objective autism diagnosis based on MRI white matter regions.

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2401.03375 [pdf, other]

doi 10.1109/TITS.2023.3343196

Real-Time Asphalt Pavement Layer Thickness Prediction Using Ground-Penetrating Radar Based on a Modified Extended Common Mid-Point (XCMP) Approach

Authors: Siqi Wang, Zhen Leng, Xin Sui, Weiguang Zhang, Tao Ma, Zehui Zhu

Abstract: The conventional surface reflection method has been widely used to measure the asphalt pavement layer dielectric constant using ground-penetrating radar (GPR). This method may be inaccurate for in-service pavement thickness estimation with dielectric constant variation through the depth, which could be addressed using the extended common mid-point method (XCMP) with air-coupled GPR antennas. However, the factors affecting the XCMP method on thickness prediction accuracy haven't been studied. Manual acquisition of key factors is required, which hinders its real-time applications. This study investigates the affecting factors and develops a modified XCMP method to allow automatic thickness prediction of in-service asphalt pavement with non-uniform dielectric properties through depth. A sensitivity analysis was performed, necessitating the accurate estimation of time of flights (TOFs) from antenna pairs. A modified XCMP method based on edge detection was proposed to allow real-time TOFs estimation, then dielectric constant and thickness predictions. Field tests using a multi-channel GPR system were performed for validation. Both the surface reflection and XCMP setups were conducted. Results show that the modified XCMP method is recommended with a mean prediction error of 1.86%, which is more accurate than the surface reflection method (5.73%).

Submitted 6 January, 2024; originally announced January 2024.

Comments: IEEE Transactions on Intelligent Transportation Systems (2024)

arXiv:2311.03062 [pdf]

Imaging through multimode fibres with physical prior

Authors: Chuncheng Zhang, Yingjie Shi, Zheyi Yao, Xiubao Sui, Qian Chen

Abstract: Imaging through perturbed multimode fibres based on deep learning has been widely researched. However, existing methods mainly use target-speckle pairs in different configurations. It is challenging to reconstruct targets without trained networks. In this paper, we propose a physics-assisted, unsupervised, learning-based fibre imaging scheme. The role of the physical prior is to simplify the mapping relationship between the speckle pattern and the target image, thereby reducing the computational complexity. The unsupervised network learns target features according to the optimized direction provided by the physical prior. Therefore, the reconstruction process of the online learning only requires a few speckle patterns and unpaired targets. The proposed scheme also increases the generalization ability of the learning-based method in perturbed multimode fibres. Our scheme has the potential to extend the application of multimode fibre imaging.

Submitted 13 November, 2023; v1 submitted 6 November, 2023; originally announced November 2023.

arXiv:2309.03472 [pdf, other]

Perceptual Quality Assessment of 360$^\circ$ Images Based on Generative Scanpath Representation

Authors: Xiangjie Sui, Hanwei Zhu, Xuelin Liu, Yuming Fang, Shiqi Wang, Zhou Wang

Abstract: Despite substantial efforts dedicated to the design of heuristic models for omnidirectional (i.e., 360$^\circ$) image quality assessment (OIQA), a conspicuous gap remains due to the lack of consideration for the diversity of viewing behaviors that leads to the varying perceptual quality of 360$^\circ$ images. Two critical aspects underline this oversight: the neglect of viewing conditions that significantly sway user gaze patterns and the overreliance on a single viewport sequence from the 360$^\circ$ image for quality inference. To address these issues, we introduce a unique generative scanpath representation (GSR) for effective quality inference of 360$^\circ$ images, which aggregates varied perceptual experiences of multi-hypothesis users under a predefined viewing condition. More specifically, given a viewing condition characterized by the starting point of viewing and exploration time, a set of scanpaths consisting of dynamic visual fixations can be produced using an apt scanpath generator. Following this vein, we use the scanpaths to convert the 360$^\circ$ image into the unique GSR, which provides a global overview of gazed-focused contents derived from scanpaths. As such, the quality inference of the 360$^\circ$ image is swiftly transformed to that of GSR. We then propose an efficient OIQA computational framework by learning the quality maps of GSR.
Comprehensive experimental results validate that the predictions of the proposed framework are highly consistent with human perception in the spatiotemporal domain, especially in the challenging context of locally distorted 360$^\circ$ images under varied viewing conditions. The code will be released at https://github.com/xiangjieSui/GSR

Submitted 7 September, 2023; originally announced September 2023.

Comments: 12 pages, 5 figures

arXiv:2307.05138 [pdf]

Super-resolution imaging through a multimode fiber: the physical upsampling of speckle-driven

Authors: Chuncheng Zhang, Tingting Liu, Zhihua Xie, Yu Wang, Tong Liu, Qian Chen, Xiubao Sui

Abstract: Following recent advancements in multimode fiber (MMF), miniaturization of imaging endoscopes has proven crucial for minimally invasive surgery in vivo. Recent progress enabled by super-resolution imaging methods with a data-driven deep learning (DL) framework has balanced the relationship between the core size and resolution. However, most of the DL approaches lack attention to the physical properties of the speckle, which is crucial for reconciling the relationship between the magnification of super-resolution imaging and the reconstruction quality. In the paper, we find that the interferometric process of speckle formation is an essential basis for creating DL models with super-resolution imaging. It physically realizes the upsampling of low-resolution (LR) images and enhances the perceptual capabilities of the models. The finding experimentally validates the role played by the physical upsampling of speckle-driven, effectively complementing the lack of information in data-driven. Experimentally, we break the restriction of the poor reconstruction quality at great magnification by inputting the same size of the speckle with the size of the high-resolution (HR) image to the model. The guidance of our research for endoscopic imaging may accelerate the further development of minimally invasive surgery.

Submitted 11 July, 2023; originally announced July 2023.

arXiv:2304.04589 [pdf]

Hyperspectral Image Super-Resolution via Dual-domain Network Based on Hybrid Convolution

Authors: Tingting Liu, Yuan Liu, Chuncheng Zhang, Yuan Liyin, Xiubao Sui, Qian Chen

Abstract: Since the number of incident energies is limited, it is difficult to directly acquire hyperspectral images (HSI) with high spatial resolution. Considering the high dimensionality and correlation of HSI, super-resolution (SR) of HSI remains a challenge in the absence of auxiliary high-resolution images. Furthermore, it is very important to extract the spatial features effectively and make full use of the spectral information. This paper proposes a novel HSI super-resolution algorithm, termed dual-domain network based on hybrid convolution (SRDNet). Specifically, a dual-domain network is designed to fully exploit the spatial-spectral and frequency information among the hyper-spectral data. To capture inter-spectral self-similarity, a self-attention learning mechanism (HSL) is devised in the spatial domain. Meanwhile, the pyramid structure is applied to increase the acceptance field of attention, which further reinforces the feature representation ability of the network. Moreover, to further improve the perceptual quality of HSI, a frequency loss (HFL) is introduced to optimize the model in the frequency domain. The dynamic weighting mechanism drives the network to gradually refine the generated frequency and excessive smoothing caused by spatial loss. Finally, in order to better capture the mapping relationship between high-resolution space and low-resolution space, a hybrid module of 2D and 3D units with progressive upsampling strategy is utilized in our method.
Experiments on a widely used benchmark dataset illustrate that the proposed SRDNet method enhances the texture information of HSI and is superior to state-of-the-art methods.

Submitted 14 July, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

arXiv:2211.16796 [pdf]

doi 10.1007/s11760-023-02641-9

Gradient Domain Weighted Guided Image Filtering

Authors: Bo Wang, Yihong Wang, Xiubao Sui, Yuan Liu, Qian Chen

Abstract: Guided image filter is a well-known local filter in image processing. However, the presence of halo artifacts is a common issue associated with this type of filter. This paper proposes an algorithm that utilizes gradient information to accurately identify the edges of an image. Furthermore, the algorithm uses weighted information to distinguish flat areas from edge areas, resulting in sharper edges and reduced blur in flat areas. This approach mitigates the excessive blurring near edges that often leads to halo artifacts. Experimental results demonstrate that the proposed algorithm significantly suppresses halo artifacts at the edges, making it highly effective for both image denoising and detail enhancement.

Submitted 2 June, 2023; v1 submitted 30 November, 2022; originally announced November 2022.

arXiv:2207.13334 [pdf]

Fast optical refocusing through multimode fiber bend using Cake-Cutting Hadamard encoding algorithm to improve robustness

Authors: Chuncheng Zhang, Zheyi Yao, Zhengyue Qin, Guohua Gu, Qian Chen, Zhihua Xie, Guodong Liu, Xiubao Sui

Abstract: Multimode fibres offer the advantages of high resolution and miniaturization over single mode fibers in the field of optical imaging. However, multimode fibre's imaging is susceptible to perturbations of MMF that can lead to secondary spatial distortions in the transmitted image. Perturbations include random disturbances in the fiber as well as environmental noise. Here, we exploit the fast focusing capability of the Cake-Cutting Hadamard coding algorithm to counteract the effects of perturbations and improve the system's robustness. Simulation shows that it can approach the theoretical enhancement at 2000 measurements. Experimental results show that the algorithm can help the system to refocus in a short time when MMFs are perturbed. This research will further contribute to using multimode fibres in medicine, communication, and detection.

Submitted 27 July, 2022; originally announced July 2022.

arXiv:2206.08751 [pdf, other]

Perceptual Quality Assessment of Virtual Reality Videos in the Wild

Authors: Wen Wen, Mu Li, Yiru Yao, Xiangjie Sui, Yabin Zhang, Long Lan, Yuming Fang, Kede Ma

Abstract: Investigating how people perceive virtual reality (VR) videos in the wild (i.e., those captured by everyday users) is a crucial and challenging task in VR-related applications due to complex authentic distortions localized in space and time. Existing panoramic video databases only consider synthetic distortions, assume fixed viewing conditions, and are limited in size. To overcome these shortcomings, we construct the VR Video Quality in the Wild (VRVQW) database, containing $502$ user-generated videos with diverse content and distortion characteristics. Based on VRVQW, we conduct a formal psychophysical experiment to record the scanpaths and perceived quality scores from $139$ participants under two different viewing conditions. We provide a thorough statistical analysis of the recorded data, observing significant impact of viewing conditions on both human scanpaths and perceived quality. Moreover, we develop an objective quality assessment model for VR videos based on pseudocylindrical representation and convolution. Results on the proposed VRVQW show that our method is superior to existing video quality assessment models. We have made the database and code available at https://github.com/limuhit/VR-Video-Quality-in-the-Wild.

Submitted 15 March, 2024; v1 submitted 12 June, 2022; originally announced June 2022.

Comments: Accepted by IEEE Transactions on Circuits and Systems for Video Technology

arXiv:2201.09432 [pdf]

Investigation of Deep Neural Network Acoustic Modelling Approaches for Low Resource Accented Mandarin Speech Recognition

Authors: Xurong Xie, Xiang Sui, Xunying Liu, Lan Wang

Abstract: The Mandarin Chinese language is known to be strongly influenced by a rich set of regional accents, while Mandarin speech with each accent is quite low resource. Hence, an important task in Mandarin speech recognition is to appropriately model the acoustic variabilities imposed by accents. In this paper, an investigation of implicit and explicit use of accent information on a range of deep neural network (DNN) based acoustic modelling techniques is conducted. Meanwhile, approaches of multi-accent modelling including multi-style training, multi-accent decision tree state tying, DNN tandem and multi-level adaptive network (MLAN) tandem hidden Markov model (HMM) modelling are combined and compared in this paper. On a low resource accented Mandarin speech recognition task consisting of four regional accents, an improved MLAN tandem HMM system explicitly leveraging the accent information was proposed and significantly outperformed the baseline accent independent DNN tandem systems by 0.8%-1.5% absolute (6%-9% relative) in character error rate after sequence level discriminative training and adaptation.

Submitted 14 June, 2024; v1 submitted 23 January, 2022; originally announced January 2022.

Comments: Published in JOURNAL OF INTEGRATION TECHNOLOGY CNKI:SUN:JCJI.0.2015-06-003

Journal ref: JOURNAL OF INTEGRATION TECHNOLOGY, Vol. 4, No. 6, Nov. 2015

arXiv:2105.09511 [pdf, other]

Medical Image Segmentation Using Squeeze-and-Expansion Transformers

Authors: Shaohua Li, Xiuchao Sui, Xiangde Luo, Xinxing Xu, Yong Liu, Rick Goh

Abstract: Medical image segmentation is important for computer-aided diagnosis. Good segmentation demands the model to see the big picture and fine details simultaneously, i.e., to learn image features that incorporate large context while keeping high spatial resolution. To approach this goal, the most widely used methods, U-Net and its variants, extract and fuse multi-scale features. However, the fused features still have small "effective receptive fields" with a focus on local image cues, limiting their performance. In this work, we propose Segtran, an alternative segmentation framework based on transformers, which have unlimited "effective receptive fields" even at high feature resolutions. The core of Segtran is a novel Squeeze-and-Expansion transformer: a squeezed attention block regularizes the self attention of transformers, and an expansion block learns diversified representations. Additionally, we propose a new positional encoding scheme for transformers, imposing a continuity inductive bias for images. Experiments were performed on 2D and 3D medical image segmentation tasks: optic disc/cup segmentation in fundus images (REFUGE'20 challenge), polyp segmentation in colonoscopy images, and brain tumor segmentation in MRI scans (BraTS'19 challenge). Compared with representative existing methods, Segtran consistently achieved the highest segmentation accuracy, and exhibited good cross-domain generalization capabilities. The source code of Segtran is released at https://github.com/askerlee/segtran.

Submitted 1 June, 2021; v1 submitted 20 May, 2021; originally announced May 2021.

Comments: Camera ready for IJCAI'2021

arXiv:2103.07862 [pdf]

doi 10.1016/j.ijleo.2021.168043

High speed and reconfigurable optronic neural network with digital nonlinear activation

Authors: Qiuhao Wu, Jia Liu, Xiubao Sui, Li** Wang, Qian Chen

Abstract: With its unique parallel processing capability, optical neural network has shown low-power consumption in image recognition and speech processing. At present, the manufacturing technology of programmable photonic chip is not mature, and the realization of optical neural network in free-space is still a hot spot of intelligent optical computing. In this article, based on MNIST datasets and 4f system, three-layer optical neural networks are constructed, whose recognition accuracy can reach 93.66%. Our network is programmable, high speed, reconfigurable and is better than the existing free-space optical neural network in terms of spatial complexity.

Submitted 7 May, 2021; v1 submitted 14 March, 2021; originally announced March 2021.

arXiv:2005.10547 [pdf, other]

Perceptual Quality Assessment of Omnidirectional Images as Moving Camera Videos

Authors: Xiangjie Sui, Kede Ma, Yiru Yao, Yuming Fang

Abstract: Omnidirectional images (also referred to as static 360° panoramas) impose viewing conditions much different from those of regular 2D images. How humans perceive image distortions in immersive virtual reality (VR) environments is an important problem that has received less attention. We argue that, apart from the distorted panorama itself, two types of VR viewing conditions are crucial in determining the viewing behaviors of users and the perceived quality of the panorama: the starting point and the exploration time. We first carry out a psychophysical experiment to investigate the interplay among the VR viewing conditions, the user viewing behaviors, and the perceived quality of 360° images. Then, we provide a thorough analysis of the collected human data, leading to several interesting findings. Moreover, we propose a computational framework for objective quality assessment of 360° images, embodying viewing conditions and behaviors in a delightful way. Specifically, we first transform an omnidirectional image to several video representations using different user viewing behaviors under different viewing conditions. We then leverage advanced 2D full-reference video quality models to compute the perceived quality. We construct a set of specific quality measures within the proposed framework, and demonstrate their promise on three VR quality databases.

Submitted 4 January, 2021; v1 submitted 21 May, 2020; originally announced May 2020.

Comments: 11 pages, 11 figures, 9 tables. This paper has been accepted by IEEE Transactions on Visualization and Computer Graphics

arXiv:2004.05554 [pdf, other]

Feature Lenses: Plug-and-play Neural Modules for Transformation-Invariant Visual Representations

Authors: Shaohua Li, Xiuchao Sui, Jie Fu, Yong Liu, Rick Siow Mong Goh

Abstract: Convolutional Neural Networks (CNNs) are known to be brittle under various image transformations, including rotations, scalings, and changes of lighting conditions. We observe that the features of a transformed image are drastically different from the ones of the original image. To make CNNs more invariant to transformations, we propose "Feature Lenses", a set of ad-hoc modules that can be easily plugged into a trained model (referred to as the "host model"). Each individual lens reconstructs the original features given the features of a transformed image under a particular transformation. These lenses jointly counteract feature distortions caused by various transformations, thus making the host model more robust without retraining. By only updating lenses, the host model is freed from iterative updating when facing new transformations absent in the training data; as feature semantics are preserved, downstream applications, such as classifiers and detectors, automatically gain robustness without retraining. Lenses are trained in a self-supervised fashion with no annotations, by minimizing a novel "Top-K Activation Contrast Loss" between lens-transformed features and original features. Evaluated on ImageNet, MNIST-rot, and CIFAR-10, Feature Lenses show clear advantages over baseline methods.

Submitted 12 April, 2020; originally announced April 2020.

Comments: 20 pages

Showing 1–14 of 14 results for author: Sui, X