Search | arXiv e-print repository

MM-GTUNets: Unified Multi-Modal Graph Deep Learning for Brain Disorders Prediction

Authors: Luhui Cai, Weiming Zeng, Hongyu Chen, Hua Zhang, Yueyang Li, Hongjie Yan, Lingbin Bian, Nizhuan Wang

Abstract: Graph deep learning (GDL) has demonstrated impressive performance in predicting population-based brain disorders (BDs) through the integration of both imaging and non-imaging data. However, the effectiveness of GDL based methods heavily depends on the quality of modeling the multi-modal population graphs and tends to degrade as the graph scale increases. Furthermore, these methods often constrain… ▽ More Graph deep learning (GDL) has demonstrated impressive performance in predicting population-based brain disorders (BDs) through the integration of both imaging and non-imaging data. However, the effectiveness of GDL based methods heavily depends on the quality of modeling the multi-modal population graphs and tends to degrade as the graph scale increases. Furthermore, these methods often constrain interactions between imaging and non-imaging data to node-edge interactions within the graph, overlooking complex inter-modal correlations, leading to suboptimal outcomes. To overcome these challenges, we propose MM-GTUNets, an end-to-end graph transformer based multi-modal graph deep learning (MMGDL) framework designed for brain disorders prediction at large scale. Specifically, to effectively leverage rich multi-modal information related to diseases, we introduce Modality Reward Representation Learning (MRRL) which adaptively constructs population graphs using a reward system. Additionally, we employ variational autoencoder to reconstruct latent representations of non-imaging features aligned with imaging features. Based on this, we propose Adaptive Cross-Modal Graph Learning (ACMGL), which captures critical modality-specific and modality-shared features through a unified GTUNet encoder taking advantages of Graph UNet and Graph Transformer, and feature fusion module. We validated our method on two public multi-modal datasets ABIDE and ADHD-200, demonstrating its superior performance in diagnosing BDs. Our code is available at https://github.com/NZWANG/MM-GTUNets. △ Less

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.03702 [pdf, other]

DSNet: A Novel Way to Use Atrous Convolutions in Semantic Segmentation

Authors: Zilu Guo, Liuyang Bian, Xuan Huang, Hu Wei, **gyu Li, Huasheng Ni

Abstract: Atrous convolutions are employed as a method to increase the receptive field in semantic segmentation tasks. However, in previous works of semantic segmentation, it was rarely employed in the shallow layers of the model. We revisit the design of atrous convolutions in modern convolutional neural networks (CNNs), and demonstrate that the concept of using large kernels to apply atrous convolutions c… ▽ More Atrous convolutions are employed as a method to increase the receptive field in semantic segmentation tasks. However, in previous works of semantic segmentation, it was rarely employed in the shallow layers of the model. We revisit the design of atrous convolutions in modern convolutional neural networks (CNNs), and demonstrate that the concept of using large kernels to apply atrous convolutions could be a more powerful paradigm. We propose three guidelines to apply atrous convolutions more efficiently. Following these guidelines, we propose DSNet, a Dual-Branch CNN architecture, which incorporates atrous convolutions in the shallow layers of the model architecture, as well as pretraining the nearly entire encoder on ImageNet to achieve better performance. To demonstrate the effectiveness of our approach, our models achieve a new state-of-the-art trade-off between accuracy and speed on ADE20K, Cityscapes and BDD datasets. Specifically, DSNet achieves 40.0% mIOU with inference speed of 179.2 FPS on ADE20K, and 80.4% mIOU with speed of 81.9 FPS on Cityscapes. Source code and models are available at Github: https://github.com/takaniwa/DSNet. △ Less

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2312.08886 [pdf, other]

Diffusion-based Blind Text Image Super-Resolution

Authors: Yuzhe Zhang, Jiawei Zhang, Hao Li, Zhouxia Wang, Luwei Hou, Dongqing Zou, Liheng Bian

Abstract: Recovering degraded low-resolution text images is challenging, especially for Chinese text images with complex strokes and severe degradation in real-world scenarios. Ensuring both text fidelity and style realness is crucial for high-quality text image super-resolution. Recently, diffusion models have achieved great success in natural image synthesis and restoration due to their powerful data dist… ▽ More Recovering degraded low-resolution text images is challenging, especially for Chinese text images with complex strokes and severe degradation in real-world scenarios. Ensuring both text fidelity and style realness is crucial for high-quality text image super-resolution. Recently, diffusion models have achieved great success in natural image synthesis and restoration due to their powerful data distribution modeling abilities and data generation capabilities. In this work, we propose an Image Diffusion Model (IDM) to restore text images with realistic styles. For diffusion models, they are not only suitable for modeling realistic image distribution but also appropriate for learning text distribution. Since text prior is important to guarantee the correctness of the restored text structure according to existing arts, we also propose a Text Diffusion Model (TDM) for text recognition which can guide IDM to generate text images with correct structures. We further propose a Mixture of Multi-modality module (MoM) to make these two diffusion models cooperate with each other in all the diffusion steps. Extensive experiments on synthetic and real-world datasets demonstrate that our Diffusion-based Blind Text Image Super-Resolution (DiffTSR) can restore text images with more accurate text structures as well as more realistic appearances simultaneously. △ Less

Submitted 3 March, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

Comments: Accepted by CVPR2024

arXiv:2312.06686 [pdf, other]

Robo360: A 3D Omnispective Multi-Material Robotic Manipulation Dataset

Authors: Litian Liang, Liuyu Bian, Caiwei Xiao, Jialin Zhang, Linghao Chen, Isabella Liu, Fanbo Xiang, Zhiao Huang, Hao Su

Abstract: Building robots that can automate labor-intensive tasks has long been the core motivation behind the advancements in computer vision and the robotics community. Recent interest in leveraging 3D algorithms, particularly neural fields, has led to advancements in robot perception and physical understanding in manipulation scenarios. However, the real world's complexity poses significant challenges. T… ▽ More Building robots that can automate labor-intensive tasks has long been the core motivation behind the advancements in computer vision and the robotics community. Recent interest in leveraging 3D algorithms, particularly neural fields, has led to advancements in robot perception and physical understanding in manipulation scenarios. However, the real world's complexity poses significant challenges. To tackle these challenges, we present Robo360, a dataset that features robotic manipulation with a dense view coverage, which enables high-quality 3D neural representation learning, and a diverse set of objects with various physical and optical properties and facilitates research in various object manipulation and physical world modeling tasks. We confirm the effectiveness of our dataset using existing dynamic NeRF and evaluate its potential in learning multi-view policies. We hope that Robo360 can open new research directions yet to be explored at the intersection of understanding the physical world in 3D and robot control. △ Less

Submitted 9 December, 2023; originally announced December 2023.

arXiv:2309.09427 [pdf, other]

TransTouch: Learning Transparent Objects Depth Sensing Through Sparse Touches

Authors: Liuyu Bian, Pengyang Shi, Weihang Chen, **g Xu, Li Yi, Rui Chen

Abstract: Transparent objects are common in daily life. However, depth sensing for transparent objects remains a challenging problem. While learning-based methods can leverage shape priors to improve the sensing quality, the labor-intensive data collection in the real world and the sim-to-real domain gap restrict these methods' scalability. In this paper, we propose a method to finetune a stereo network wit… ▽ More Transparent objects are common in daily life. However, depth sensing for transparent objects remains a challenging problem. While learning-based methods can leverage shape priors to improve the sensing quality, the labor-intensive data collection in the real world and the sim-to-real domain gap restrict these methods' scalability. In this paper, we propose a method to finetune a stereo network with sparse depth labels automatically collected using a probing system with tactile feedback. We present a novel utility function to evaluate the benefit of touches. By approximating and optimizing the utility function, we can optimize the probing locations given a fixed touching budget to better improve the network's performance on real objects. We further combine tactile depth supervision with a confidence-based regularization to prevent over-fitting during finetuning. To evaluate the effectiveness of our method, we construct a real-world dataset including both diffuse and transparent objects. Experimental results on this dataset show that our method can significantly improve real-world depth sensing accuracy, especially for transparent objects. △ Less

Submitted 17 September, 2023; originally announced September 2023.

Comments: Accepted to the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

arXiv:2305.15750 [pdf, other]

Towards Large-scale Single-shot Millimeter-wave Imaging for Low-cost Security Inspection

Authors: Liheng Bian, Daoyu Li, Shuoguang Wang, Chunyang Teng, Huteng Liu, Hanwen Xu, Xuyang Chang, Guoqiang Zhao, Shiyong Li, Jun Zhang

Abstract: Millimeter-wave (MMW) imaging is emerging as a promising technique for safe security inspection. It achieves a delicate balance between imaging resolution, penetrability and human safety, resulting in higher resolution compared to low-frequency microwave, stronger penetrability compared to visible light, and stronger safety compared to X ray. Despite of recent advance in the last decades, the high… ▽ More Millimeter-wave (MMW) imaging is emerging as a promising technique for safe security inspection. It achieves a delicate balance between imaging resolution, penetrability and human safety, resulting in higher resolution compared to low-frequency microwave, stronger penetrability compared to visible light, and stronger safety compared to X ray. Despite of recent advance in the last decades, the high cost of requisite large-scale antenna array hinders widespread adoption of MMW imaging in practice. To tackle this challenge, we report a large-scale single-shot MMW imaging framework using sparse antenna array, achieving low-cost but high-fidelity security inspection under an interpretable learning scheme. We first collected extensive full-sampled MMW echoes to study the statistical ranking of each element in the large-scale array. These elements are then sampled based on the ranking, building the experimentally optimal sparse sampling strategy that reduces the cost of antenna array by up to one order of magnitude. Additionally, we derived an untrained interpretable learning scheme, which realizes robust and accurate image reconstruction from sparsely sampled echoes. Last, we developed a neural network for automatic object detection, and experimentally demonstrated successful detection of concealed centimeter-sized targets using 10% sparse array, whereas all the other contemporary approaches failed at the same sample sampling ratio. The performance of the reported technique presents higher than 50% superiority over the existing MMW imaging schemes on various metrics including precision, recall, and mAP50. With such strong detection ability and order-of-magnitude cost reduction, we anticipate that this technique provides a practical way for large-scale single-shot MMW imaging, and could advocate its further practical applications. △ Less

Submitted 18 June, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

arXiv:2301.06084 [pdf]

Scattering-induced entropy boost for highly-compressed optical sensing and encryption

Authors: Liheng Bian, Xinrui Zhan, Xuyang Chang, Daoyu Li, Rong Yan, Yinuo Zhang, Haowen Ruan, Jun Zhang

Abstract: Image classification often relies on a high-quality machine vision system with a large view field and high resolution, demanding fine imaging optics, heavy computational costs, and large communication bandwidths between an image sensor and a computing unit. Here, we report a novel image-free sensing framework for resource efficient image classification where the required number of measurements can… ▽ More Image classification often relies on a high-quality machine vision system with a large view field and high resolution, demanding fine imaging optics, heavy computational costs, and large communication bandwidths between an image sensor and a computing unit. Here, we report a novel image-free sensing framework for resource efficient image classification where the required number of measurements can be reduced by up to two orders of magnitude. In the proposed framework of single-pixel detection, the optical field from a target is first scattered by an optical diffuser and then two-dimensionally modulated by a spatial light modulator. The optical diffuser simultaneously serves as a compressor and an encryptor for the target information, effectively narrowing the view field and improving the system's security. The one-dimensional sequence of intensity values, measured with time-varying patterns on the spatial light modulator, is then used to extract semantic information based on end-to-end deep learning. The proposed sensing framework is shown to provide over 95 percentage accuracy with the sampling rate of 1 percentage and 5 percentage, respectively, for the classification of MNIST dataset and the recognition of Chinese license plate, which was up to 24 percentage more efficient compared with the case without an optical diffuser. The proposed framework represents a significant breakthrough in realizing high-throughput machine intelligence for scene analysis, with low-bandwidth, low cost, and strong encryption. △ Less

Submitted 16 December, 2022; originally announced January 2023.

arXiv:2301.03047 [pdf, other]

Large-scale Global Low-rank Optimization for Computational Compressed Imaging

Authors: Daoyu Li, Hanwen Xu, Miao Cao, Xin Yuan, David J. Brady, Liheng Bian

Abstract: Computational reconstruction plays a vital role in computer vision and computational photography. Most of the conventional optimization and deep learning techniques explore local information for reconstruction. Recently, nonlocal low-rank (NLR) reconstruction has achieved remarkable success in improving accuracy and generalization. However, the computational cost has inhibited NLR from seeking glo… ▽ More Computational reconstruction plays a vital role in computer vision and computational photography. Most of the conventional optimization and deep learning techniques explore local information for reconstruction. Recently, nonlocal low-rank (NLR) reconstruction has achieved remarkable success in improving accuracy and generalization. However, the computational cost has inhibited NLR from seeking global structural similarity, which consequentially keeps it trapped in the tradeoff between accuracy and efficiency and prevents it from high-dimensional large-scale tasks. To address this challenge, we report here the global low-rank (GLR) optimization technique, realizing highly-efficient large-scale reconstruction with global self-similarity. Inspired by the self-attention mechanism in deep learning, GLR extracts exemplar image patches by feature detection instead of conventional uniform selection. This directly produces key patches using structural features to avoid burdensome computational redundancy. Further, it performs patch matching across the entire image via neural-based convolution, which produces the global similarity heat map in parallel, rather than conventional sequential block-wise matching. As such, GLR improves patch grou** efficiency by more than one order of magnitude. We experimentally demonstrate GLR's effectiveness on temporal, frequency, and spectral dimensions, including different computational imaging modalities of compressive temporal imaging, magnetic resonance imaging, and multispectral filter array demosaicing. This work presents the superiority of inherent fusion of deep learning strategies and iterative optimization, and breaks the persistent dilemma of the tradeoff between accuracy and efficiency for various large-scale reconstruction tasks. △ Less

Submitted 8 January, 2023; originally announced January 2023.

arXiv:2212.13654 [pdf]

Large-scale single-photon imaging

Authors: Liheng Bian, Haoze Song, Lintao Peng, Xuyang Chang, Xi Yang, Roarke Horstmeyer, Lin Ye, Tong Qin, Dezhi Zheng, Jun Zhang

Abstract: Benefiting from its single-photon sensitivity, single-photon avalanche diode (SPAD) array has been widely applied in various fields such as fluorescence lifetime imaging and quantum computing. However, large-scale high-fidelity single-photon imaging remains a big challenge, due to the complex hardware manufacture craft and heavy noise disturbance of SPAD arrays. In this work, we introduce deep lea… ▽ More Benefiting from its single-photon sensitivity, single-photon avalanche diode (SPAD) array has been widely applied in various fields such as fluorescence lifetime imaging and quantum computing. However, large-scale high-fidelity single-photon imaging remains a big challenge, due to the complex hardware manufacture craft and heavy noise disturbance of SPAD arrays. In this work, we introduce deep learning into SPAD, enabling super-resolution single-photon imaging over an order of magnitude, with significant enhancement of bit depth and imaging quality. We first studied the complex photon flow model of SPAD electronics to accurately characterize multiple physical noise sources, and collected a real SPAD image dataset (64 $\times$ 32 pixels, 90 scenes, 10 different bit depth, 3 different illumination flux, 2790 images in total) to calibrate noise model parameters. With this real-world physical noise model, we for the first time synthesized a large-scale realistic single-photon image dataset (image pairs of 5 different resolutions with maximum megapixels, 17250 scenes, 10 different bit depth, 3 different illumination flux, 2.6 million images in total) for subsequent network training. To tackle the severe super-resolution challenge of SPAD inputs with low bit depth, low resolution, and heavy noise, we further built a deep transformer network with a content-adaptive self-attention mechanism and gated fusion modules, which can dig global contextual features to remove multi-source noise and extract full-frequency details. We applied the technique on a series of experiments including macroscopic and microscopic imaging, microfluidic inspection, and Fourier ptychography. The experiments validate the technique's state-of-the-art super-resolution SPAD imaging performance, with more than 5 dB superiority on PSNR compared to the existing methods. △ Less

Submitted 27 December, 2022; originally announced December 2022.

arXiv:2207.08201 [pdf, other]

doi 10.1109/TIP.2023.3244417

INFWIDE: Image and Feature Space Wiener Deconvolution Network for Non-blind Image Deblurring in Low-Light Conditions

Authors: Zhihong Zhang, Yuxiao Cheng, **li Suo, Liheng Bian, Qionghai Dai

Abstract: Under low-light environment, handheld photography suffers from severe camera shake under long exposure settings. Although existing deblurring algorithms have shown promising performance on well-exposed blurry images, they still cannot cope with low-light snapshots. Sophisticated noise and saturation regions are two dominating challenges in practical low-light deblurring. In this work, we propose a… ▽ More Under low-light environment, handheld photography suffers from severe camera shake under long exposure settings. Although existing deblurring algorithms have shown promising performance on well-exposed blurry images, they still cannot cope with low-light snapshots. Sophisticated noise and saturation regions are two dominating challenges in practical low-light deblurring. In this work, we propose a novel non-blind deblurring method dubbed image and feature space Wiener deconvolution network (INFWIDE) to tackle these problems systematically. In terms of algorithm design, INFWIDE proposes a two-branch architecture, which explicitly removes noise and hallucinates saturated regions in the image space and suppresses ringing artifacts in the feature space, and integrates the two complementary outputs with a subtle multi-scale fusion network for high quality night photograph deblurring. For effective network training, we design a set of loss functions integrating a forward imaging model and backward reconstruction to form a close-loop regularization to secure good convergence of the deep neural network. Further, to optimize INFWIDE's applicability in real low-light conditions, a physical-process-based low-light noise model is employed to synthesize realistic noisy night photographs for model training. Taking advantage of the traditional Wiener deconvolution algorithm's physically driven characteristics and arisen deep neural network's representation ability, INFWIDE can recover fine details while suppressing the unpleasant artifacts during deblurring. Extensive experiments on synthetic data and real data demonstrate the superior performance of the proposed approach. △ Less

Submitted 17 February, 2023; v1 submitted 17 July, 2022; originally announced July 2022.

Comments: Accepted by IEEE Trans. Image Process, early access version available at https://ieeexplore.ieee.org/document/10047966

arXiv:2203.14856 [pdf]

WSEBP: A Novel Width-depth Synchronous Extension-based Basis Pursuit Algorithm for Multi-Layer Convolutional Sparse Coding

Authors: Haitong Tang, Shuang He, Lingbin Bian, Zhiming Cui, Nizhuan Wang

Abstract: The pursuit algorithms integrated in multi-layer convolutional sparse coding (ML-CSC) can interpret the convolutional neural networks (CNNs). However, many current state-of-art (SOTA) pursuit algorithms require multiple iterations to optimize the solution of ML-CSC, which limits their applications to deeper CNNs due to high computational cost and large number of resources for getting very tiny gai… ▽ More The pursuit algorithms integrated in multi-layer convolutional sparse coding (ML-CSC) can interpret the convolutional neural networks (CNNs). However, many current state-of-art (SOTA) pursuit algorithms require multiple iterations to optimize the solution of ML-CSC, which limits their applications to deeper CNNs due to high computational cost and large number of resources for getting very tiny gain of performance. In this study, we focus on the 0th iteration in pursuit algorithm by introducing an effective initialization strategy for each layer, by which the solution for ML-CSC can be improved. Specifically, we first propose a novel width-depth synchronous extension-based basis pursuit (WSEBP) algorithm which solves the ML-CSC problem without the limitation of the number of iterations compared to the SOTA algorithms and maximizes the performance by an effective initialization in each layer. Then, we propose a simple and unified ML-CSC-based classification network (ML-CSC-Net) which consists of an ML-CSC-based feature encoder and a fully-connected layer to validate the performance of WSEBP on image classification task. The experimental results show that our proposed WSEBP outperforms SOTA algorithms in terms of accuracy and consumption resources. In addition, the WSEBP integrated in CNNs can improve the performance of deeper CNNs and make them interpretable. Finally, taking VGG as an example, we propose WSEBP-VGG13 to enhance the performance of VGG13, which achieves competitive results on four public datasets, i.e., 87.79% vs. 86.83% on Cifar-10 dataset, 58.01% vs. 54.60% on Cifar-100 dataset, 91.52% vs. 89.58% on COVID-19 dataset, and 99.88% vs. 99.78% on Crack dataset, respectively. The results show the effectiveness of the proposed WSEBP, the improved performance of ML-CSC with WSEBP, and interpretation of the CNNs or deeper CNNs. △ Less

Submitted 29 March, 2022; v1 submitted 28 March, 2022; originally announced March 2022.

arXiv:2201.02833 [pdf, other]

Weighted Encoding Optimization for Dynamic Single-pixel Imaging and Sensing

Authors: Xinrui Zhan, Liheng Bian, Chunli Zhu, Jun Zhang

Abstract: Using single-pixel detection, the end-to-end neural network that jointly optimizes both encoding and decoding enables high-precision imaging and high-level semantic sensing. However, for varied sampling rates, the large-scale network requires retraining that is laboursome and computation-consuming. In this letter, we report a weighted optimization technique for dynamic rate-adaptive single-pixel i… ▽ More Using single-pixel detection, the end-to-end neural network that jointly optimizes both encoding and decoding enables high-precision imaging and high-level semantic sensing. However, for varied sampling rates, the large-scale network requires retraining that is laboursome and computation-consuming. In this letter, we report a weighted optimization technique for dynamic rate-adaptive single-pixel imaging and sensing, which only needs to train the network for one time that is available for any sampling rates. Specifically, we introduce a novel weighting scheme in the encoding process to characterize different patterns' modulation efficiency. While the network is training at a high sampling rate, the modulation patterns and corresponding weights are updated iteratively, which produces optimal ranked encoding series when converged. In the experimental implementation, the optimal pattern series with the highest weights are employed for light modulation, thus achieving highly-efficient imaging and sensing. The reported strategy saves the additional training of another low-rate network required by the existing dynamic single-pixel networks, which further doubles training efficiency. Experiments on the MNIST dataset validated that once the network is trained with a sampling rate of 1, the average imaging PSNR reaches 23.50 dB at 0.1 sampling rate, and the image-free classification accuracy reaches up to 95.00\% at a sampling rate of 0.03 and 97.91\% at a sampling rate of 0.1. △ Less

Submitted 8 January, 2022; originally announced January 2022.

arXiv:2201.00895 [pdf, other]

A Gradient Map** Guided Explainable Deep Neural Network for Extracapsular Extension Identification in 3D Head and Neck Cancer Computed Tomography Images

Authors: Yibin Wang, Abdur Rahman, W. Neil. Duggar, P. Russell Roberts, Toms V. Thomas, Linkan Bian, Haifeng Wang

Abstract: Diagnosis and treatment management for head and neck squamous cell carcinoma (HNSCC) is guided by routine diagnostic head and neck computed tomography (CT) scans to identify tumor and lymph node features. Extracapsular extension (ECE) is a strong predictor of patients' survival outcomes with HNSCC. It is essential to detect the occurrence of ECE as it changes staging and management for the patient… ▽ More Diagnosis and treatment management for head and neck squamous cell carcinoma (HNSCC) is guided by routine diagnostic head and neck computed tomography (CT) scans to identify tumor and lymph node features. Extracapsular extension (ECE) is a strong predictor of patients' survival outcomes with HNSCC. It is essential to detect the occurrence of ECE as it changes staging and management for the patients. Current clinical ECE detection relies on visual identification and pathologic confirmation conducted by radiologists. Machine learning (ML)-based ECE diagnosis has shown high potential in the recent years. However, manual annotation of lymph node region is a required data preprocessing step in most of the current ML-based ECE diagnosis studies. In addition, this manual annotation process is time-consuming, labor-intensive, and error-prone. Therefore, in this paper, we propose a Gradient Map** Guided Explainable Network (GMGENet) framework to perform ECE identification automatically without requiring annotated lymph node region information. The gradient-weighted class activation map** (Grad-CAM) technique is proposed to guide the deep learning algorithm to focus on the regions that are highly related to ECE. Informative volumes of interest (VOIs) are extracted without labeled lymph node region information. In evaluation, the proposed method is well-trained and tested using cross validation, achieving test accuracy and AUC of 90.2% and 91.1%, respectively. The presence or absence of ECE has been analyzed and correlated with gold standard histopathological findings. △ Less

Submitted 3 January, 2022; originally announced January 2022.

arXiv:2112.10587 [pdf, other]

Image-free multi-character recognition

Authors: Huayi Wang, Chunli Zhu, Liheng Bian

Abstract: The recently developed image-free sensing technique maintains the advantages of both the light hardware and software, which has been applied in simple target classification and motion tracking. In practical applications, however, there usually exist multiple targets in the field of view, where existing trials fail to produce multi-semantic information. In this letter, we report a novel image-free… ▽ More The recently developed image-free sensing technique maintains the advantages of both the light hardware and software, which has been applied in simple target classification and motion tracking. In practical applications, however, there usually exist multiple targets in the field of view, where existing trials fail to produce multi-semantic information. In this letter, we report a novel image-free sensing technique to tackle the multi-target recognition challenge for the first time. Different from the convolutional layer stack of image-free single-pixel networks, the reported CRNN network utilities the bidirectional LSTM architecture to predict the distribution of multiple characters simultaneously. The framework enables to capture the long-range dependencies, providing a high recognition accuracy of multiple characters. We demonstrated the technique's effectiveness in license plate detection, which achieved 87.60% recognition accuracy at a 5% sampling rate with a higher than 100 FPS refresh rate. △ Less

Submitted 20 December, 2021; originally announced December 2021.

Comments: 17pages, 4figures

arXiv:2111.11843 [pdf, other]

doi 10.1109/TIP.2023.3276332

U-shape Transformer for Underwater Image Enhancement

Authors: Lintao Peng, Chunli Zhu, Liheng Bian

Abstract: The light absorption and scattering of underwater impurities lead to poor underwater imaging quality. The existing data-driven based underwater image enhancement (UIE) techniques suffer from the lack of a large-scale dataset containing various underwater scenes and high-fidelity reference images. Besides, the inconsistent attenuation in different color channels and space areas is not fully conside… ▽ More The light absorption and scattering of underwater impurities lead to poor underwater imaging quality. The existing data-driven based underwater image enhancement (UIE) techniques suffer from the lack of a large-scale dataset containing various underwater scenes and high-fidelity reference images. Besides, the inconsistent attenuation in different color channels and space areas is not fully considered for boosted enhancement. In this work, we constructed a large-scale underwater image (LSUI) dataset including 5004 image pairs, and reported an U-shape Transformer network where the transformer model is for the first time introduced to the UIE task. The U-shape Transformer is integrated with a channel-wise multi-scale feature fusion transformer (CMSFFT) module and a spatial-wise global feature modeling transformer (SGFMT) module, which reinforce the network's attention to the color channels and space areas with more serious attenuation. Meanwhile, in order to further improve the contrast and saturation, a novel loss function combining RGB, LAB and LCH color spaces is designed following the human vision principle. The extensive experiments on available datasets validate the state-of-the-art performance of the reported technique with more than 2dB superiority. △ Less

Submitted 12 June, 2022; v1 submitted 23 November, 2021; originally announced November 2021.

Comments: under review

arXiv:2108.10617 [pdf, other]

Image-free single-pixel segmentation

Authors: Haiyan Liu, Liheng Bian, Jun Zhang

Abstract: The existing segmentation techniques require high-fidelity images as input to perform semantic segmentation. Since the segmentation results contain most of edge information that is much less than the acquired images, the throughput gap leads to both hardware and software waste. In this letter, we report an image-free single-pixel segmentation technique. The technique combines structured illuminati… ▽ More The existing segmentation techniques require high-fidelity images as input to perform semantic segmentation. Since the segmentation results contain most of edge information that is much less than the acquired images, the throughput gap leads to both hardware and software waste. In this letter, we report an image-free single-pixel segmentation technique. The technique combines structured illumination and single-pixel detection together, to efficiently samples and multiplexes scene's segmentation information into compressed one-dimensional measurements. The illumination patterns are optimized together with the subsequent reconstruction neural network, which directly infers segmentation maps from the single-pixel measurements. The end-to-end encoding-and-decoding learning framework enables optimized illumination with corresponding network, which provides both high acquisition and segmentation efficiency. Both simulation and experimental results validate that accurate segmentation can be achieved using two-order-of-magnitude less input data. When the sampling ratio is 1%, the Dice coefficient reaches above 80% and the pixel accuracy reaches above 96%. We envision that this image-free segmentation technique can be widely applied in various resource-limited platforms such as UAV and unmanned vehicle that require real-time sensing. △ Less

Submitted 24 August, 2021; originally announced August 2021.

arXiv:2106.05082 [pdf, other]

doi 10.1364/OE.434805

Agile wide-field imaging with selective high resolution

Authors: Lintao Peng, Liheng Bian, Tiexin Liu, Jun Zhang

Abstract: Wide-field and high-resolution (HR) imaging is essential for various applications such as aviation reconnaissance, topographic map** and safety monitoring. The existing techniques require a large-scale detector array to capture HR images of the whole field, resulting in high complexity and heavy cost. In this work, we report an agile wide-field imaging framework with selective high resolution th… ▽ More Wide-field and high-resolution (HR) imaging is essential for various applications such as aviation reconnaissance, topographic map** and safety monitoring. The existing techniques require a large-scale detector array to capture HR images of the whole field, resulting in high complexity and heavy cost. In this work, we report an agile wide-field imaging framework with selective high resolution that requires only two detectors. It builds on the statistical sparsity prior of natural scenes that the important targets locate only at small regions of interests (ROI), instead of the whole field. Under this assumption, we use a short-focal camera to image wide field with a certain low resolution, and use a long-focal camera to acquire the HR images of ROI. To automatically locate ROI in the wide field in real time, we propose an efficient deep-learning based multiscale registration method that is robust and blind to the large setting differences (focal, white balance, etc) between the two cameras. Using the registered location, the long-focal camera mounted on a gimbal enables real-time tracking of the ROI for continuous HR imaging. We demonstrated the novel imaging framework by building a proof-of-concept setup with only 1181 gram weight, and assembled it on an unmanned aerial vehicle for air-to-ground monitoring. Experiments show that the setup maintains 120$^{\circ}$ wide field-of-view (FOV) with selective 0.45$mrad$ instantaneous FOV. △ Less

Submitted 11 June, 2021; v1 submitted 9 June, 2021; originally announced June 2021.

Comments: 12pages,6figures

arXiv:2104.07460 [pdf, other]

doi 10.1145/3453483.3454054

Automated Conformance Testing for JavaScript Engines via Deep Compiler Fuzzing

Authors: Guixin Ye, Zhanyong Tang, Shin Hwei Tan, Songfang Huang, Dingyi Fang, Xiaoyang Sun, Lizhong Bian, Haibo Wang, Zheng Wang

Abstract: JavaScript (JS) is a popular, platform-independent programming language. To ensure the interoperability of JS programs across different platforms, the implementation of a JS engine should conform to the ECMAScript standard. However, doing so is challenging as there are many subtle definitions of API behaviors, and the definitions keep evolving. We present COMFORT, a new compiler fuzzing framewor… ▽ More JavaScript (JS) is a popular, platform-independent programming language. To ensure the interoperability of JS programs across different platforms, the implementation of a JS engine should conform to the ECMAScript standard. However, doing so is challenging as there are many subtle definitions of API behaviors, and the definitions keep evolving. We present COMFORT, a new compiler fuzzing framework for detecting JS engine bugs and behaviors that deviate from the ECMAScript standard. COMFORT leverages the recent advance in deep learning-based language models to automatically generate JS test code. As a departure from prior fuzzers, COMFORT utilizes the well-structured ECMAScript specifications to automatically generate test data along with the test programs to expose bugs that could be overlooked by the developers or manually written test cases. COMFORT then applies differential testing methodologies on the generated test cases to expose standard conformance bugs. We apply COMFORT to ten mainstream JS engines. In 200 hours of automated concurrent testing runs, we discover bugs in all tested JS engines. We had identified 158 unique JS engine bugs, of which 129 have been verified, and 115 have already been fixed by the developers. Furthermore, 21 of the Comfort-generated test cases have been added to Test262, the official ECMAScript conformance test suite. △ Less

Submitted 15 April, 2021; originally announced April 2021.

Comments: PLDI 2021

ACM Class: D.3.0; I.2.5

arXiv:2104.03777 [pdf, other]

Affine-modeled video extraction from a single motion blurred image

Authors: Daoyu Li, Liheng Bian, Jun Zhang

Abstract: A motion-blurred image is the temporal average of multiple sharp frames over the exposure time. Recovering these sharp video frames from a single blurred image is nontrivial, due to not only its strong ill-posedness, but also various types of complex motion in reality such as rotation and motion in depth. In this work, we report a generalized video extraction method using the affine motion modelin… ▽ More A motion-blurred image is the temporal average of multiple sharp frames over the exposure time. Recovering these sharp video frames from a single blurred image is nontrivial, due to not only its strong ill-posedness, but also various types of complex motion in reality such as rotation and motion in depth. In this work, we report a generalized video extraction method using the affine motion modeling, enabling to tackle multiple types of complex motion and their mixing. In its workflow, the moving objects are first segemented in the alpha channel. This allows separate recovery of different objects with different motion. Then, we reduce the variable space by modeling each video clip as a series of affine transformations of a reference frame, and introduce the $l0$-norm total variation regularization to attenuate the ringing artifact. The differentiable affine operators are employed to realize gradient-descent optimization of the affine model, which follows a novel coarse-to-fine strategy to further reduce artifacts. As a result, both the affine parameters and sharp reference image are retrieved. They are finally input into stepwise affine transformation to recover the sharp video frames. The stepwise retrieval maintains the nature to bypass the frame order ambiguity. Experiments on both public datasets and real captured data validate the state-of-the-art performance of the reported technique. △ Less

Submitted 8 April, 2021; originally announced April 2021.

arXiv:1909.11498 [pdf, other]

doi 10.1364/OL.395150

Non-imaging single-pixel sensing with optimized binary modulation

Authors: Hao Fu, Liheng Bian, Jun Zhang

Abstract: The conventional high-level sensing techniques require high-fidelity images as input to extract target features, which are produced by either complex imaging hardware or high-complexity reconstruction algorithms. In this letter, we propose single-pixel sensing (SPS) that performs high-level sensing directly from coupled measurements of a single-pixel detector, without the conventional image acquis… ▽ More The conventional high-level sensing techniques require high-fidelity images as input to extract target features, which are produced by either complex imaging hardware or high-complexity reconstruction algorithms. In this letter, we propose single-pixel sensing (SPS) that performs high-level sensing directly from coupled measurements of a single-pixel detector, without the conventional image acquisition and reconstruction process. The technique consists of three steps including binary light modulation that can be physically implemented at $\sim$22kHz, single-pixel coupled detection owning wide working spectrum and high signal-to-noise ratio, and end-to-end deep-learning based sensing that reduces both hardware and software complexity. Besides, the binary modulation is trained and optimized together with the sensing network, which ensures least required measurements and optimal sensing accuracy. The effectiveness of SPS is demonstrated on the classification task of handwritten MNIST dataset, and 96.68% classification accuracy at $\sim$1kHz is achieved. The reported single-pixel sensing technique is a novel framework for highly efficient machine intelligence. △ Less

Submitted 27 September, 2019; v1 submitted 25 September, 2019; originally announced September 2019.

arXiv:1707.03164 [pdf, other]

doi 10.1364/JOSAA.35.000078

Experimental comparison of single-pixel imaging algorithms

Authors: Liheng Bian, **li Suo, Qionghai Dai, Feng Chen

Abstract: Single-pixel imaging (SPI) is a novel technique capturing 2D images using a photodiode, instead of conventional 2D array sensors. SPI owns high signal-to-noise ratio, wide spectrum range, low cost, and robustness to light scattering. Various algorithms have been proposed for SPI reconstruction, including the linear correlation methods, the alternating projection method (AP), and the compressive se… ▽ More Single-pixel imaging (SPI) is a novel technique capturing 2D images using a photodiode, instead of conventional 2D array sensors. SPI owns high signal-to-noise ratio, wide spectrum range, low cost, and robustness to light scattering. Various algorithms have been proposed for SPI reconstruction, including the linear correlation methods, the alternating projection method (AP), and the compressive sensing based methods. However, there has been no comprehensive review discussing respective advantages, which is important for SPI's further applications and development. In this paper, we reviewed and compared these algorithms in a unified reconstruction framework. Besides, we proposed two other SPI algorithms including a conjugate gradient descent based method (CGD) and a Poisson maximum likelihood based method. Both simulations and experiments validate the following conclusions: to obtain comparable reconstruction accuracy, the compressive sensing based total variation regularization method (TV) requires the least measurements and consumes the least running time for small-scale reconstruction; the CGD and AP methods run fastest in large-scale cases; the TV and AP methods are the most robust to measurement noise. In a word, there are trade-offs between capture efficiency, computational complexity and robustness to noise among different SPI algorithms. We have released our source code for non-commercial use. △ Less

Submitted 24 October, 2017; v1 submitted 11 July, 2017; originally announced July 2017.

arXiv:1701.08721 [pdf]

doi 10.1016/j.compenvurbsys.2016.06.002

Scale effects on spatially embedded contact networks

Authors: Peng Gao, Ling Bian

Abstract: Spatial phenomena are subject to scale effects, but there are rarely studies addressing such effects on spatially embedded contact networks. There are two types of structure in these networks, network structure and spatial structure. The network structure has been actively studied. The spatial structure of these networks has received attention only in recent years. Certainly little is known whethe… ▽ More Spatial phenomena are subject to scale effects, but there are rarely studies addressing such effects on spatially embedded contact networks. There are two types of structure in these networks, network structure and spatial structure. The network structure has been actively studied. The spatial structure of these networks has received attention only in recent years. Certainly little is known whether the two structures respond to each other. This study examines the scale effects, in terms of spatial extent, on the network structure and the spatial structure of spatially embedded contact networks. Two issues are explored, how the two types of structures change in response to scale changes, and the range of the scale effects. Two sets of areal units, regular grids with 24 different levels of spatial extent and census units of three levels of spatial extent, are used to divide one observed and two reference random networks into multiple scales. Six metrics are used to represent the two structures. Results show different scale effects. In terms of the network structure, the properties of the observed network are sensitive to scale changes at fine scales. In comparison, the clustered spatial structure of the network is scale independent. The behaviors of the network structure are affected by the spatial structure. This information helps identify vulnerable households and communities to health risks and helps deploy intervention strategies to spatially targeted areas. △ Less

Submitted 30 January, 2017; originally announced January 2017.

Journal ref: Computers, Environment and Urban Systems, 59, 142-151 (2016)

arXiv:1603.04746 [pdf, other]

doi 10.1038/srep27384

Fourier ptychographic reconstruction using Poisson maximum likelihood and truncated Wirtinger gradient

Authors: Liheng Bian, **li Suo, Jaebum Chung, Xiaoze Ou, Changhuei Yang, Feng Chen, Qionghai Dai

Abstract: Fourier ptychographic microscopy (FPM) is a novel computational coherent imaging technique for high space-bandwidth product imaging. Mathematically, Fourier ptychographic (FP) reconstruction can be implemented as a phase retrieval optimization process, in which we only obtain low resolution intensity images corresponding to the sub-bands of the sample's high resolution (HR) spatial spectrum, and a… ▽ More Fourier ptychographic microscopy (FPM) is a novel computational coherent imaging technique for high space-bandwidth product imaging. Mathematically, Fourier ptychographic (FP) reconstruction can be implemented as a phase retrieval optimization process, in which we only obtain low resolution intensity images corresponding to the sub-bands of the sample's high resolution (HR) spatial spectrum, and aim to retrieve the complex HR spectrum. In real setups, the measurements always suffer from various degenerations such as Gaussian noise, Poisson noise, speckle noise and pupil location error, which would largely degrade the reconstruction. To efficiently address these degenerations, we propose a novel FP reconstruction method under a gradient descent optimization framework in this paper. The technique utilizes Poisson maximum likelihood for better signal modeling, and truncated Wirtinger gradient for error removal. Results on both simulated data and real data captured using our laser FPM setup show that the proposed method outperforms other state-of-the-art algorithms. Also, we have released our source code for non-commercial use. △ Less

Submitted 1 March, 2016; originally announced March 2016.

arXiv:1312.1931 [pdf, ps, other]

doi 10.1117/1.JBO.20.3.036006

Multi-frame denoising of high speed optical coherence tomography data using inter-frame and intra-frame priors

Authors: Liheng Bian, **li Suo, Feng Chen, Qionghai Dai

Abstract: Optical coherence tomography (OCT) is an important interferometric diagnostic technique which provides cross-sectional views of the subsurface microstructure of biological tissues. However, the imaging quality of high-speed OCT is limited due to the large speckle noise. To address this problem, this paper proposes a multi-frame algorithmic method to denoise OCT volume. Mathematically, we build an… ▽ More Optical coherence tomography (OCT) is an important interferometric diagnostic technique which provides cross-sectional views of the subsurface microstructure of biological tissues. However, the imaging quality of high-speed OCT is limited due to the large speckle noise. To address this problem, this paper proposes a multi-frame algorithmic method to denoise OCT volume. Mathematically, we build an optimization model which forces the temporally registered frames to be low rank, and the gradient in each frame to be sparse, under logarithmic image formation and noise variance constraints. Besides, a convex optimization algorithm based on the augmented Lagrangian method is derived to solve the above model. The results reveal that our approach outperforms the other methods in terms of both speckle noise suppression and crucial detail preservation. △ Less

Submitted 29 November, 2014; v1 submitted 6 December, 2013; originally announced December 2013.

Showing 1–24 of 24 results for author: Bian, L