A Hybrid Task-Constrained Motion Planning for Collaborative Robots in Intelligent Remanufacturing

Authors: Wansong Liu, Chang Liu, Xiao Liang, Minghui Zheng

Abstract: Industrial manipulators have extensively collaborated with human operators to execute tasks, e.g., disassembly of end-of-use products, in intelligent remanufacturing. Safe task execution requires real-time path planning for the manipulator's end-effector to autonomously avoid human operators. This is even more challenging when the end-effector needs to follow a planned path while avoiding collision between the manipulator body and human operators, which is usually computationally expensive and limits real-time application. This paper proposes an efficient hybrid motion planning algorithm that consists of an A$^*$ algorithm and an online manipulator reconfiguration mechanism (OMRM) to tackle such challenges in task and configuration spaces, respectively. The A$^*$ algorithm is first leveraged to plan the shortest collision-free path of the end-effector in task space. When the manipulator body poses a risk to the human operator, our OMRM then selects an alternative joint configuration with minimum reconfiguration effort from a database to help the manipulator follow the planned path and avoid the human operator simultaneously. The database of manipulator reconfiguration establishes the relationship between the task and configuration space offline using forward kinematics, and is able to provide multiple reconfiguration candidates for a desired end-effector position. The proposed hybrid algorithm plans safe manipulator motion during the whole task execution. Extensive numerical and experimental studies, as well as comparison studies between the proposed algorithm and state-of-the-art ones, have been conducted to validate the proposed motion planning algorithm.

Submitted 12 June, 2024; originally announced June 2024.
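
The A$^*$ step described in the abstract above (shortest collision-free end-effector path in task space) can be illustrated with a minimal sketch. This is a generic textbook A* on a 2D occupancy grid, not the authors' implementation: the grid discretization, 4-connected moves, unit step cost, Manhattan heuristic, and the function name `astar` are all assumptions made for illustration.

```python
import heapq


def astar(grid, start, goal):
    """Return a shortest 4-connected path on a 2D grid, or None if unreachable.

    grid[r][c] == 1 marks a blocked cell (e.g. space occupied by a person);
    0 marks free space. Cells are (row, col) tuples.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance: admissible and consistent for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]  # entries are (f = g + h, g, cell)
    came_from = {}
    best_cost = {start: 0}

    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            # Walk the parent links back to the start.
            path = [cell]
            while cell in came_from:
                cell = came_from[cell]
                path.append(cell)
            return path[::-1]
        if cost > best_cost[cell]:
            continue  # stale heap entry; a cheaper route to this cell was found
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost
                    came_from[nxt] = cell
                    heapq.heappush(open_heap, (new_cost + heuristic(nxt), new_cost, nxt))
    return None
```

In the setting the abstract describes, the blocked cells would be updated as the human operator moves and the search re-run; how the paper does this is not stated in the listing.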

arXiv:2406.02518 [pdf, other]

DDGS-CT: Direction-Disentangled Gaussian Splatting for Realistic Volume Rendering

Authors: Zhongpai Gao, Benjamin Planche, Meng Zheng, Xiao Chen, Terrence Chen, Ziyan Wu

Abstract: Digitally reconstructed radiographs (DRRs) are simulated 2D X-ray images generated from 3D CT volumes, widely used in preoperative settings but limited in intraoperative applications due to computational bottlenecks, especially for accurate but heavy physics-based Monte Carlo methods. While analytical DRR renderers offer greater efficiency, they overlook anisotropic X-ray image formation phenomena, such as Compton scattering. We present a novel approach that marries realistic physics-inspired X-ray simulation with efficient, differentiable DRR generation using 3D Gaussian splatting (3DGS). Our direction-disentangled 3DGS (DDGS) method separates the radiosity contribution into isotropic and direction-dependent components, approximating complex anisotropic interactions without intricate runtime simulations. Additionally, we adapt the 3DGS initialization to account for tomography data properties, enhancing accuracy and efficiency. Our method outperforms state-of-the-art techniques in image accuracy. Furthermore, our DDGS shows promise for intraoperative applications and inverse problems such as pose registration, delivering superior registration accuracy and runtime performance compared to analytical DRR methods.

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2405.07962 [pdf, other]

KG-Planner: Knowledge-Informed Graph Neural Planning for Collaborative Manipulators

Authors: Wansong Liu, Kareem Eltouny, Sibo Tian, Xiao Liang, Minghui Zheng

Abstract: This paper presents a novel knowledge-informed graph neural planner (KG-Planner) to address the challenge of efficiently planning collision-free motions for robots in high-dimensional spaces, considering both static and dynamic environments involving humans. Unlike traditional motion planners that struggle with finding a balance between efficiency and optimality, the KG-Planner takes a different approach. Instead of relying solely on a neural network or imitating the motions of an oracle planner, our KG-Planner integrates explicit physical knowledge from the workspace. The integration of knowledge has two key aspects: (1) we present an approach to design a graph that can comprehensively model the workspace's compositional structure. The designed graph explicitly incorporates critical elements such as robot joints, obstacles, and their interconnections. This representation allows us to capture the intricate relationships between these elements. (2) We train a Graph Neural Network (GNN) that excels at generating nearly optimal robot motions. In particular, the GNN employs a layer-wise propagation rule to facilitate the exchange and update of information among workspace elements based on their connections. This propagation emphasizes the influence of these elements throughout the planning process. To validate the efficacy and efficiency of our KG-Planner, we conduct extensive experiments in both static and dynamic environments. These experiments include scenarios with and without human workers. The results of our approach are compared against existing methods, showcasing the superior performance of the KG-Planner. A short video introduction of this work is available (video link provided in the paper).

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2404.10343 [pdf, other]

The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such as runtime, parameters, and FLOPs, while still maintaining a peak signal-to-noise ratio (PSNR) of approximately 26.90 dB on the DIV2K_LSDIR_valid dataset and 26.99 dB on the DIV2K_LSDIR_test dataset. In addition, this challenge has 4 tracks including the main track (overall performance), sub-track 1 (runtime), sub-track 2 (FLOPs), and sub-track 3 (parameters). In the main track, all three metrics (i.e., runtime, FLOPs, and parameter count) were considered. The ranking of the main track is calculated based on a weighted sum of the scores of all other sub-tracks. In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking. In sub-track 2, the number of FLOPs was considered. The score calculated based on the corresponding FLOPs was used to determine the ranking. In sub-track 3, the number of parameters was considered. The score calculated based on the corresponding parameters was used to determine the ranking. RLFN is set as the baseline for efficiency measurement. The challenge had 262 registered participants, and 34 teams made valid submissions. They gauge the state-of-the-art in efficient single-image super-resolution. To facilitate the reproducibility of the challenge and enable other researchers to build upon these findings, the code and the pre-trained model of validated solutions are made publicly available at https://github.com/Amazingren/NTIRE2024_ESR/.

Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024
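
The PSNR thresholds quoted in the NTIRE 2024 abstract above follow the standard definition PSNR = 10 log10(MAX^2 / MSE). The sketch below computes it for two flat sequences of pixel values. It is illustrative only: the function name `psnr` and the 8-bit peak value of 255 are assumptions, and the challenge's exact evaluation protocol (color channel, border cropping, and so on) is not given in this listing.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A uniform error of 10 gray levels on an 8-bit image gives an MSE of 100 and a PSNR of roughly 28.13 dB, which puts the approximately 26.9 dB target in perspective.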

arXiv:2404.10231 [pdf, ps, other]

doi 10.1109/LRA.2024.3391026

Improving Disturbance Estimation and Suppression via Learning among Systems with Mismatched Dynamics

Authors: Harsh Modi, Zhu Chen, Xiao Liang, Minghui Zheng

Abstract: Iterative learning control (ILC) is a method for reducing system tracking or estimation errors over multiple iterations by using information from past iterations. The disturbance observer (DOB) is used to estimate and mitigate disturbances within the system, while the system is being affected by them. ILC enhances system performance by introducing a feedforward signal in each iteration. However, its effectiveness may diminish if the conditions change during the iterations. On the other hand, although DOB effectively mitigates the effects of new disturbances, it cannot entirely eliminate them as it operates reactively. Therefore, neither ILC nor DOB alone can ensure sufficient robustness in challenging scenarios. This study focuses on the simultaneous utilization of ILC and DOB to enhance system robustness. The proposed methodology specifically targets dynamically different linearized systems performing repetitive tasks. The systems share similar forms but differ in dynamics (e.g. sizes, masses, and controllers). Consequently, the design of learning filters must account for these differences in dynamics. To validate the approach, the study establishes a theoretical framework for designing learning filters in conjunction with DOB. The validity of the framework is then confirmed through numerical studies and experimental tests conducted on unmanned aerial vehicles (UAVs). Although UAVs are nonlinear systems, the study employs a linearized controller as they operate in proximity to the hover condition. A video introduction of this paper is available via this link: https://zh.engr.tamu.edu/wp-content/uploads/sites/310/2024/02/ILCDOB_v3f.mp4.

Submitted 15 April, 2024; originally announced April 2024.
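
The ILC idea in the abstract above (a feedforward signal refined from the previous iteration's error) can be shown with a toy example. The sketch assumes a static scalar plant y = g*u + d with a disturbance d that repeats every iteration, and a constant learning gain. These modeling choices and the name `run_ilc` are hypothetical and much simpler than the learning filters and DOB combination the paper designs.

```python
def run_ilc(reference, disturbance, plant_gain=2.0, learning_gain=0.3, iterations=20):
    """Toy iterative learning control on a static plant y = plant_gain * u + d.

    Update law: u_{k+1}(t) = u_k(t) + learning_gain * e_k(t), with e_k = reference - y_k.
    The error contracts by |1 - plant_gain * learning_gain| per iteration,
    so it converges when that factor is below 1.
    Returns the final input and the max absolute error of each iteration.
    """
    n = len(reference)
    u = [0.0] * n
    error_norms = []
    for _ in range(iterations):
        # Run one trial with the current feedforward input.
        y = [plant_gain * u[t] + disturbance[t] for t in range(n)]
        e = [reference[t] - y[t] for t in range(n)]
        error_norms.append(max(abs(v) for v in e))
        # Learn from this trial's error for the next trial.
        u = [u[t] + learning_gain * e[t] for t in range(n)]
    return u, error_norms
```

With the default gains the contraction factor is 1 - 2.0 * 0.3 = 0.4, so the tracking error shrinks geometrically as long as the disturbance is the same in every trial. A disturbance that changes between trials breaks that assumption, which is the gap the abstract assigns to the DOB.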

arXiv:2404.04829 [pdf, ps, other]

doi 10.1109/LSP.2024.3397160

Wi-Fi-based Personnel Identity Recognition: Addressing Dataset Imbalance with C-DDPMs

Authors: Jichen Bian, Chong Tan, Peiyao Tang, Min Zheng

Abstract: Wireless sensing technologies have become increasingly prevalent due to the ubiquitous nature of wireless signals and their inherent privacy-friendly characteristics. Device-free personnel identity recognition, a prevalent application in wireless sensing, is hampered by imbalanced channel state information (CSI) datasets. This letter proposes a novel method for CSI dataset augmentation that employs Conditional Denoising Diffusion Probabilistic Models (C-DDPMs) to generate additional samples that address class imbalance issues. The augmentation markedly improves classification accuracies on our homemade dataset, elevating all classes to above 94%.

Submitted 7 April, 2024; originally announced April 2024.

Journal ref: IEEE Signal Processing Letters, 2024

arXiv:2403.05807 [pdf, other]

A self-supervised CNN for image watermark removal

Authors: Chunwei Tian, Menghua Zheng, Tiancai Jiao, Wangmeng Zuo, Yanning Zhang, Chia-Wen Lin

Abstract: Popular convolutional neural networks mainly use paired images in a supervised way for image watermark removal. However, watermarked images do not have reference images in the real world, which results in poor robustness of image watermark removal techniques. In this paper, we propose a self-supervised convolutional neural network (CNN) for image watermark removal (SWCNN). SWCNN uses a self-supervised way to construct reference watermarked images rather than given paired training samples, according to watermark distribution. A heterogeneous U-Net architecture is used to extract more complementary structural information via simple components for image watermark removal. Taking into account texture information, a mixed loss is exploited to improve visual effects of image watermark removal. Besides, a watermark dataset is constructed. Experimental results show that the proposed SWCNN is superior to popular CNNs in image watermark removal.

Submitted 9 March, 2024; originally announced March 2024.

arXiv:2402.17200 [pdf, other]

Enhancing Quality of Compressed Images by Mitigating Enhancement Bias Towards Compression Domain

Authors: Qunliang Xing, Mai Xu, Shengxi Li, Xin Deng, Meisong Zheng, Huaida Liu, Ying Chen

Abstract: Existing quality enhancement methods for compressed images focus on aligning the enhancement domain with the raw domain to yield realistic images. However, these methods exhibit a pervasive enhancement bias towards the compression domain, inadvertently regarding it as more realistic than the raw domain. This bias makes enhanced images closely resemble their compressed counterparts, thus degrading their perceptual quality. In this paper, we propose a simple yet effective method to mitigate this bias and enhance the quality of compressed images. Our method employs a conditional discriminator with the compressed image as a key condition, and then incorporates a domain-divergence regularization to actively distance the enhancement domain from the compression domain. Through this dual strategy, our method enables the discrimination against the compression domain, and brings the enhancement domain closer to the raw domain. Comprehensive quality evaluations confirm the superiority of our method over other state-of-the-art methods without incurring inference overheads.

Submitted 19 March, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

Comments: Accepted to CVPR 2024

arXiv:2311.15231 [pdf, other]

Double Reverse Regularization Network Based on Self-Knowledge Distillation for SAR Object Classification

Authors: Bo Xu, Hao Zheng, Zhigang Hu, Liu Yang, Meiguang Zheng

Abstract: In current synthetic aperture radar (SAR) object classification, one of the major challenges is the severe overfitting issue due to the limited dataset (few-shot) and noisy data. Considering the advantages of knowledge distillation as a learned label smoothing regularization, this paper proposes a novel Double Reverse Regularization Network based on Self-Knowledge Distillation (DRRNet-SKD). Specifically, through exploring the effect of distillation weight on the process of distillation, we are inspired to adopt the double reverse thought to implement an effective regularization network by combining offline and online distillation in a complementary way. Then, the Adaptive Weight Assignment (AWA) module is designed to adaptively assign two reverse-changing weights based on the network performance, allowing the student network to better benefit from both teachers. The experimental results on OpenSARShip and FUSAR-Ship demonstrate that DRRNet-SKD exhibits remarkable performance improvement on classical CNNs, outperforming state-of-the-art self-knowledge distillation methods.

Submitted 26 November, 2023; originally announced November 2023.

Comments: 6 pages, 8 figures

arXiv:2310.10408 [pdf, other]

A cross Transformer for image denoising

Authors: Chunwei Tian, Menghua Zheng, Wangmeng Zuo, Shichao Zhang, Yanning Zhang, Chia-Wen Lin

Abstract: Deep convolutional neural networks (CNNs) depend on feedforward and feedback ways to obtain good performance in image denoising. However, how to obtain effective structural information via CNNs to efficiently represent given noisy images is key for complex scenes. In this paper, we propose a cross Transformer denoising CNN (CTNet) with a serial block (SB), a parallel block (PB), and a residual block (RB) to obtain clean images for complex scenes. The SB uses an enhanced residual architecture to deeply search structural information for image denoising. To avoid loss of key information, the PB uses three heterogeneous networks to implement multiple interactions of multi-level features to broadly search for extra information for improving the adaptability of an obtained denoiser for complex scenes. Also, to improve denoising performance, Transformer mechanisms are embedded into the SB and PB to extract complementary salient features for effectively removing noise in terms of pixel relations. Finally, the RB is applied to acquire clean images. Experiments illustrate that our CTNet is superior to some popular denoising methods in terms of real and synthetic image denoising. It is suitable for mobile digital devices, e.g., phones. Codes can be obtained at https://github.com/hellloxiaotian/CTNet.

Submitted 16 October, 2023; originally announced October 2023.

arXiv:2306.02634 [pdf, other]

Computational 3D topographic microscopy from terabytes of data per sample

Authors: Kevin C. Zhou, Mark Harfouche, Maxwell Zheng, Joakim Jönsson, Kyung Chul Lee, Ron Appel, Paul Reamey, Thomas Doman, Veton Saliu, Gregor Horstmeyer, Roarke Horstmeyer

Abstract: We present a large-scale computational 3D topographic microscope that enables 6-gigapixel profilometric 3D imaging at micron-scale resolution across $>$110 cm$^2$ areas over multi-millimeter axial ranges. Our computational microscope, termed STARCAM (Scanning Topographic All-in-focus Reconstruction with a Computational Array Microscope), features a parallelized, 54-camera architecture with 3-axis translation to capture, for each sample of interest, a multi-dimensional, 2.1-terabyte (TB) dataset, consisting of a total of 224,640 9.4-megapixel images. We developed a self-supervised neural network-based algorithm for 3D reconstruction and stitching that jointly estimates an all-in-focus photometric composite and 3D height map across the entire field of view, using multi-view stereo information and image sharpness as a focal metric. The memory-efficient, compressed differentiable representation offered by the neural network effectively enables joint participation of the entire multi-TB dataset during the reconstruction process. To demonstrate the broad utility of our new computational microscope, we applied STARCAM to a variety of decimeter-scale objects, with applications ranging from cultural heritage to industrial inspection.

Submitted 5 June, 2023; originally announced June 2023.

arXiv:2305.12708 [pdf, other]

ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer

Authors: Huadai Liu, Rongjie Huang, Xuan Lin, Wenqiang Xu, Maozong Zheng, Hong Chen, Jinzheng He, Zhou Zhao

Abstract: Text-to-speech (TTS) has undergone remarkable improvements in performance, particularly with the advent of Denoising Diffusion Probabilistic Models (DDPMs). However, the perceived quality of audio depends not solely on its content, pitch, rhythm, and energy, but also on the physical environment. In this work, we propose ViT-TTS, the first visual TTS model with scalable diffusion transformers. ViT-TTS complements the phoneme sequence with visual information to generate audio of high perceived quality, opening up new avenues for practical applications of AR and VR to allow a more immersive and realistic audio experience. To mitigate the data scarcity in learning visual acoustic information, we 1) introduce a self-supervised learning framework to enhance both the visual-text encoder and denoiser decoder; 2) leverage the diffusion transformer scalable in terms of parameters and capacity to learn visual scene information. Experimental results demonstrate that ViT-TTS achieves new state-of-the-art results, outperforming cascaded systems and other baselines regardless of the visibility of the scene. With low-resource data (1h, 2h, 5h), ViT-TTS achieves results comparable to rich-resource baselines. Audio samples are available at https://ViT-TTS.github.io/.

Submitted 21 April, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

Comments: Accepted by EMNLP 2023

arXiv:2302.05736 [pdf]

Locating the Sources of Sub-synchronous Oscillations Induced by the Control of Voltage Source Converters Based on Energy Structure and Nonlinearity Detection

Authors: Zetian Zheng, Shaowei Huang, Jun Yan, Qiangsheng Bu, Chen Shen, Mingzhong Zheng, Ye Liu

Abstract: The oscillation phenomena associated with the control of voltage source converters (VSCs) are widely concerning, and locating the source of these oscillations is crucial to suppressing them; therefore, this paper presents a locating scheme, based on the energy structure and nonlinearity detection. On the one hand, the energy structure, which conforms with the principle of the energy-based method and dissipativity theory, is constructed to describe the transient energy flow for VSCs, and on this basis, a defined characteristic quantity is implemented to narrow the scope of oscillation source location; on the other hand, according to the self-sustained oscillation characteristics of VSCs, an index for nonlinearity detection is applied to locate the VSCs which produce the oscillation energy. The combination of the energy structure and nonlinearity detection could distinguish the contributions of different VSCs to the oscillation. The results of a case study implemented by the PSCAD/EMTDC simulation validate the proposed scheme.

Submitted 17 February, 2023; v1 submitted 11 February, 2023; originally announced February 2023.

arXiv:2301.08351 [pdf, other]

doi 10.1038/s41566-023-01171-7

Parallelized computational 3D video microscopy of freely moving organisms at multiple gigapixels per second

Authors: Kevin C. Zhou, Mark Harfouche, Colin L. Cooke, Jaehee Park, Pavan C. Konda, Lucas Kreiss, Kanghyun Kim, Joakim Jönsson, Jed Doman, Paul Reamey, Veton Saliu, Clare B. Cook, Maxwell Zheng, Jack P. Bechtel, Aurélien Bègue, Matthew McCarroll, Jennifer Bagwell, Gregor Horstmeyer, Michel Bagnat, Roarke Horstmeyer

Abstract: To study the behavior of freely moving model organisms such as zebrafish (Danio rerio) and fruit flies (Drosophila) across multiple spatial scales, it would be ideal to use a light microscope that can resolve 3D information over a wide field of view (FOV) at high speed and high spatial resolution. However, it is challenging to design an optical instrument to achieve all of these properties simultaneously. Existing techniques for large-FOV microscopic imaging and for 3D image measurement typically require many sequential image snapshots, thus compromising speed and throughput. Here, we present 3D-RAPID, a computational microscope based on a synchronized array of 54 cameras that can capture high-speed 3D topographic videos over a 135-cm^2 area, achieving up to 230 frames per second at throughputs exceeding 5 gigapixels (GPs) per second. 3D-RAPID features a 3D reconstruction algorithm that, for each synchronized temporal snapshot, simultaneously fuses all 54 images seamlessly into a globally-consistent composite that includes a coregistered 3D height map. The self-supervised 3D reconstruction algorithm itself trains a spatiotemporally-compressed convolutional neural network (CNN) that maps raw photometric images to 3D topography, using stereo overlap redundancy and ray-propagation physics as the only supervision mechanism. As a result, our end-to-end 3D reconstruction algorithm is robust to generalization errors and scales to arbitrarily long videos from arbitrarily sized camera arrays. The scalable hardware and software design of 3D-RAPID addresses a longstanding problem in the field of behavioral imaging, enabling parallelized 3D observation of large collections of freely moving organisms at high spatiotemporal throughputs, which we demonstrate in ants (Pogonomyrmex barbatus), fruit flies, and zebrafish larvae.

Submitted 19 January, 2023; originally announced January 2023.

arXiv:2209.12394 [pdf, other]

Multi-stage image denoising with the wavelet transform

Authors: Chunwei Tian, Menghua Zheng, Wangmeng Zuo, Bob Zhang, Yanning Zhang, David Zhang

Abstract: Deep convolutional neural networks (CNNs) are used for image denoising via automatically mining accurate structure information. However, most existing CNNs depend on enlarging the depth of designed networks to obtain better denoising performance, which may cause training difficulty. In this paper, we propose a multi-stage image denoising CNN with the wavelet transform (MWDCNN) via three stages, i.e., a dynamic convolutional block (DCB), two cascaded wavelet transform and enhancement blocks (WEBs) and a residual block (RB). DCB uses a dynamic convolution to dynamically adjust parameters of several convolutions for making a tradeoff between denoising performance and computational costs. WEB uses a combination of a signal processing technique (i.e., wavelet transformation) and discriminative learning to suppress noise for recovering more detailed information in image denoising. To further remove redundant features, RB is used to refine obtained features for improving denoising effects and reconstruct clean images via improved residual dense architectures. Experimental results show that the proposed MWDCNN outperforms some popular denoising methods in terms of quantitative and qualitative analysis. Codes are available at https://github.com/hellloxiaotian/MWDCNN.

Submitted 3 October, 2022; v1 submitted 25 September, 2022; originally announced September 2022.
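
The wavelet transformation that the MWDCNN abstract above names as the signal processing component of WEB can be illustrated with a one-level Haar transform. This is a generic sketch: the listing does not say which wavelet family MWDCNN uses, so the Haar choice, the 1D setting, and the function names `haar_forward` and `haar_inverse` are assumptions.

```python
import math


def haar_forward(signal):
    """One-level orthonormal Haar transform of an even-length 1D signal.

    Returns (approximation, detail): scaled pairwise sums capture the smooth
    content, scaled pairwise differences capture local variation such as noise.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward, reconstructing the original signal."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out
```

Because the transform is invertible, a denoiser can shrink or reweight the detail coefficients and then reconstruct, which is the usual motivation for pairing wavelets with learned enhancement.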

arXiv:2209.01921 [pdf, other]

doi 10.1109/TGRS.2023.3264560

Multi-frequency PolSAR Image Fusion Classification Based on Semantic Interactive Information and Topological Structure

Authors: Yice Cao, Yan Wu, Ming Li, Mingjie Zheng, Peng Zhang, Jili Wang

Abstract: Compared with the rapid development of single-frequency multi-polarization SAR image classification technology, there is less research on the land cover classification of multifrequency polarimetric SAR (MF-PolSAR) images. In addition, the current deep learning methods for MF-PolSAR classification are mainly based on convolutional neural networks (CNNs), which consider only local spatial information and ignore nonlocal relationships. Therefore, based on semantic interaction and nonlocal topological structure, this paper proposes the MF semantics and topology fusion network (MF-STFnet) to improve MF-PolSAR classification performance. In MF-STFnet, two kinds of classification are implemented for each band, semantic information-based (SIC) and topological property-based (TPC). They work collaboratively during MF-STFnet training, which can not only fully leverage the complementarity of bands, but also combine local and nonlocal spatial information to improve the discrimination between different categories. For SIC, the designed crossband interactive feature extraction module (CIFEM) is embedded to explicitly model the deep semantic correlation among bands, thereby leveraging the complementarity of bands to make ground objects more separable. For TPC, the graph sample and aggregate network (GraphSAGE) is employed to dynamically capture the representation of nonlocal topological relations between land cover categories. In this way, the robustness of classification can be further improved by combining nonlocal spatial information. Finally, an adaptive weighting fusion (AWF) strategy is proposed to merge inference from different bands, so as to make the MF joint classification decisions of SIC and TPC. The comparative experiments show that MF-STFnet can achieve more competitive classification performance than some state-of-the-art methods.

Submitted 5 September, 2022; originally announced September 2022.

arXiv:2112.09310 [pdf, other]

Joint Device Detection, Channel Estimation, and Data Decoding with Collision Resolution for MIMO Massive Unsourced Random Access

Authors: Tianya Li, Yongpeng Wu, Mengfan Zheng, Wenjun Zhang, Chengwen Xing, Jian** An, Xiang-Gen Xia, Chengshan Xiao

Abstract: In this paper, we investigate a joint device activity detection (DAD), channel estimation (CE), and data decoding (DD) algorithm for multiple-input multiple-output (MIMO) massive unsourced random access (URA). Different from the state-of-the-art slotted transmission scheme, the data in the proposed framework is split into only two parts. A portion of the data is coded by compressed sensing (CS) and the rest is low-density-parity-check (LDPC) coded. In addition to being part of the data, information bits in the CS phase also undertake the task of interleaving pattern design and channel estimation. The principle of interleave-division multiple access (IDMA) is exploited to reduce the interference among devices in the LDPC phase. Based on the belief propagation (BP) algorithm, a low-complexity iterative message passing (MP) algorithm is utilized to decode the data embedded in these two phases separately. Moreover, combined with successive interference cancellation (SIC), the proposed joint DAD-CE-DD algorithm further improves performance by letting the two phases utilize each other's beliefs. Additionally, based on energy detection (ED) and the sliding window protocol (SWP), we develop a collision resolution protocol to handle codeword collision, a common issue in URA systems. In addition to the complexity reduction, the proposed algorithm exhibits a substantial performance enhancement compared to the state-of-the-art in terms of efficiency and accuracy.

Submitted 16 December, 2021; originally announced December 2021.

Comments: Accepted by IEEE JSAC special issue on Next Generation Multiple Access

arXiv:2111.02461 [pdf, other]

Automatic ultrasound vessel segmentation with deep spatiotemporal context learning

Authors: Baichuan Jiang, Alvin Chen, Shyam Bharat, Mingxin Zheng

Abstract: Accurate, real-time segmentation of vessel structures in ultrasound image sequences can aid in the measurement of lumen diameters and assessment of vascular diseases. This, however, remains a challenging task, particularly for extremely small vessels that are difficult to visualize. We propose to leverage the rich spatiotemporal context available in ultrasound to improve segmentation of small-scale lower-extremity arterial vasculature. We describe efficient deep learning methods that incorporate temporal, spatial, and feature-aware contextual embeddings at multiple resolution scales while jointly utilizing information from B-mode and Color Doppler signals. Evaluating on femoral and tibial artery scans performed on healthy subjects by an expert ultrasonographer, and comparing to consensus expert ground-truth annotations of inner lumen boundaries, we demonstrate real-time segmentation using the context-aware models and show that they significantly outperform comparable baseline approaches.

Submitted 3 November, 2021; originally announced November 2021.

arXiv:2110.06977 [pdf, other]

Cloud-Assisted Collaborative Road Information Discovery with Gaussian Process: Application to Road Profile Estimation

Authors: Mohammad R. Hajidavalloo, Zhaojian Li, Xin Xia, Ali Louati, Minghui Zheng, Weichao Zhuang

Abstract: There is increasing interest in exploiting modern vehicles as mobile sensors to obtain important road information such as potholes, black ice, and road profile. Availability of such information has been identified as a key enabler for next-generation vehicles with enhanced safety, efficiency, and comfort. However, existing road information discovery approaches have predominantly been performed in a single-vehicle setting, which is inevitably susceptible to vehicle model uncertainty and measurement errors. To overcome these limitations, this paper presents a novel cloud-assisted collaborative estimation framework that can utilize multiple heterogeneous vehicles to iteratively enhance estimation performance. Specifically, each vehicle combines its onboard measurements with a cloud-based Gaussian process (GP), crowdsourced from prior participating vehicles as "pseudo-measurements", into a local estimator to refine the estimation. The resultant local onboard estimation is then sent back to the cloud to update the GP, where we utilize a noisy input GP (NIGP) method to explicitly handle uncertain GPS measurements. We apply the proposed framework to collaborative road profile estimation. Promising results from extensive simulations and hardware-in-the-loop experiments show that the proposed collaborative estimation can significantly enhance estimation and iteratively improve the performance from vehicle to vehicle, despite vehicle heterogeneity, model uncertainty, and measurement noise.

Submitted 9 June, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

Comments: Submitted to IEEE Transactions on Intelligent Transportation Systems

arXiv:2011.04988 [pdf, other]

AIM 2020 Challenge on Rendering Realistic Bokeh

Authors: Andrey Ignatov, Radu Timofte, Ming Qian, Congyu Qiao, Jiamin Lin, Zhenyu Guo, Chenghua Li, Cong Leng, Jian Cheng, Juewen Peng, Xianrui Luo, Ke Xian, Zi** Wu, Zhiguo Cao, Densen Puthussery, Jiji C V, Hrishikesh P S, Melvin Kuriakose, Saikat Dutta, Sourya Dipta Das, Nisarg A. Shah, Kuldeep Purohit, Praveen Kandula, Maitreya Suin, A. N. Rajagopalan , et al. (10 additional authors not shown)

Abstract: This paper reviews the second AIM realistic bokeh effect rendering challenge and provides the description of the proposed solutions and results. The participating teams were solving a real-world bokeh simulation problem, where the goal was to learn a realistic shallow focus technique using a large-scale EBB! bokeh dataset consisting of 5K shallow / wide depth-of-field image pairs captured using the Canon 7D DSLR camera. The participants had to render the bokeh effect based on a single frame without any additional data from other cameras or sensors. The target metric used in this challenge combined the runtime and the perceptual quality of the solutions measured in the user study. To ensure the efficiency of the submitted models, we measured their runtime on standard desktop CPUs and also ran the models on smartphone GPUs. The proposed solutions significantly improved the baseline results, defining the state-of-the-art for the practical bokeh effect rendering problem.

Submitted 10 November, 2020; originally announced November 2020.

Comments: Published in ECCV 2020 Workshop (Advances in Image Manipulation), https://data.vision.ee.ethz.ch/cvl/aim20/

arXiv:2009.00473 [pdf, other]

doi 10.1109/TSP.2020.3021985

Large Intelligent Surface Aided Physical Layer Security Transmission

Authors: Biqian Feng, Yongpeng Wu, Mengfan Zheng, Xiang-Gen Xia, Yongjian Wang, Chengshan Xiao

Abstract: In this paper, we investigate a large intelligent surface-enhanced (LIS-enhanced) system, where a LIS is deployed to assist secure transmission. Our design aims to maximize the achievable secrecy rates in different channel models, i.e., Rician fading and (or) independent and identically distributed Gaussian fading for the legitimate and eavesdropper channels. In addition, we take into consideration an artificial noise-aided transmission structure for further improving system performance. The difficulties in tackling these problems lie in the structure of the expected secrecy rate expressions and the non-convex phase shift constraint. To facilitate the design, we propose two frameworks, namely the sample average approximation based (SAA-based) algorithm and the hybrid stochastic projected gradient-convergent policy (hybrid SPG-CP) algorithm, to calculate the expectation terms in the secrecy rate expressions. Meanwhile, majorization minimization (MM) is adopted to address the non-convexity of the phase shift constraint. In addition, we analyze two special scenarios by making full use of the expectation terms. Simulation results show that the proposed algorithms effectively optimize the secrecy communication rate for the considered setup, and the LIS-enhanced system greatly improves secrecy performance compared to conventional architectures without LIS.

Submitted 1 September, 2020; originally announced September 2020.

Comments: Accepted by IEEE Transactions on Signal Processing

arXiv:2003.02649 [pdf, other]

An Audio-Based Fault Diagnosis Method for Quadrotors Using Convolutional Neural Network and Transfer Learning

Authors: Wansong Liu, Zhu Chen, Minghui Zheng

Abstract: Quadrotor unmanned aerial vehicles (UAVs) have been developed and deployed in several types of workplaces, such as warehouses, which usually involve human workers. The co-existence of humans and UAVs brings new challenges: a potential UAV failure may put surrounding humans at risk. Effective and efficient detection of such failure may provide early warning to the surrounding human workers and reduce this risk as much as possible. One of the most common causes of UAV flight failure is physical damage to the propellers. This paper presents a method to detect propeller damage based only on the audio noise caused by the UAV's flight. The diagnostic model is developed based on convolutional neural network (CNN) and transfer learning techniques. The audio data is collected from the UAVs in real time, transformed into a time-frequency spectrogram, and used to train the CNN-based diagnostic model. The developed model is able to detect abnormal features of the spectrogram and thus the physical damage of the propellers. To reduce the data dependence on the UAV's dynamic models and enable the use of training data from UAVs with different dynamic models, the CNN-based diagnostic model is further augmented by transfer learning. As such, refining a well-trained diagnostic model built on other UAVs requires only a small amount of the UAV's training data. Experimental tests are conducted to validate the diagnostic model, which achieves an accuracy higher than 90%.

Submitted 12 August, 2020; v1 submitted 2 March, 2020; originally announced March 2020.

Comments: ACC 2020 Final Version
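
The abstract above states that audio is "transformed into the time-frequency spectrogram" before it reaches the CNN, without giving details. The following is a minimal sketch of that one step only, under assumptions that are not from the paper: a Hann window, a 64-sample frame, a 32-sample hop, a naive DFT, and a synthetic 1 kHz tone standing in for propeller noise. The function name `stft_magnitude` is hypothetical.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=64, hop=32):
    """Return one magnitude spectrum (bins 0..frame_len//2) per frame.

    Hypothetical helper: frame_len, hop, and the Hann window are
    illustrative choices, not values taken from the paper.
    """
    # Hann window tapers each frame to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT for clarity; a real pipeline would use an FFT.
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Synthetic stand-in for recorded audio (assumption): 1 kHz sine at 8 kHz.
sample_rate = 8000
tone_hz = 1000
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(256)]

spec = stft_magnitude(signal)
# Strongest frequency bin in the first frame, converted back to hertz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
```

Each row of `spec` is one time slice and each column one frequency bin, so the nested list can be treated as a single-channel image. How the paper scales or crops that image before feeding it to the CNN is not stated in the abstract.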

arXiv:2003.02059 [pdf, other]

Vehicle-Human Interactive Behaviors in Emergency: Data Extraction from Traffic Accident Videos

Authors: Wansong Liu, Danyang Luo, Changxu Wu, Minghui Zheng

Abstract: Currently, studying vehicle-human interactive behavior in emergencies requires a large amount of data from actual emergency situations, which is almost unavailable. Existing public data sources on autonomous vehicles (AVs) mainly focus either on normal driving scenarios or on emergency situations without human involvement. To fill this gap and facilitate related research, this paper provides a new yet convenient way to extract the interactive behavior data (i.e., the trajectories of vehicles and humans) from actual accident videos that were captured by both surveillance cameras and driving recorders. The main challenge for data extraction from real-time accident video lies in the fact that the recording cameras are uncalibrated and the angles of surveillance are unknown. The approach proposed in this paper employs image processing to obtain a new perspective that differs from the original video's perspective. Meanwhile, we manually detect and mark object feature points in each image frame. To acquire a gradient of reference ratios, a geometric model is applied in the analysis of reference pixel values, and the feature points are then scaled to the object trajectory based on the gradient of ratios. The generated trajectories not only restore the object movements completely but also reflect changes in vehicle velocity and rotation based on the distribution of feature points.

Submitted 12 August, 2020; v1 submitted 2 March, 2020; originally announced March 2020.

Comments: ACC 2020 final version

arXiv:1912.09859 [pdf, ps, other]

Lightweight and Unobtrusive Data Obfuscation at IoT Edge for Remote Inference

Authors: Dixing Xu, Mengyao Zheng, Linshan Jiang, Chaojie Gu, Rui Tan, Peng Cheng

Abstract: Executing deep neural networks for inference on the server-class or cloud backend based on data generated at the edge of Internet of Things is desirable due primarily to the limited compute power of edge devices and the need to protect the confidentiality of the inference neural networks. However, such a remote inference scheme incurs concerns regarding the privacy of the inference data transmitted by the edge devices to the curious backend. This paper presents a lightweight and unobtrusive approach to obfuscate the inference data at the edge devices. It is lightweight in that the edge device only needs to execute a small-scale neural network; it is unobtrusive in that the edge device does not need to indicate whether obfuscation is applied. Extensive evaluation by three case studies of free spoken digit recognition, handwritten digit recognition, and American sign language recognition shows that our approach effectively protects the confidentiality of the raw forms of the inference data while effectively preserving the backend's inference accuracy.

Submitted 25 March, 2020; v1 submitted 20 December, 2019; originally announced December 2019.

Comments: This paper has been accepted by IEEE Internet of Things Journal, Special Issue on Artificial Intelligence Powered Edge Computing for Internet of Things

arXiv:1908.03478 [pdf, other]

A Preliminary Study on A Physical Model Oriented Learning Algorithm with Application to UAVs

Authors: Minghui Zheng, Zhu Chen, Xiao Liang

Abstract: This paper provides a preliminary study of an efficient learning algorithm that reasons about the error from first-principle physics to generate learning signals in near real time. Motivated by iterative learning control (ILC), this learning algorithm is applied to the feedforward control loop of unmanned aerial vehicles (UAVs), enabling learning from errors made by other UAVs with different dynamics or flying in different scenarios. This learning framework improves data utilization efficiency and learning reliability by analytically incorporating the physical model mapping, and enhances the flexibility of the model-based methodology by equipping it with a self-learning capability. Numerical studies are performed to validate the proposed learning algorithm.

Submitted 9 August, 2019; originally announced August 2019.

Showing 1–25 of 25 results for author: Zheng, M