Search | arXiv e-print repository

Near-Field Channel Characterization for Mid-band ELAA Systems: Sounding, Parameter Estimation, and Modeling

Authors: Wei Fan, Zhiqiang Yuan, Yejian Lyu, Jianhua Zhang, Gert Pedersen, Jonathan Borrill, Fengchun Zhang

Abstract: 6G communication will greatly benefit from using extremely large-scale antenna arrays (ELAAs) and new mid-band spectrums (7-24 GHz). These techniques require a thorough exploration of the challenges and potentials of the associated near-field (NF) phenomena. It is crucial to develop accurate NF channel models that include spherical wave propagation and spatial non-stationarity (SnS). However, chan… ▽ More 6G communication will greatly benefit from using extremely large-scale antenna arrays (ELAAs) and new mid-band spectrums (7-24 GHz). These techniques require a thorough exploration of the challenges and potentials of the associated near-field (NF) phenomena. It is crucial to develop accurate NF channel models that include spherical wave propagation and spatial non-stationarity (SnS). However, channel measurement campaigns for mid-band ELAA systems have rarely been reported in the state-of-the-art. To this end, this work develops a channel sounder dedicated to mid-band ELAA systems based on a distributed modular vector network analyzer incorporating radio-over-fiber (RoF), phase compensation, and virtual antenna array schemes. This novel channel-sounding testbed based on off-the-shelf VNA has the potential to enable large-scale experimentation due to its generic and easy-accessible nature. The main challenges and solutions for develo** NF channel models for mid-band ELAA systems are discussed, including channel sounders, multipath parameter estimation algorithms, and channel modeling frameworks. Besides, the study reports a measurement campaign in an indoor scenario using a 720-element virtual uniform circular array ELAA operating at {16-20} GHz, highlighting the presence of spherical wavefronts and spatial non-stationary effects. The effectiveness of the proposed near-field channel parameter estimator and channel modeling framework is also demonstrated using the measurement data. △ Less

Submitted 9 May, 2024; originally announced May 2024.

Comments: Submitted to IEEE Communication Magazine

arXiv:2404.09425 [pdf, other]

Super-resolution of biomedical volumes with 2D supervision

Authors: Cheng Jiang, Alexander Gedeon, Yiwei Lyu, Eric Landgraf, Yufeng Zhang, Xinhai Hou, Akhil Kondepudi, Asadur Chowdury, Honglak Lee, Todd Hollon

Abstract: Volumetric biomedical microscopy has the potential to increase the diagnostic information extracted from clinical tissue specimens and improve the diagnostic accuracy of both human pathologists and computational pathology models. Unfortunately, barriers to integrating 3-dimensional (3D) volumetric microscopy into clinical medicine include long imaging times, poor depth / z-axis resolution, and an… ▽ More Volumetric biomedical microscopy has the potential to increase the diagnostic information extracted from clinical tissue specimens and improve the diagnostic accuracy of both human pathologists and computational pathology models. Unfortunately, barriers to integrating 3-dimensional (3D) volumetric microscopy into clinical medicine include long imaging times, poor depth / z-axis resolution, and an insufficient amount of high-quality volumetric data. Leveraging the abundance of high-resolution 2D microscopy data, we introduce masked slice diffusion for super-resolution (MSDSR), which exploits the inherent equivalence in the data-generating distribution across all spatial dimensions of biological specimens. This intrinsic characteristic allows for super-resolution models trained on high-resolution images from one plane (e.g., XY) to effectively generalize to others (XZ, YZ), overcoming the traditional dependency on orientation. We focus on the application of MSDSR to stimulated Raman histology (SRH), an optical imaging modality for biological specimen analysis and intraoperative diagnosis, characterized by its rapid acquisition of high-resolution 2D images but slow and costly optical z-sectioning. To evaluate MSDSR's efficacy, we introduce a new performance metric, SliceFID, and demonstrate MSDSR's superior performance over baseline models through extensive evaluations. Our findings reveal that MSDSR not only significantly enhances the quality and resolution of 3D volumetric data, but also addresses major obstacles hindering the broader application of 3D volumetric microscopy in clinical diagnostics and biomedical research. △ Less

Submitted 14 April, 2024; originally announced April 2024.

Comments: CVPR Workshop on Computer Vision for Microscopy Image Analysis 2024

arXiv:2403.13680 [pdf, other]

Step-Calibrated Diffusion for Biomedical Optical Image Restoration

Authors: Yiwei Lyu, Sung Jik Cha, Cheng Jiang, Asadur Chowdury, Xinhai Hou, Edward Harake, Akhil Kondepudi, Christian Freudiger, Honglak Lee, Todd C. Hollon

Abstract: High-quality, high-resolution medical imaging is essential for clinical care. Raman-based biomedical optical imaging uses non-ionizing infrared radiation to evaluate human tissues in real time and is used for early cancer detection, brain tumor diagnosis, and intraoperative tissue analysis. Unfortunately, optical imaging is vulnerable to image degradation due to laser scattering and absorption, wh… ▽ More High-quality, high-resolution medical imaging is essential for clinical care. Raman-based biomedical optical imaging uses non-ionizing infrared radiation to evaluate human tissues in real time and is used for early cancer detection, brain tumor diagnosis, and intraoperative tissue analysis. Unfortunately, optical imaging is vulnerable to image degradation due to laser scattering and absorption, which can result in diagnostic errors and misguided treatment. Restoration of optical images is a challenging computer vision task because the sources of image degradation are multi-factorial, stochastic, and tissue-dependent, preventing a straightforward method to obtain paired low-quality/high-quality data. Here, we present Restorative Step-Calibrated Diffusion (RSCD), an unpaired image restoration method that views the image restoration problem as completing the finishing steps of a diffusion-based image generation task. RSCD uses a step calibrator model to dynamically determine the severity of image degradation and the number of steps required to complete the reverse diffusion process for image restoration. RSCD outperforms other widely used unpaired image restoration methods on both image quality and perceptual evaluation metrics for restoring optical images. Medical imaging experts consistently prefer images restored using RSCD in blinded comparison experiments and report minimal to no hallucinations. Finally, we show that RSCD improves performance on downstream clinical imaging tasks, including automated brain tumor diagnosis and deep tissue imaging. Our code is available at https://github.com/MLNeurosurg/restorative_step-calibrated_diffusion. △ Less

Submitted 16 May, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

arXiv:2402.09752 [pdf]

Vector spectrometer with Hertz-level resolution and super-recognition capability

Authors: Ting Qing, Shupeng Li, Huashan Yang, Lihan Wang, Yijie Fang, Xiaohu Tang, Meihui Cao, Jianming Lu, Jijun He, Junqiu Liu, Yueguang Lyu, Shilong Pan

Abstract: High-resolution optical spectrometers are crucial in revealing intricate characteristics of signals, determining laser frequencies, measuring physical constants, identifying substances, and advancing biosensing applications. Conventional spectrometers, however, often grapple with inherent trade-offs among spectral resolution, wavelength range, and accuracy. Furthermore, even at high resolution, re… ▽ More High-resolution optical spectrometers are crucial in revealing intricate characteristics of signals, determining laser frequencies, measuring physical constants, identifying substances, and advancing biosensing applications. Conventional spectrometers, however, often grapple with inherent trade-offs among spectral resolution, wavelength range, and accuracy. Furthermore, even at high resolution, resolving overlap** spectral lines during spectroscopic analyses remains a huge challenge. Here, we propose a vector spectrometer with ultrahigh resolution, combining broadband optical frequency hop**, ultrafine microwave-photonic scanning, and vector detection. A programmable frequency-hop** laser was developed, facilitating a sub-Hz linewidth and Hz-level frequency stability, an improvement of four and six orders of magnitude, respectively, compared to those of state-of-the-art tunable lasers. We also designed an asymmetric optical transmitter and receiver to eliminate measurement errors arising from modulation nonlinearity and multi-channel crosstalk. The resultant vector spectrometer exhibits an unprecedented frequency resolution of 2 Hz, surpassing the state-of-the-art by four orders of magnitude, over a 33-nm range. Through high-resolution vector analysis, we observed that group delay information enhances the separation capability of overlap** spectral lines by over 47%, significantly streamlining the real-time identification of diverse substances. Our technique fills the gap in optical spectrometers with resolutions below 10 kHz and enables vector measurement to embrace revolution in functionality. △ Less

Submitted 6 March, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

Comments: 21 pages, 6 figures

arXiv:2401.09455 [pdf, other]

Dynamic Routing for Integrated Satellite-Terrestrial Networks: A Constrained Multi-Agent Reinforcement Learning Approach

Authors: Yifeng Lyu, Han Hu, Rongfei Fan, Zhi Liu, Jian** An, Shiwen Mao

Abstract: The integrated satellite-terrestrial network (ISTN) system has experienced significant growth, offering seamless communication services in remote areas with limited terrestrial infrastructure. However, designing a routing scheme for ISTN is exceedingly difficult, primarily due to the heightened complexity resulting from the inclusion of additional ground stations, along with the requirement to sat… ▽ More The integrated satellite-terrestrial network (ISTN) system has experienced significant growth, offering seamless communication services in remote areas with limited terrestrial infrastructure. However, designing a routing scheme for ISTN is exceedingly difficult, primarily due to the heightened complexity resulting from the inclusion of additional ground stations, along with the requirement to satisfy various constraints related to satellite service quality. To address these challenges, we study packet routing with ground stations and satellites working jointly to transmit packets, while prioritizing fast communication and meeting energy efficiency and packet loss requirements. Specifically, we formulate the problem of packet routing with constraints as a max-min problem using the Lagrange method. Then we propose a novel constrained Multi-Agent reinforcement learning (MARL) dynamic routing algorithm named CMADR, which efficiently balances objective improvement and constraint satisfaction during the updating of policy and Lagrange multipliers. Finally, we conduct extensive experiments and an ablation study using the OneWeb and Telesat mega-constellations. Results demonstrate that CMADR reduces the packet delay by a minimum of 21% and 15%, while meeting stringent energy consumption and packet loss rate constraints, outperforming several baseline algorithms. △ Less

Submitted 22 December, 2023; originally announced January 2024.

arXiv:2312.05256 [pdf, other]

Holistic Evaluation of GPT-4V for Biomedical Imaging

Authors: Zhengliang Liu, Hanqi Jiang, Tianyang Zhong, Zihao Wu, Chong Ma, Yiwei Li, Xiaowei Yu, Yutong Zhang, Yi Pan, Peng Shu, Yanjun Lyu, Lu Zhang, Junjie Yao, Peixin Dong, Chao Cao, Zhenxiang Xiao, Jiaqi Wang, Huan Zhao, Shaochen Xu, Yaonai Wei, **gyuan Chen, Haixing Dai, Peilong Wang, Hao He, Zewei Wang , et al. (25 additional authors not shown)

Abstract: In this paper, we present a large-scale evaluation probing GPT-4V's capabilities and limitations for biomedical image analysis. GPT-4V represents a breakthrough in artificial general intelligence (AGI) for computer vision, with applications in the biomedical domain. We assess GPT-4V's performance across 16 medical imaging categories, including radiology, oncology, ophthalmology, pathology, and mor… ▽ More In this paper, we present a large-scale evaluation probing GPT-4V's capabilities and limitations for biomedical image analysis. GPT-4V represents a breakthrough in artificial general intelligence (AGI) for computer vision, with applications in the biomedical domain. We assess GPT-4V's performance across 16 medical imaging categories, including radiology, oncology, ophthalmology, pathology, and more. Tasks include modality recognition, anatomy localization, disease diagnosis, report generation, and lesion detection. The extensive experiments provide insights into GPT-4V's strengths and weaknesses. Results show GPT-4V's proficiency in modality and anatomy recognition but difficulty with disease diagnosis and localization. GPT-4V excels at diagnostic report generation, indicating strong image captioning skills. While promising for biomedical imaging AI, GPT-4V requires further enhancement and validation before clinical deployment. We emphasize responsible development and testing for trustworthy integration of biomedical AGI. This rigorous evaluation of GPT-4V on diverse medical images advances understanding of multimodal large language models (LLMs) and guides future work toward impactful healthcare applications. △ Less

Submitted 10 November, 2023; originally announced December 2023.

arXiv:2208.14022 [pdf, other]

Stabilize, Decompose, and Denoise: Self-Supervised Fluoroscopy Denoising

Authors: Ruizhou Liu, Qiang Ma, Zhiwei Cheng, Yuanyuan Lyu, Jianji Wang, S. Kevin Zhou

Abstract: Fluoroscopy is an imaging technique that uses X-ray to obtain a real-time 2D video of the interior of a 3D object, hel** surgeons to observe pathological structures and tissue functions especially during intervention. However, it suffers from heavy noise that mainly arises from the clinical use of a low dose X-ray, thereby necessitating the technology of fluoroscopy denoising. Such denoising is… ▽ More Fluoroscopy is an imaging technique that uses X-ray to obtain a real-time 2D video of the interior of a 3D object, hel** surgeons to observe pathological structures and tissue functions especially during intervention. However, it suffers from heavy noise that mainly arises from the clinical use of a low dose X-ray, thereby necessitating the technology of fluoroscopy denoising. Such denoising is challenged by the relative motion between the object being imaged and the X-ray imaging system. We tackle this challenge by proposing a self-supervised, three-stage framework that exploits the domain knowledge of fluoroscopy imaging. (i) Stabilize: we first construct a dynamic panorama based on optical flow calculation to stabilize the non-stationary background induced by the motion of the X-ray detector. (ii) Decompose: we then propose a novel mask-based Robust Principle Component Analysis (RPCA) decomposition method to separate a video with detector motion into a low-rank background and a sparse foreground. Such a decomposition accommodates the reading habit of experts. (iii) Denoise: we finally denoise the background and foreground separately by a self-supervised learning strategy and fuse the denoised parts into the final output via a bilateral, spatiotemporal filter. To assess the effectiveness of our work, we curate a dedicated fluoroscopy dataset of 27 videos (1,568 frames) and corresponding ground truth. Our experiments demonstrate that it achieves significant improvements in terms of denoising and enhancement effects when compared with standard approaches. Finally, expert rating confirms this efficacy. △ Less

Submitted 30 August, 2022; originally announced August 2022.

Comments: 11 pages, 18 figures

arXiv:2208.03750 [pdf, other]

Omni-directional Pathloss Measurement Based on Virtual Antenna Array with Directional Antennas

Authors: Mengting Li, Fengchun Zhang, Xiang Zhang, Yejian Lyu, Wei Fan

Abstract: Omni-directional pathloss, which refers to the pathloss when omni-directional antennas are used at the link ends, are essential for system design and evaluation. In the millimeter-wave (mm-Wave) and beyond bands, high gain directional antennas are widely used for channel measurements due to the significant signal attenuation. Conventional methods for omni-directional pathloss estimation are based… ▽ More Omni-directional pathloss, which refers to the pathloss when omni-directional antennas are used at the link ends, are essential for system design and evaluation. In the millimeter-wave (mm-Wave) and beyond bands, high gain directional antennas are widely used for channel measurements due to the significant signal attenuation. Conventional methods for omni-directional pathloss estimation are based on directional scanning sounding (DSS) system, i.e., a single directional antenna placed at the center of a rotator capturing signals from different rotation angles. The omni-directional pathloss is obtained by either summing up all the powers above the noise level or just summing up the powers of detected propagation paths. However, both methods are problematic with relatively wide main beams and high side-lobes provided by the directional antennas. In this letter, directional antenna based virtual antenna array (VAA) system is implemented for omni-directional pathloss estimation. The VAA scheme uses the same measurement system as the DSS, yet it offers high angular resolution (i.e. narrow main beam) and low side-lobes, which is essential for achieving accurate multipath detection in the power angular delay profiles (PADPs) and thereby obtaining accurate omni-directional pathloss. A measurement campaign was designed and conducted in an indoor corridor at 28-30 GHz to verify the effectiveness of the proposed method. △ Less

Submitted 7 August, 2022; originally announced August 2022.

arXiv:2203.02772 [pdf, other]

Rib Suppression in Digital Chest Tomosynthesis

Authors: Yihua Sun, Qingsong Yao, Yuanyuan Lyu, Jianji Wang, Yi Xiao, Hongen Liao, S. Kevin Zhou

Abstract: Digital chest tomosynthesis (DCT) is a technique to produce sectional 3D images of a human chest for pulmonary disease screening, with 2D X-ray projections taken within an extremely limited range of angles. However, under the limited angle scenario, DCT contains strong artifacts caused by the presence of ribs, jamming the imaging quality of the lung area. Recently, great progress has been achieved… ▽ More Digital chest tomosynthesis (DCT) is a technique to produce sectional 3D images of a human chest for pulmonary disease screening, with 2D X-ray projections taken within an extremely limited range of angles. However, under the limited angle scenario, DCT contains strong artifacts caused by the presence of ribs, jamming the imaging quality of the lung area. Recently, great progress has been achieved for rib suppression in a single X-ray image, to reveal a clearer lung texture. We firstly extend the rib suppression problem to the 3D case at the software level. We propose a $\textbf{T}$omosynthesis $\textbf{RI}$b Su$\textbf{P}$pression and $\textbf{L}$ung $\textbf{E}$nhancement $\textbf{Net}$work (TRIPLE-Net) to model the 3D rib component and provide a rib-free DCT. TRIPLE-Net takes the advantages from both 2D and 3D domains, which model the ribs in DCT with the exact FBP procedure and 3D depth information, respectively. The experiments on simulated datasets and clinical data have shown the effectiveness of TRIPLE-Net to preserve lung details as well as improve the imaging quality of pulmonary diseases. Finally, an expert user study confirms our findings. △ Less

Submitted 5 March, 2022; originally announced March 2022.

arXiv:2202.00379 [pdf, other]

doi 10.1177/02783649211052312

NTU VIRAL: A Visual-Inertial-Ranging-Lidar Dataset, From an Aerial Vehicle Viewpoint

Authors: Thien-Minh Nguyen, Shenghai Yuan, Muqing Cao, Yang Lyu, Thien Hoang Nguyen, Lihua Xie

Abstract: In recent years, autonomous robots have become ubiquitous in research and daily life. Among many factors, public datasets play an important role in the progress of this field, as they waive the tall order of initial investment in hardware and manpower. However, for research on autonomous aerial systems, there appears to be a relative lack of public datasets on par with those used for autonomous dr… ▽ More In recent years, autonomous robots have become ubiquitous in research and daily life. Among many factors, public datasets play an important role in the progress of this field, as they waive the tall order of initial investment in hardware and manpower. However, for research on autonomous aerial systems, there appears to be a relative lack of public datasets on par with those used for autonomous driving and ground robots. Thus, to fill in this gap, we conduct a data collection exercise on an aerial platform equipped with an extensive and unique set of sensors: two 3D lidars, two hardware-synchronized global-shutter cameras, multiple Inertial Measurement Units (IMUs), and especially, multiple Ultra-wideband (UWB) ranging units. The comprehensive sensor suite resembles that of an autonomous driving car, but features distinct and challenging characteristics of aerial operations. We record multiple datasets in several challenging indoor and outdoor conditions. Calibration results and ground truth from a high-accuracy laser tracker are also included in each package. All resources can be accessed via our webpage https://ntu-aris.github.io/ntu_viral_dataset. △ Less

Submitted 1 February, 2022; originally announced February 2022.

Comments: IJRR 2021

arXiv:2112.13194 [pdf, other]

doi 10.1109/ACCESS.2022.3157876

Network-Aware 5G Edge Computing for Object Detection: Augmenting Wearables to "See" More, Farther and Faster

Authors: Zhongzheng Yuan, Tommy Azzino, Yu Hao, Yixuan Lyu, Haoyang Pei, Alain Boldini, Marco Mezzavilla, Mahya Beheshti, Maurizio Porfiri, Todd Hudson, William Seiple, Yi Fang, Sundeep Rangan, Yao Wang, J. R. Rizzo

Abstract: Advanced wearable devices are increasingly incorporating high-resolution multi-camera systems. As state-of-the-art neural networks for processing the resulting image data are computationally demanding, there has been growing interest in leveraging fifth generation (5G) wireless connectivity and mobile edge computing for offloading this processing to the cloud. To assess this possibility, this pape… ▽ More Advanced wearable devices are increasingly incorporating high-resolution multi-camera systems. As state-of-the-art neural networks for processing the resulting image data are computationally demanding, there has been growing interest in leveraging fifth generation (5G) wireless connectivity and mobile edge computing for offloading this processing to the cloud. To assess this possibility, this paper presents a detailed simulation and evaluation of 5G wireless offloading for object detection within a powerful, new smart wearable called VIS4ION, for the Blind-and-Visually Impaired (BVI). The current VIS4ION system is an instrumented book-bag with high-resolution cameras, vision processing and haptic and audio feedback. The paper considers uploading the camera data to a mobile edge cloud to perform real-time object detection and transmitting the detection results back to the wearable. To determine the video requirements, the paper evaluates the impact of video bit rate and resolution on object detection accuracy and range. A new street scene dataset with labeled objects relevant to BVI navigation is leveraged for analysis. The vision evaluation is combined with a detailed full-stack wireless network simulation to determine the distribution of throughputs and delays with real navigation paths and ray-tracing from new high-resolution 3D models in an urban environment. For comparison, the wireless simulation considers both a standard 4G-Long Term Evolution (LTE) carrier and high-rate 5G millimeter-wave (mmWave) carrier. The work thus provides a thorough and realistic assessment of edge computing with mmWave connectivity in an application with both high bandwidth and low latency requirements. △ Less

Submitted 15 April, 2022; v1 submitted 25 December, 2021; originally announced December 2021.

Comments: Published in: IEEE Access ( Volume: 10)

arXiv:2110.10194

CoFi: Coarse-to-Fine ICP for LiDAR Localization in an Efficient Long-lasting Point Cloud Map

Authors: Yecheng Lyu, Xinming Huang, Ziming Zhang

Abstract: LiDAR odometry and localization has attracted increasing research interest in recent years. In the existing works, iterative closest point (ICP) is widely used since it is precise and efficient. Due to its non-convexity and its local iterative strategy, however, ICP-based method easily falls into local optima, which in turn calls for a precise initialization. In this paper, we propose CoFi, a Coar… ▽ More LiDAR odometry and localization has attracted increasing research interest in recent years. In the existing works, iterative closest point (ICP) is widely used since it is precise and efficient. Due to its non-convexity and its local iterative strategy, however, ICP-based method easily falls into local optima, which in turn calls for a precise initialization. In this paper, we propose CoFi, a Coarse-to-Fine ICP algorithm for LiDAR localization. Specifically, the proposed algorithm down-samples the input point sets under multiple voxel resolution, and gradually refines the transformation from the coarse point sets to the fine-grained point sets. In addition, we propose a map based LiDAR localization algorithm that extracts semantic feature points from the LiDAR frames and apply CoFi to estimate the pose on an efficient point cloud map. With the help of the Cylinder3D algorithm for LiDAR scan semantic segmentation, the proposed CoFi localization algorithm demonstrates the state-of-the-art performance on the KITTI odometry benchmark, with significant improvement over the literature. △ Less

Submitted 28 February, 2024; v1 submitted 19 October, 2021; originally announced October 2021.

Comments: Revise to new article

arXiv:2110.09134 [pdf, other]

GAN-based disentanglement learning for chest X-ray rib suppression

Authors: Luyi Han, Yuanyuan Lyu, Cheng Peng, S. Kevin Zhou

Abstract: Clinical evidence has shown that rib-suppressed chest X-rays (CXRs) can improve the reliability of pulmonary disease diagnosis. However, previous approaches on generating rib-suppressed CXR face challenges in preserving details and eliminating rib residues. We hereby propose a GAN-based disentanglement learning framework called Rib Suppression GAN, or RSGAN, to perform rib suppression by utilizing… ▽ More Clinical evidence has shown that rib-suppressed chest X-rays (CXRs) can improve the reliability of pulmonary disease diagnosis. However, previous approaches on generating rib-suppressed CXR face challenges in preserving details and eliminating rib residues. We hereby propose a GAN-based disentanglement learning framework called Rib Suppression GAN, or RSGAN, to perform rib suppression by utilizing the anatomical knowledge embedded in unpaired computed tomography (CT) images. In this approach, we employ a residual map to characterize the intensity difference between CXR and the corresponding rib-suppressed result. To predict the residual map in CXR domain, we disentangle the image into structure- and contrast-specific features and transfer the rib structural priors from digitally reconstructed radiographs (DRRs) computed by CT. Furthermore, we employ additional adaptive loss to suppress rib residue and preserve more details. We conduct extensive experiments based on 1,673 CT volumes, and four benchmarking CXR datasets, totaling over 120K images, to demonstrate that (i) our proposed RSGAN achieves superior image quality compared to the state-of-the-art rib suppression methods; (ii) combining CXR with our rib-suppressed result leads to better performance in lung disease classification and tuberculosis area detection. △ Less

Submitted 18 October, 2021; originally announced October 2021.

arXiv:2109.06120 [pdf, other]

LiDAR Odometry Methodologies for Autonomous Driving: A Survey

Authors: Nikhil Jonnavithula, Yecheng Lyu, Ziming Zhang

Abstract: Vehicle odometry is an essential component of an automated driving system as it computes the vehicle's position and orientation. The odometry module has a higher demand and impact in urban areas where the global navigation satellite system (GNSS) signal is weak and noisy. Traditional visual odometry methods suffer from the diverse illumination status and get disparities during pose estimation, whi… ▽ More Vehicle odometry is an essential component of an automated driving system as it computes the vehicle's position and orientation. The odometry module has a higher demand and impact in urban areas where the global navigation satellite system (GNSS) signal is weak and noisy. Traditional visual odometry methods suffer from the diverse illumination status and get disparities during pose estimation, which results in significant errors as the error accumulates. Odometry using light detection and ranging (LiDAR) devices has attracted increasing research interest as LiDAR devices are robust to illumination variations. In this survey, we examine the existing LiDAR odometry methods and summarize the pipeline and delineate the several intermediate steps. Additionally, the existing LiDAR odometry methods are categorized by their correspondence type, and their advantages, disadvantages, and correlations are analyzed across-category and within-category in each step. Finally, we compare the accuracy and the running speed among these methodologies evaluated over the KITTI odometry dataset and outline promising future research directions. △ Less

Submitted 13 September, 2021; originally announced September 2021.

Comments: International Conference on Robotics and Automation (ICRA) conference and Robotics and Automation Letters (RA-L) journal

arXiv:2106.10623 [pdf, other]

A Scalable 256-Elements E-Band Phased-Array Transceiver for Broadband Communication

Authors: Xu Li, Wenyao Zhai, Morris Repeta, Hua Cai, Tyler Ross, Kimia Ansari, Sam Tiller, Hari Krishna Pothula, Dong Liang, Fan Yang, Yibo Lyu, Songlin Shuai, Guangjian Wang, Wen Tong

Abstract: For E-band wireless communications, a high gain steerable antenna with sub-arrays is desired to reduce the implementation complexity. This paper presents an E-band communication link with 256-elements antennas based on 8-elements sub-arrays and four beam-forming chips in silicon germanium (SiGe) bipolar complementary metal-oxide-semiconductor (BiCMOS), which is packaged on a 19-layer low temperatu… ▽ More For E-band wireless communications, a high gain steerable antenna with sub-arrays is desired to reduce the implementation complexity. This paper presents an E-band communication link with 256-elements antennas based on 8-elements sub-arrays and four beam-forming chips in silicon germanium (SiGe) bipolar complementary metal-oxide-semiconductor (BiCMOS), which is packaged on a 19-layer low temperature co-fired ceramic (LTCC) substrate. After the design and manufacture of the 256-elements antenna, a fast near-field calibration method is proposed for calibration, where a single near-field measurement is required. Then near-field to far-field (NFFF) transform and far-field to near-field (FFNF) transform are used for the bore-sight calibration. The comparison with high frequency structure simulator (HFSS) is utilized for the non-bore-sight calibration. Verified on the 256-elements antenna, the beam-forming performance measured in the chamber is in good agreement with the simulations. The communication in the office environment is also realized using a fifth generation (5G) new radio (NR) system, whose bandwidth is 400 megahertz (MHz) and waveform format is orthogonal frequency division multiplexing (OFDM) with 120 kilohertz (kHz) sub-carrier spacing. △ Less

Submitted 20 June, 2021; originally announced June 2021.

arXiv:2106.02424 [pdf, other]

Contour Moments Based Manipulation of Composite Rigid-Deformable Objects with Finite Time Model Estimation and Shape/Position Control

Authors: Jiaming Qi, Guangfu Ma, Jihong Zhu, Peng Zhou, Yueyong Lyu, Haibo Zhang, David Navarro-Alarcon

Abstract: The robotic manipulation of composite rigid-deformable objects (i.e. those with mixed non-homogeneous stiffness properties) is a challenging problem with clear practical applications that, despite the recent progress in the field, it has not been sufficiently studied in the literature. To deal with this issue, in this paper we propose a new visual servoing method that has the capability to manipul… ▽ More The robotic manipulation of composite rigid-deformable objects (i.e. those with mixed non-homogeneous stiffness properties) is a challenging problem with clear practical applications that, despite the recent progress in the field, it has not been sufficiently studied in the literature. To deal with this issue, in this paper we propose a new visual servoing method that has the capability to manipulate this broad class of objects (which varies from soft to rigid) with the same adaptive strategy. To quantify the object's infinite-dimensional configuration, our new approach computes a compact feedback vector of 2D contour moments features. A sliding mode control scheme is then designed to simultaneously ensure the finite-time convergence of both the feedback shape error and the model estimation error. The stability of the proposed framework (including the boundedness of all the signals) is rigorously proved with Lyapunov theory. Detailed simulations and experiments are presented to validate the effectiveness of the proposed approach. To the best of the author's knowledge, this is the first time that contour moments along with finite-time control have been used to solve this difficult manipulation problem. △ Less

Submitted 4 June, 2021; originally announced June 2021.

arXiv:2104.11888 [pdf, other]

doi 10.1109/LRA.2021.3080633

MILIOM: Tightly Coupled Multi-Input Lidar-Inertia Odometry and Map**

Authors: Thien-Minh Nguyen, Shenghai Yuan, Muqing Cao, Yang Lyu, Thien Hoang Nguyen, Lihua Xie

Abstract: In this letter we investigate a tightly coupled Lidar-Inertia Odometry and Map** (LIOM) scheme, with the capability to incorporate multiple lidars with complementary field of view (FOV). In essence, we devise a time-synchronized scheme to combine extracted features from separate lidars into a single pointcloud, which is then used to construct a local map and compute the feature-map matching (FMM… ▽ More In this letter we investigate a tightly coupled Lidar-Inertia Odometry and Map** (LIOM) scheme, with the capability to incorporate multiple lidars with complementary field of view (FOV). In essence, we devise a time-synchronized scheme to combine extracted features from separate lidars into a single pointcloud, which is then used to construct a local map and compute the feature-map matching (FMM) coefficients. These coefficients, along with the IMU preinteration observations, are then used to construct a factor graph that will be optimized to produce an estimate of the sliding window trajectory. We also propose a key frame-based map management strategy to marginalize certain poses and pointclouds in the sliding window to grow a global map, which is used to assemble the local map in the later stage. The use of multiple lidars with complementary FOV and the global map ensures that our estimate has low drift and can sustain good localization in situations where single lidar use gives poor result, or even fails to work. Multi-thread computation implementations are also adopted to fractionally cut down the computation time and ensure real-time performance. We demonstrate the efficacy of our system via a series of experiments on public datasets collected from an aerial vehicle. △ Less

Submitted 5 July, 2021; v1 submitted 24 April, 2021; originally announced April 2021.

Comments: Accepted for IEEE RAL and IROS 2021

Journal ref: IEEE Robotics and Automation Letters (Volume: 6, Issue: 3, July 2021)

arXiv:2103.05255 [pdf, other]

Improving Generalizability in Limited-Angle CT Reconstruction with Sinogram Extrapolation

Authors: Ce Wang, Haimiao Zhang, Qian Li, Kun Shang, Yuanyuan Lyu, Bin Dong, S. Kevin Zhou

Abstract: Computed tomography (CT) reconstruction from X-ray projections acquired within a limited angle range is challenging, especially when the angle range is extremely small. Both analytical and iterative models need more projections for effective modeling. Deep learning methods have gained prevalence due to their excellent reconstruction performances, but such success is mainly limited within the same… ▽ More Computed tomography (CT) reconstruction from X-ray projections acquired within a limited angle range is challenging, especially when the angle range is extremely small. Both analytical and iterative models need more projections for effective modeling. Deep learning methods have gained prevalence due to their excellent reconstruction performances, but such success is mainly limited within the same dataset and does not generalize across datasets with different distributions. Hereby we propose ExtraPolationNetwork for limited-angle CT reconstruction via the introduction of a sinogram extrapolation module, which is theoretically justified. The module complements extra sinogram information and boots model generalizability. Extensive experimental results show that our reconstruction model achieves state-of-the-art performance on NIH-AAPM dataset, similar to existing approaches. More importantly, we show that using such a sinogram extrapolation module significantly improves the generalization capability of the model on unseen datasets (e.g., COVID-19 and LIDC datasets) when compared to existing approaches. △ Less

Submitted 17 November, 2021; v1 submitted 9 March, 2021; originally announced March 2021.

arXiv:2103.04552 [pdf, other]

U-DuDoNet: Unpaired dual-domain network for CT metal artifact reduction

Authors: Yuanyuan Lyu, Jiajun Fu, Cheng Peng, S. Kevin Zhou

Abstract: Recently, both supervised and unsupervised deep learning methods have been widely applied on the CT metal artifact reduction (MAR) task. Supervised methods such as Dual Domain Network (Du-DoNet) work well on simulation data; however, their performance on clinical data is limited due to domain gap. Unsupervised methods are more generalized, but do not eliminate artifacts completely through the sole… ▽ More Recently, both supervised and unsupervised deep learning methods have been widely applied on the CT metal artifact reduction (MAR) task. Supervised methods such as Dual Domain Network (Du-DoNet) work well on simulation data; however, their performance on clinical data is limited due to domain gap. Unsupervised methods are more generalized, but do not eliminate artifacts completely through the sole processing on the image domain. To combine the advantages of both MAR methods, we propose an unpaired dual-domain network (U-DuDoNet) trained using unpaired data. Unlike the artifact disentanglement network (ADN) that utilizes multiple encoders and decoders for disentangling content from artifact, our U-DuDoNet directly models the artifact generation process through additions in both sinogram and image domains, which is theoretically justified by an additive property associated with metal artifact. Our design includes a self-learned sinogram prior net, which provides guidance for restoring the information in the sinogram domain, and cyclic constraints for artifact reduction and addition on unpaired data. Extensive experiments on simulation data and clinical images demonstrate that our novel framework outperforms the state-of-the-art unpaired approaches. △ Less

Submitted 8 March, 2021; originally announced March 2021.

Comments: 11 pages, 4 figures

arXiv:2102.13304 [pdf]

Feasibility Enhancement of Constrained Receding Horizon Control Using Generalized Control Barrier Function

Authors: Haitong Ma, Xiangteng Zhang, Shengbo Eben Li, Ziyu Lin, Yao Lyu, Sifa Zheng

Abstract: Receding horizon control (RHC) is a popular procedure to deal with optimal control problems. Due to the existence of state constraints, optimization-based RHC often suffers the notorious issue of infeasibility, which strongly shrinks the region of controllable state. This paper proposes a generalized control barrier function (CBF) to enlarge the feasible region of constrained RHC with only a one-s… ▽ More Receding horizon control (RHC) is a popular procedure to deal with optimal control problems. Due to the existence of state constraints, optimization-based RHC often suffers the notorious issue of infeasibility, which strongly shrinks the region of controllable state. This paper proposes a generalized control barrier function (CBF) to enlarge the feasible region of constrained RHC with only a one-step constraint on the prediction horizon. This design can reduce the constrained steps by penalizing the tendency to move towards the constraint boundary. Additionally, generalized CBF is able to handle high-order equality or inequality constraints through extending the constrained step to nonadjacent nodes. We apply this technique on an automated vehicle control task. The results show that compared to multi-step pointwise constraints, generalized CBF can effectively avoid the infeasibility issue in a larger partition of the state space, and the computing efficiency is also improved by 14%-23%. △ Less

Submitted 26 February, 2021; originally announced February 2021.

arXiv:2102.01423 [pdf, other]

Vision Based Autonomous UAV Plane Estimation And Following for Building Inspection

Authors: Yang Lyu, Muqing Cao, Shenghai Yuan, Lihua Xie

Abstract: Unmanned Aerial Vehicle (UAV) has already demonstrated its potential in many civilian applications, and the façade inspection is among the most promising ones. In this paper, we focus on enabling the autonomous perception and control of a small UAV for a façade inspection task. Specifically, we consider the perception as a planar object pose estimation problem by simplifying the building structure… ▽ More Unmanned Aerial Vehicle (UAV) has already demonstrated its potential in many civilian applications, and the façade inspection is among the most promising ones. In this paper, we focus on enabling the autonomous perception and control of a small UAV for a façade inspection task. Specifically, we consider the perception as a planar object pose estimation problem by simplifying the building structure as concatenation of planes, and the control as an optimal reference tracking control problem. First, a vision based adaptive observer is proposed which can realize stable plane pose estimation under very mild observation conditions. Second, a model predictive controller is designed to achieve stable tracking and smooth transition in a multi-plane scenario, while the persistent excitation (PE) condition of the observer and the maneuver constraints of the UAV are satisfied. The proposed autonomous plane pose estimation and plane tracking methods are tested in both simulation and practical building fasçade inspection scenarios, which demonstrate their effectiveness and practicability. △ Less

Submitted 2 February, 2021; originally announced February 2021.

Comments: 12 pages, 12 figures, submitted to IEEE Transactions. for possible publication

arXiv:2011.08973 [pdf, ps, other]

Two Types of Mixed Orthogonal Frequency Division Multiplexing (X-OFDM) Waveform for Optical Wireless Communication

Authors: Xu Li, **g**g Huang, Yibo Lyu, Rui Ni, Jia** Luo, Jun** Zhang

Abstract: The optical wireless communication (OWC) with the intensity modulation (IM), requires the modulated radio frequency (RF) signal to be real and non-negative. To satisfy the requirements, this paper proposes two types of mixed orthogonal frequency division multiplexing (X-OFDM) waveform. The Hermitian symmetry (HS) character of the sub-carriers in the frequency domain, guarantees the signal in the t… ▽ More The optical wireless communication (OWC) with the intensity modulation (IM), requires the modulated radio frequency (RF) signal to be real and non-negative. To satisfy the requirements, this paper proposes two types of mixed orthogonal frequency division multiplexing (X-OFDM) waveform. The Hermitian symmetry (HS) character of the sub-carriers in the frequency domain, guarantees the signal in the time domain to be real, which reduces the spectral efficiency to $1/2$. For the odd sub-carriers in the frequency domain, the signal in the time domain after the inverse fast fourier transform (IFFT) is antisymmetric. For the even sub-carriers in the frequency domain, the signal in the time domain after the IFFT is symmetric. Based on the antisymmetric and symmetric characters, the two types of X-OFDM waveform are designed to guarantee the signal in the time domain to be non-negative, where the direct current (DC) bias is not needed. With $N$ sub-carriers in the frequency domain, the generated signal in the time domain has $3N/2$ points, which further reduces the spectral efficiency to $1/3 = 1/2 \times 2/3$. The numerical simulations show that, the two types of X-OFDM waveform greatly enhance the power efficiency considering the OWC channel with the signal-dependent noise and/or the signal-independent noise. △ Less

Submitted 7 November, 2020; originally announced November 2020.

Comments: X. Li, J. Huang, Y. Lyu, J. Luo, J. Zhang, "A Mixed Orthogonal Frequency Division Multiplexing (X-OFDM) Waveform for Optical Wireless Communication," \emph{Accepted by 2020 IEEE GLOBECOM Workshop on Optical Wireless Communications (OWC)}, 2020

arXiv:2010.13072 [pdf, other]

LIRO: Tightly Coupled Lidar-Inertia-Ranging Odometry

Authors: Thien-Minh Nguyen, Muqing Cao, Shenghai Yuan, Yang Lyu, Thien Hoang Nguyen, Lihua Xie

Abstract: In recent years, thanks to the continuously reduced cost and weight of 3D Lidar, the applications of this type of sensor in robotics community have become increasingly popular. Despite many progresses, estimation drift and tracking loss are still prevalent concerns associated with these systems. However, in theory these issues can be resolved with the use of some observations to fixed landmarks in… ▽ More In recent years, thanks to the continuously reduced cost and weight of 3D Lidar, the applications of this type of sensor in robotics community have become increasingly popular. Despite many progresses, estimation drift and tracking loss are still prevalent concerns associated with these systems. However, in theory these issues can be resolved with the use of some observations to fixed landmarks in the environments. This motivates us to investigate a tightly coupled sensor fusion scheme of Ultra-Wideband (UWB) range measurements with Lidar and inertia measurements. First, data from IMU, Lidar and UWB are associated with the robot's states on a sliding windows based on their timestamps. Then, we construct a cost function comprising of factors from UWB, Lidar and IMU preintegration measurements. Finally an optimization process is carried out to estimate the robot's position and orientation. Via some real world experiments, we show that the method can effectively resolve the drift issue, while only requiring two or three anchors deployed in the environment. △ Less

Submitted 25 October, 2020; originally announced October 2020.

arXiv:2010.12274 [pdf, other]

VIRAL-Fusion: A Visual-Inertial-Ranging-Lidar Sensor Fusion Approach

Authors: Thien-Minh Nguyen, Shenghai Yuan, Muqing Cao, Yang Lyu, Thien Hoang Nguyen, Lihua Xie

Abstract: In recent years, Onboard Self Localization (OSL) methods based on cameras or Lidar have achieved many significant progresses. However, some issues such as estimation drift and feature-dependence still remain inherent limitations. On the other hand, infrastructure-based methods can generally overcome these issues, but at the expense of some installation cost. This poses an interesting problem of ho… ▽ More In recent years, Onboard Self Localization (OSL) methods based on cameras or Lidar have achieved many significant progresses. However, some issues such as estimation drift and feature-dependence still remain inherent limitations. On the other hand, infrastructure-based methods can generally overcome these issues, but at the expense of some installation cost. This poses an interesting problem of how to effectively combine these methods, so as to achieve localization with long-term consistency as well as flexibility compared to any single method. To this end, we propose a comprehensive optimization-based estimator for 15-dimensional state of an Unmanned Aerial Vehicle (UAV), fusing data from an extensive set of sensors: inertial measurement units (IMUs), Ultra-Wideband (UWB) ranging sensors, and multiple onboard Visual-Inertial and Lidar odometry subsystems. In essence, a sliding window is used to formulate a sequence of robot poses, where relative rotational and translational constraints between these poses are observed in the IMU preintegration and OSL observations, while orientation and position are coupled in body-offset UWB range observations. An optimization-based approach is developed to estimate the trajectory of the robot in this sliding window. We evaluate the performance of the proposed scheme in multiple scenarios, including experiments on public datasets, high-fidelity graphical-physical simulator, and field-collected data from UAV flight tests. The result demonstrates that our integrated localization method can effectively resolve the drift issue, while incurring minimal installation requirements. △ Less

Submitted 23 October, 2020; originally announced October 2020.

arXiv:2007.06373 [pdf, other]

Symmetric Dilated Convolution for Surgical Gesture Recognition

Authors: **glu Zhang, Yinyu Nie, Yao Lyu, Hailin Li, Jian Chang, Xiaosong Yang, Jian Jun Zhang

Abstract: Automatic surgical gesture recognition is a prerequisite of intra-operative computer assistance and objective surgical skill assessment. Prior works either require additional sensors to collect kinematics data or have limitations on capturing temporal information from long and untrimmed surgical videos. To tackle these challenges, we propose a novel temporal convolutional architecture to automatic… ▽ More Automatic surgical gesture recognition is a prerequisite of intra-operative computer assistance and objective surgical skill assessment. Prior works either require additional sensors to collect kinematics data or have limitations on capturing temporal information from long and untrimmed surgical videos. To tackle these challenges, we propose a novel temporal convolutional architecture to automatically detect and segment surgical gestures with corresponding boundaries only using RGB videos. We devise our method with a symmetric dilation structure bridged by a self-attention module to encode and decode the long-term temporal patterns and establish the frame-to-frame relationship accordingly. We validate the effectiveness of our approach on a fundamental robotic suturing task from the JIGSAWS dataset. The experiment results demonstrate the ability of our method on capturing long-term frame dependencies, which largely outperform the state-of-the-art methods on the frame-wise accuracy up to ~6 points and the F1@50 score ~6 points. △ Less

Submitted 14 July, 2020; v1 submitted 13 July, 2020; originally announced July 2020.

Comments: Accepted to MICCAI 2020

arXiv:2006.11825 [pdf, other]

TreeRNN: Topology-Preserving Deep GraphEmbedding and Learning

Authors: Yecheng Lyu, Ming Li, Xinming Huang, Ulkuhan Guler, Patrick Schaumont, Ziming Zhang

Abstract: General graphs are difficult for learning due to their irregular structures. Existing works employ message passing along graph edges to extract local patterns using customized graph kernels, but few of them are effective for the integration of such local patterns into global features. In contrast, in this paper we study the methods to transfer the graphs into trees so that explicit orders are lear… ▽ More General graphs are difficult for learning due to their irregular structures. Existing works employ message passing along graph edges to extract local patterns using customized graph kernels, but few of them are effective for the integration of such local patterns into global features. In contrast, in this paper we study the methods to transfer the graphs into trees so that explicit orders are learned to direct the feature integration from local to global. To this end, we apply the breadth first search (BFS) to construct trees from the graphs, which adds direction to the graph edges from the center node to the peripheral nodes. In addition, we proposed a novel projection scheme that transfer the trees to image representations, which is suitable for conventional convolution neural networks (CNNs) and recurrent neural networks (RNNs). To best learn the patterns from the graph-tree-images, we propose TreeRNN, a 2D RNN architecture that recurrently integrates the image pixels by rows and columns to help classify the graph categories. We evaluate the proposed method on several graph classification datasets, and manage to demonstrate comparable accuracy with the state-of-the-art on MUTAG, PTC-MR and NCI1 datasets. △ Less

Submitted 22 July, 2020; v1 submitted 21 June, 2020; originally announced June 2020.

Comments: 7 pages

arXiv:2006.07644 [pdf, other]

RoadNet-RT: High Throughput CNN Architecture and SoC Design for Real-Time Road Segmentation

Authors: Lin Bai, Yecheng Lyu, Xinming Huang

Abstract: In recent years, convolutional neural network has gained popularity in many engineering applications especially for computer vision. In order to achieve better performance, often more complex structures and advanced operations are incorporated into the neural networks, which results very long inference time. For time-critical tasks such as autonomous driving and virtual reality, real-time processi… ▽ More In recent years, convolutional neural network has gained popularity in many engineering applications especially for computer vision. In order to achieve better performance, often more complex structures and advanced operations are incorporated into the neural networks, which results very long inference time. For time-critical tasks such as autonomous driving and virtual reality, real-time processing is fundamental. In order to reach real-time process speed, a light-weight, high-throughput CNN architecture namely RoadNet-RT is proposed for road segmentation in this paper. It achieves 90.33% MaxF score on test set of KITTI road segmentation task and 8 ms per frame when running on GTX 1080 GPU. Comparing to the state-of-the-art network, RoadNet-RT speeds up the inference time by a factor of 20 at the cost of only 6.2% accuracy loss. For hardware design optimization, several techniques such as depthwise separable convolution and non-uniformed kernel size convolution are customized designed to further reduce the processing time. The proposed CNN architecture has been successfully implemented on an FPGA ZCU102 MPSoC platform that achieves the computation capability of 83.05 GOPS. The system throughput reaches 327.9 frames per second with image size 1216x176. △ Less

Submitted 17 May, 2021; v1 submitted 13 June, 2020; originally announced June 2020.

Journal ref: in IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 68, no. 2, pp. 704-714, Feb. 2021

arXiv:2006.00053 [pdf, other]

A Unified Hardware Architecture for Convolutions and Deconvolutions in CNN

Authors: Lin Bai, Yecheng Lyu, Xinming Huang

Abstract: In this paper, a scalable neural network hardware architecture for image segmentation is proposed. By sharing the same computing resources, both convolution and deconvolution operations are handled by the same process element array. In addition, access to on-chip and off-chip memories is optimized to alleviate the burden introduced by partial sum. As an example, SegNet-Basic has been implemented u… ▽ More In this paper, a scalable neural network hardware architecture for image segmentation is proposed. By sharing the same computing resources, both convolution and deconvolution operations are handled by the same process element array. In addition, access to on-chip and off-chip memories is optimized to alleviate the burden introduced by partial sum. As an example, SegNet-Basic has been implemented using the proposed unified architecture by targeting on Xilinx ZC706 FPGA, which achieves the performance of 151.5 GOPS and 94.3 GOPS for convolution and deconvolution respectively. This unified convolution/deconvolution design is applicable to other CNNs with deconvolution. △ Less

Submitted 29 May, 2020; originally announced June 2020.

Comments: This paper has been accepted by ISCAS 2020

arXiv:2006.00049 [pdf, other]

PointNet on FPGA for Real-Time LiDAR Point Cloud Processing

Authors: Lin Bai, Yecheng Lyu, Xin Xu, Xinming Huang

Abstract: LiDAR sensors have been widely used in many autonomous vehicle modalities, such as perception, map**, and localization. This paper presents an FPGA-based deep learning platform for real-time point cloud processing targeted on autonomous vehicles. The software driver for the Velodyne LiDAR sensor is modified and moved into the on-chip processor system, while the programmable logic is designed as… ▽ More LiDAR sensors have been widely used in many autonomous vehicle modalities, such as perception, map**, and localization. This paper presents an FPGA-based deep learning platform for real-time point cloud processing targeted on autonomous vehicles. The software driver for the Velodyne LiDAR sensor is modified and moved into the on-chip processor system, while the programmable logic is designed as a customized hardware accelerator. As the state-of-art deep learning algorithm for point cloud processing, PointNet is successfully implemented on the proposed FPGA platform. Targeted on a Xilinx Zynq UltraScale+ MPSoC ZCU104 development board, the FPGA implementations of PointNet achieve the computing performance of 182.1 GOPS and 280.0 GOPS for classification and segmentation respectively. The proposed design can support an input up to 4096 points per frame. The processing time is 19.8 ms for classification and 34.6 ms for segmentation, which meets the real-time requirement for most of the existing LiDAR sensors. △ Less

Submitted 29 May, 2020; originally announced June 2020.

Comments: This paper has been accepted by ISCAS 2020

arXiv:2001.06719 [pdf, other]

System-on-Chip Security Assertions

Authors: Yangdi Lyu, Prabhat Mishra

Abstract: Assertions are widely used for functional validation as well as coverage analysis for both software and hardware designs. Assertions enable runtime error detection as well as faster localization of errors. While there is a vast literature on both software and hardware assertions for monitoring functional scenarios, there is limited effort in utilizing assertions to monitor System-on-Chip (SoC) sec… ▽ More Assertions are widely used for functional validation as well as coverage analysis for both software and hardware designs. Assertions enable runtime error detection as well as faster localization of errors. While there is a vast literature on both software and hardware assertions for monitoring functional scenarios, there is limited effort in utilizing assertions to monitor System-on-Chip (SoC) security vulnerabilities. In this paper, we identify common SoC security vulnerabilities by analyzing the design. To monitor these vulnerabilities, we define several classes of assertions to enable runtime checking of security vulnerabilities. Our experimental results demonstrate that the security assertions generated by our proposed approach can detect all the inserted vulnerabilities while the functional assertions generated by state-of-the-art assertion generation techniques fail to detect most of them. △ Less

Submitted 18 January, 2020; originally announced January 2020.

arXiv:2001.00340 [pdf, other]

doi 10.1007/978-3-030-59713-9_15

Encoding Metal Mask Projection for Metal Artifact Reduction in Computed Tomography

Authors: Yuanyuan Lyu, Wei-An Lin, Haofu Liao, **g**g Lu, S. Kevin Zhou

Abstract: Metal artifact reduction (MAR) in computed tomography (CT) is a notoriously challenging task because the artifacts are structured and non-local in the image domain. However, they are inherently local in the sinogram domain. Thus, one possible approach to MAR is to exploit the latter characteristic by learning to reduce artifacts in the sinogram. However, if we directly treat the metal-affected reg… ▽ More Metal artifact reduction (MAR) in computed tomography (CT) is a notoriously challenging task because the artifacts are structured and non-local in the image domain. However, they are inherently local in the sinogram domain. Thus, one possible approach to MAR is to exploit the latter characteristic by learning to reduce artifacts in the sinogram. However, if we directly treat the metal-affected regions in sinogram as missing and replace them with the surrogate data generated by a neural network, the artifact-reduced CT images tend to be over-smoothed and distorted since fine-grained details within the metal-affected regions are completely ignored. In this work, we provide analytical investigation to the issue and propose to address the problem by (1) retaining the metal-affected regions in sinogram and (2) replacing the binarized metal trace with the metal mask projection such that the geometry information of metal implants is encoded. Extensive experiments on simulated datasets and expert evaluations on clinical images demonstrate that our novel network yields anatomically more precise artifact-reduced images than the state-of-the-art approaches, especially when metallic objects are large. △ Less

Submitted 19 July, 2020; v1 submitted 2 January, 2020; originally announced January 2020.

Comments: accepted by MICCAI 2020

arXiv:2001.00339 [pdf, other]

A$^3$DSegNet: Anatomy-aware artifact disentanglement and segmentation network for unpaired segmentation, artifact reduction, and modality translation

Authors: Yuanyuan Lyu, Haofu Liao, Heqin Zhu, S. Kevin Zhou

Abstract: Spinal surgery planning necessitates automatic segmentation of vertebrae in cone-beam computed tomography (CBCT), an intraoperative imaging modality that is widely used in intervention. However, CBCT images are of low-quality and artifact-laden due to noise, poor tissue contrast, and the presence of metallic objects, causing vertebra segmentation, even manually, a demanding task. In contrast, ther… ▽ More Spinal surgery planning necessitates automatic segmentation of vertebrae in cone-beam computed tomography (CBCT), an intraoperative imaging modality that is widely used in intervention. However, CBCT images are of low-quality and artifact-laden due to noise, poor tissue contrast, and the presence of metallic objects, causing vertebra segmentation, even manually, a demanding task. In contrast, there exists a wealth of artifact-free, high quality CT images with vertebra annotations. This motivates us to build a CBCT vertebra segmentation model using unpaired CT images with annotations. To overcome the domain and artifact gaps between CBCT and CT, it is a must to address the three heterogeneous tasks of vertebra segmentation, artifact reduction and modality translation all together. To this, we propose a novel anatomy-aware artifact disentanglement and segmentation network (A$^3$DSegNet) that intensively leverages knowledge sharing of these three tasks to promote learning. Specifically, it takes a random pair of CBCT and CT images as the input and manipulates the synthesis and segmentation via different decoding combinations from the disentangled latent layers. Then, by proposing various forms of consistency among the synthesized images and among segmented vertebrae, the learning is achieved without paired (i.e., anatomically identical) data. Finally, we stack 2D slices together and build 3D networks on top to obtain final 3D segmentation result. Extensive experiments on a large number of clinical CBCT (21,364) and CT (17,089) images show that the proposed A$^3$DSegNet performs significantly better than state-of-the-art competing methods trained independently for each task and, remarkably, it achieves an average Dice coefficient of 0.926 for unpaired 3D CBCT vertebra segmentation. △ Less

Submitted 9 March, 2021; v1 submitted 2 January, 2020; originally announced January 2020.

Comments: Accepted by IPMI 2021

arXiv:1903.02122 [pdf, other]

doi 10.1109/HPEC.2019.8916441

An Interactive LiDAR to Camera Calibration

Authors: Yecheng Lyu, Lin Bai, Mahdi Elhousni, Xinming Huang

Abstract: Recent progress in the automated driving system (ADS) and advanced driver assistant system (ADAS) has shown that the combined use of 3D light detection and ranging (LiDAR) and the camera is essential for an intelligent vehicle to perceive and understand its surroundings. LiDAR-camera fusion requires precise intrinsic and extrinsic calibrations between the sensors. However, due to the limitation of… ▽ More Recent progress in the automated driving system (ADS) and advanced driver assistant system (ADAS) has shown that the combined use of 3D light detection and ranging (LiDAR) and the camera is essential for an intelligent vehicle to perceive and understand its surroundings. LiDAR-camera fusion requires precise intrinsic and extrinsic calibrations between the sensors. However, due to the limitation of the calibration equipment and susceptibility to noise, algorithms in existing methods tend to fail in finding LiDAR-camera correspondences in long-range. In this paper, we introduced an interactive LiDAR to camera calibration toolbox to estimate the intrinsic and extrinsic transforms. This toolbox automatically detects the corner of a planer board from a sequence of LiDAR frames and provides a convenient user interface for annotating the corresponding pixels on camera frames. Since the toolbox only detects the top corner of the board, there is no need to prepare a precise polygon planar board or a checkerboard with different reflectivity areas as in the existing methods. Furthermore, the toolbox uses genetic algorithms to estimate the transforms and supports multiple camera models such as the pinhole camera model and the fisheye camera model. Experiments using Velodyne VLP-16 LiDAR and Point Grey Chameleon 3 camera show robust results. △ Less

Submitted 5 March, 2019; originally announced March 2019.

Comments: Submitted to IV 2019

arXiv:1808.04450 [pdf, other]

Road Segmentation Using CNN and Distributed LSTM

Authors: Yecheng Lyu, Lin Bai, Xinming Huang

Abstract: In automated driving systems (ADS) and advanced driver-assistance systems (ADAS), an efficient road segmentation is necessary to perceive the drivable region and build an occupancy map for path planning. The existing algorithms implement gigantic convolutional neural networks (CNNs) that are computationally expensive and time consuming. In this paper, we introduced distributed LSTM, a neural netwo… ▽ More In automated driving systems (ADS) and advanced driver-assistance systems (ADAS), an efficient road segmentation is necessary to perceive the drivable region and build an occupancy map for path planning. The existing algorithms implement gigantic convolutional neural networks (CNNs) that are computationally expensive and time consuming. In this paper, we introduced distributed LSTM, a neural network widely used in audio and video processing, to process rows and columns in images and feature maps. We then propose a new network combining the convolutional and distributed LSTM layers to solve the road segmentation problem. In the end, the network is trained and tested in KITTI road benchmark. The result shows that the combined structure enhances the feature extraction and processing but takes less processing time than pure CNN structure. △ Less

Submitted 5 March, 2019; v1 submitted 10 August, 2018; originally announced August 2018.

Comments: 6 pages. arXiv admin note: text overlap with arXiv:1804.05164

arXiv:1808.03506 [pdf, other]

doi 10.1109/TCSI.2018.2881162

ChipNet: Real-Time LiDAR Processing for Drivable Region Segmentation on an FPGA

Authors: Yecheng Lyu, Lin Bai, Xinming Huang

Abstract: This paper presents a field-programmable gate array (FPGA) design of a segmentation algorithm based on convolutional neural network (CNN) that can process light detection and ranging (LiDAR) data in real-time. For autonomous vehicles, drivable region segmentation is an essential step that sets up the static constraints for planning tasks. Traditional drivable region segmentation algorithms are mos… ▽ More This paper presents a field-programmable gate array (FPGA) design of a segmentation algorithm based on convolutional neural network (CNN) that can process light detection and ranging (LiDAR) data in real-time. For autonomous vehicles, drivable region segmentation is an essential step that sets up the static constraints for planning tasks. Traditional drivable region segmentation algorithms are mostly developed on camera data, so their performance is susceptible to the light conditions and the qualities of road markings. LiDAR sensors can obtain the 3D geometry information of the vehicle surroundings with high precision. However, it is a computational challenge to process a large amount of LiDAR data in real-time. In this paper, a convolutional neural network model is proposed and trained to perform semantic segmentation using data from the LiDAR sensor. An efficient hardware architecture is proposed and implemented on an FPGA that can process each LiDAR scan in 17.59 ms, which is much faster than the previous works. Evaluated using Ford and KITTI road detection benchmarks, the proposed solution achieves both high accuracy in performance and real-time processing in speed. △ Less

Submitted 5 March, 2019; v1 submitted 10 August, 2018; originally announced August 2018.

Comments: 11 pages. IEEE Transactions on Circuits and Systems I: Regular Papers (2018)

arXiv:1804.05164 [pdf, other]

Road Segmentation Using CNN with GRU

Authors: Yecheng Lyu, Xinming Huang

Abstract: This paper presents an accurate and fast algorithm for road segmentation using convolutional neural network (CNN) and gated recurrent units (GRU). For autonomous vehicles, road segmentation is a fundamental task that can provide the drivable area for path planning. The existing deep neural network based segmentation algorithms usually take a very deep encoder-decoder structure to fuse pixels, whic… ▽ More This paper presents an accurate and fast algorithm for road segmentation using convolutional neural network (CNN) and gated recurrent units (GRU). For autonomous vehicles, road segmentation is a fundamental task that can provide the drivable area for path planning. The existing deep neural network based segmentation algorithms usually take a very deep encoder-decoder structure to fuse pixels, which requires heavy computations, large memory and long processing time. Hereby, a CNN-GRU network model is proposed and trained to perform road segmentation using data captured by the front camera of a vehicle. GRU network obtains a long spatial sequence with lower computational complexity, comparing to traditional encoder-decoder architecture. The proposed road detector is evaluated on the KITTI road benchmark and achieves high accuracy for road segmentation at real-time processing speed. △ Less

Submitted 14 April, 2018; originally announced April 2018.

Comments: submitted to IV2018

Showing 1–36 of 36 results for author: Lyu, Y