Search | arXiv e-print repository

arXiv:2401.16325

Making the unmodulated Pyramid wavefront sensor smart. Closed-loop demonstration of neural network wavefront reconstruction with MagAO-X

Authors: Rico Landman, Sebastiaan Haffert, Jared Males, Laird Close, Warren Foster, Kyle Van Gorkom, Olivier Guyon, Alex Hedglen, Maggie Kautz, Jay Kueny, Joseph Long, Jennifer Lumbres, Eden McEwen, Avalon McLeod, Lauren Schatz

Abstract: Almost all current and future high-contrast imaging instruments will use a Pyramid wavefront sensor (PWFS) as a primary or secondary wavefront sensor. The main issue with the PWFS is its nonlinear response to large phase aberrations, especially under strong atmospheric turbulence. Most instruments try to increase its linearity range by using dynamic modulation, but this leads to decreased sensitivity, most prominently for low-order modes, and makes it blind to petal-piston modes. In the push toward high-contrast imaging of fainter stars and deeper contrasts, there is a strong interest in using the PWFS in its unmodulated form. Here, we present closed-loop lab results of a nonlinear reconstructor for the unmodulated PWFS of the Magellan Adaptive Optics eXtreme (MagAO-X) system based on convolutional neural networks (CNNs). We show that our nonlinear reconstructor has a dynamic range of >600 nm root-mean-square (RMS), significantly outperforming the linear reconstructor that only has a 50 nm RMS dynamic range. The reconstructor behaves well in closed loop and can obtain >80% Strehl at 875 nm under a large variety of conditions and reaches higher Strehl ratios than the linear reconstructor under all simulated conditions. The CNN reconstructor also achieves the theoretical sensitivity limit of a PWFS, showing that it does not lose its sensitivity in exchange for dynamic range. The current CNN's computational time is 690 microseconds, which enables loop speeds of >1 kHz. On-sky tests are foreseen soon and will be important for pushing future high-contrast imaging instruments toward their limits.

Submitted 29 January, 2024; originally announced January 2024.

Comments: Accepted for publication in A&A

arXiv:2312.11460

Hybrid Internal Model: Learning Agile Legged Locomotion with Simulated Robot Response

Authors: Junfeng Long, Zirui Wang, Quanyi Li, Jiawei Gao, Liu Cao, Jiangmiao Pang

Abstract: Robust locomotion control depends on accurate state estimations. However, the sensors of most legged robots can only provide partial and noisy observations, making the estimation particularly challenging, especially for external states like terrain frictions and elevation maps. Inspired by the classical Internal Model Control principle, we consider these external states as disturbances and introduce Hybrid Internal Model (HIM) to estimate them according to the response of the robot. The response, which we refer to as the hybrid internal embedding, contains the robot's explicit velocity and implicit stability representation, corresponding to two primary goals for locomotion tasks: explicitly tracking velocity and implicitly maintaining stability. We use contrastive learning to optimize the embedding to be close to the robot's successor state, in which the response is naturally embedded. HIM has several appealing benefits: It only needs the robot's proprioceptions, i.e., those from joint encoders and IMU as observations. It innovatively maintains consistent observations between simulation reference and reality that avoids information loss in mimicking learning. It exploits batch-level information that is more robust to noises and keeps better sample efficiency. It only requires 1 hour of training on an RTX 4090 to enable a quadruped robot to traverse any terrain under any disturbances. A wealth of real-world experiments demonstrates its agility, even in high-difficulty tasks and cases that never occurred during the training process, revealing remarkable open-world generalizability.

Submitted 1 January, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

Comments: Use 1 hour to train a quadruped robot capable of traversing any terrain under any disturbances in the open world, Project Page: https://github.com/OpenRobotLab/HIMLoco

arXiv:2309.13585

Identification of Ghost Targets for Automotive Radar in the Presence of Multipath

Authors: Le Zheng, Jiamin Long, Marco Lops, Fan Liu, Xueyao Hu

Abstract: Colocated multiple-input multiple-output (MIMO) technology has been widely used in automotive radars as it provides accurate angular estimation of the objects with a relatively small number of transmitting and receiving antennas. Since the Direction Of Departure (DOD) and the Direction Of Arrival (DOA) of line-of-sight targets coincide, MIMO signal processing allows forming a larger virtual array for angle finding. However, multiple paths impinging the receiver is a major limiting factor, in that radar signals may bounce off obstacles, creating echoes for which the DOD does not equal the DOA. Thus, in complex scenarios with multiple scatterers, the direct paths of the intended targets may be corrupted by indirect paths from other objects, which leads to inaccurate angle estimation or ghost targets. In this paper, we focus on detecting the presence of ghosts due to multipath by regarding it as the problem of deciding between a composite hypothesis, ${\cal H}_0$ say, that the observations only contain an unknown number of direct paths sharing the same (unknown) DOD's and DOA's, and a composite alternative, ${\cal H}_1$ say, that the observations also contain an unknown number of indirect paths, for which DOD's and DOA's do not coincide. We exploit the Generalized Likelihood Ratio Test (GLRT) philosophy to determine the detector structure, wherein the unknown parameters are replaced by carefully designed estimators. The angles of both the active direct paths and of the multi-paths are indeed estimated through a sparsity-enforced Compressed Sensing (CS) approach with Levenberg-Marquardt (LM) optimization to estimate the angular parameters in the continuous domain. An extensive performance analysis is finally offered in order to validate the proposed solution.

Submitted 26 September, 2023; v1 submitted 24 September, 2023; originally announced September 2023.

Comments: 13 pages, 10 figures

arXiv:2308.02263

Efficient Monaural Speech Enhancement using Spectrum Attention Fusion

Authors: **yu Long, Jetic Gū, Binhao Bai, Zhibo Yang, ** Wei, Junli Li

Abstract: Speech enhancement is a demanding task in automated speech processing pipelines, focusing on separating clean speech from noisy channels. Transformer-based models have recently bested RNN and CNN models in speech enhancement; however, they are much more computationally expensive and require much more high-quality training data, which is always hard to come by. In this paper, we present an improvement for speech enhancement models that maintains the expressiveness of self-attention while significantly reducing model complexity, which we have termed Spectrum Attention Fusion. We carefully construct a convolutional module to replace several self-attention layers in a speech Transformer, allowing the model to more efficiently fuse spectral features. Our proposed model is able to achieve comparable or better results against SOTA models but with significantly fewer parameters (0.58M) on the Voice Bank + DEMAND dataset.

Submitted 4 August, 2023; originally announced August 2023.

arXiv:2212.07867

Localizing Scan Targets from Human Pose for Autonomous Lung Ultrasound Imaging

Authors: Jianzhi Long, Jicang Cai, Abdullah Al-Battal, Shiwei **, **g Zhang, Dacheng Tao, Truong Nguyen

Abstract: Ultrasound is progressing toward becoming an affordable and versatile solution to medical imaging. With the advent of the COVID-19 global pandemic, there is a need to fully automate ultrasound imaging as it requires trained operators in close proximity to patients for a long period of time, therefore increasing risk of infection. In this work, we investigate the important yet seldom-studied problem of scan target localization, under the setting of lung ultrasound imaging. We propose a purely vision-based, data-driven method that incorporates learning-based computer vision techniques. We combine a human pose estimation model with a specially designed regression model to predict the lung ultrasound scan targets, and deploy multiview stereo vision to enhance the consistency of 3D target localization. While related works mostly focus on phantom experiments, we collect data from 30 human subjects for testing. Our method attains an accuracy level of 16.00(9.79) mm for probe positioning and 4.44(3.75) degree for probe orientation, with a success rate above 80% under an error threshold of 25 mm for all scan targets. Moreover, our approach can serve as a general solution to other types of ultrasound modalities. The code for implementation has been released.

Submitted 25 February, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

Comments: v2 2023/02/25

ACM Class: I.4.9

arXiv:2209.10218

HiFuse: Hierarchical Multi-Scale Feature Fusion Network for Medical Image Classification

Authors: Xiangzuo Huo, Gang Sun, Shengwei Tian, Yan Wang, Long Yu, Jun Long, Wendong Zhang, Aolun Li

Abstract: Medical image classification has developed rapidly under the impetus of the convolutional neural network (CNN). Due to the fixed size of the receptive field of the convolution kernel, it is difficult to capture the global features of medical images. Although the self-attention-based Transformer can model long-range dependencies, it has high computational complexity and lacks local inductive bias. Much research has demonstrated that global and local features are crucial for image classification. However, medical images have a lot of noisy, scattered features, intra-class variation, and inter-class similarities. This paper proposes a three-branch hierarchical multi-scale feature fusion network structure termed as HiFuse for medical image classification as a new method. It can fuse the advantages of Transformer and CNN from multi-scale hierarchies without destroying the respective modeling so as to improve the classification accuracy of various medical images. A parallel hierarchy of local and global feature blocks is designed to efficiently extract local features and global representations at various semantic scales, with the flexibility to model at different scales and linear computational complexity relevant to image size. Moreover, an adaptive hierarchical feature fusion block (HFF block) is designed to utilize the features obtained at different hierarchical levels comprehensively. The HFF block contains spatial attention, channel attention, residual inverted MLP, and shortcut to adaptively fuse semantic information between various scale features of each branch. The accuracy of our proposed model on the ISIC2018 dataset is 7.6% higher than baseline, 21.5% on the Covid-19 dataset, and 10.4% on the Kvasir dataset. Compared with other advanced models, the HiFuse model performs the best. Our code is open-source and available from https://github.com/huoxiangzuo/HiFuse.

Submitted 21 September, 2022; originally announced September 2022.

arXiv:2206.13632

Omni-Seg: A Scale-aware Dynamic Network for Renal Pathological Image Segmentation

Authors: Ruining Deng, Quan Liu, Can Cui, Tianyuan Yao, Jun Long, Zuhayr Asad, R. Michael Womick, Zheyu Zhu, Agnes B. Fogo, Shilin Zhao, Haichun Yang, Yuankai Huo

Abstract: Comprehensive semantic segmentation on renal pathological images is challenging due to the heterogeneous scales of the objects. For example, on a whole slide image (WSI), the cross-sectional areas of glomeruli can be 64 times larger than that of the peritubular capillaries, making it impractical to segment both objects on the same patch, at the same scale. To handle this scaling issue, prior studies have typically trained multiple segmentation networks in order to match the optimal pixel resolution of heterogeneous tissue types. This multi-network solution is resource-intensive and fails to model the spatial relationship between tissue types. In this paper, we propose the Omni-Seg+ network, a scale-aware dynamic neural network that achieves multi-object (six tissue types) and multi-scale (5X to 40X scale) pathological image segmentation via a single neural network. The contribution of this paper is three-fold: (1) a novel scale-aware controller is proposed to generalize the dynamic neural network from single-scale to multi-scale; (2) semi-supervised consistency regularization of pseudo-labels is introduced to model the inter-scale correlation of unannotated tissue types into a single end-to-end learning paradigm; and (3) superior scale-aware generalization is evidenced by directly applying a model trained on human kidney images to mouse kidney images, without retraining. By learning from ~150,000 human pathological image patches from six tissue types at three different resolutions, our approach achieved superior segmentation performance according to human visual assessment and evaluation of image-omics (i.e., spatial transcriptomics). The official implementation is available at https://github.com/ddrrnn123/Omni-Seg.

Submitted 18 January, 2023; v1 submitted 27 June, 2022; originally announced June 2022.

arXiv:2205.07554

doi 10.1051/0004-6361/202243311

Towards on-sky adaptive optics control using reinforcement learning

Authors: J. Nousiainen, C. Rajani, M. Kasper, T. Helin, S. Y. Haffert, C. Vérinaud, J. R. Males, K. Van Gorkom, L. M. Close, J. D. Long, A. D. Hedglen, O. Guyon, L. Schatz, M. Kautz, J. Lumbres, A. Rodack, J. M. Knight, K. Miller

Abstract: The direct imaging of potentially habitable Exoplanets is one prime science case for the next generation of high contrast imaging instruments on ground-based extremely large telescopes. To reach this demanding science goal, the instruments are equipped with eXtreme Adaptive Optics (XAO) systems which will control thousands of actuators at a framerate of kilohertz to several kilohertz. Most of the habitable exoplanets are located at small angular separations from their host stars, where the current XAO systems' control laws leave strong residuals. Current AO control strategies like static matrix-based wavefront reconstruction and integrator control suffer from temporal delay error and are sensitive to mis-registration, i.e., to dynamic variations of the control system geometry. We aim to produce control methods that cope with these limitations, provide a significantly improved AO correction and, therefore, reduce the residual flux in the coronagraphic point spread function. We extend previous work in Reinforcement Learning for AO. The improved method, called PO4AO, learns a dynamics model and optimizes a control neural network, called a policy. We introduce the method and study it through numerical simulations of XAO with Pyramid wavefront sensing for the 8-m and 40-m telescope aperture cases. We further implemented PO4AO and carried out experiments in a laboratory environment using MagAO-X at the Steward laboratory. PO4AO provides the desired performance by improving the coronagraphic contrast in numerical simulations by factors of 3-5 within the control region of DM and Pyramid WFS, in simulation and in the laboratory. The presented method is also quick to train, i.e., on timescales of typically 5-10 seconds, and the inference time is sufficiently small (< ms) to be used in real-time control for XAO with currently available hardware even for extremely large telescopes.

Submitted 16 May, 2022; originally announced May 2022.

Journal ref: A&A 664, A71 (2022)

arXiv:2202.02833

CheXstray: Real-time Multi-Modal Data Concordance for Drift Detection in Medical Imaging AI

Authors: Arjun Soin, Jameson Merkow, ** Long, Joseph Paul Cohen, Smitha Saligrama, Stephen Kaiser, Steven Borg, Ivan Tarapov, Matthew P Lungren

Abstract: Clinical Artificial Intelligence (AI) applications are rapidly expanding worldwide, and have the potential to impact all areas of medical practice. Medical imaging applications constitute a vast majority of approved clinical AI applications. Though healthcare systems are eager to adopt AI solutions, a fundamental question remains: \textit{what happens after the AI model goes into production?} We use the CheXpert and PadChest public datasets to build and test a medical imaging AI drift monitoring workflow to track data and model drift without contemporaneous ground truth. We simulate drift in multiple experiments to compare model performance with our novel multi-modal drift metric, which uses DICOM metadata, image appearance representation from a variational autoencoder (VAE), and model output probabilities as input. Through experimentation, we demonstrate a strong proxy for ground truth performance using unsupervised distributional shifts in relevant metadata, predicted probabilities, and VAE latent representation. Our key contributions include (1) proof-of-concept for medical imaging drift detection that includes the use of VAE and domain specific statistical methods, (2) a multi-modal methodology to measure and unify drift metrics, (3) new insights into the challenges and solutions to observe deployed medical imaging AI, and (4) creation of open-source tools that enable others to easily run their own workflows and scenarios. This work has important implications. It addresses the concerning translation gap found in continuous medical imaging AI model monitoring common in dynamic healthcare environments.

Submitted 17 March, 2022; v1 submitted 6 February, 2022; originally announced February 2022.

Comments: Added code url

arXiv:2102.01678

doi 10.1109/TMI.2021.3101985

Learning domain-agnostic visual representation for computational pathology using medically-irrelevant style transfer augmentation

Authors: Rikiya Yamashita, ** Long, Snikitha Banda, Jeanne Shen, Daniel L. Rubin

Abstract: Suboptimal generalization of machine learning models on unseen data is a key challenge which hampers the clinical applicability of such models to medical imaging. Although various methods such as domain adaptation and domain generalization have evolved to combat this challenge, learning robust and generalizable representations is core to medical image understanding, and continues to be a problem. Here, we propose STRAP (Style TRansfer Augmentation for histoPathology), a form of data augmentation based on random style transfer from non-medical style sources such as artistic paintings, for learning domain-agnostic visual representations in computational pathology. Style transfer replaces the low-level texture content of an image with the uninformative style of a randomly selected style source image, while preserving the original high-level semantic content. This improves robustness to domain shift and can be used as a simple yet powerful tool for learning domain-agnostic representations. We demonstrate that STRAP leads to state-of-the-art performance, particularly in the presence of domain shifts, on two particular classification tasks in computational pathology.

Submitted 3 June, 2021; v1 submitted 2 February, 2021; originally announced February 2021.

arXiv:2001.08651

Tensor-Based Grading: A Novel Patch-Based Grading Approach for the Analysis of Deformation Fields in Huntington's Disease

Authors: Kilian Hett, Hans Johnson, Pierrick Coupé, Jane Paulsen, Jeffrey Long, Ipek Oguz

Abstract: The improvements in magnetic resonance imaging have led to the development of numerous techniques to better detect structural alterations caused by neurodegenerative diseases. Among these, the patch-based grading framework has been proposed to model local patterns of anatomical changes. This approach is attractive because of its low computational cost and its competitive performance. Other studies have proposed to analyze the deformations of brain structures using tensor-based morphometry, which is a highly interpretable approach. In this work, we propose to combine the advantages of these two approaches by extending the patch-based grading framework with a new tensor-based grading method that enables us to model patterns of local deformation using a log-Euclidean metric. We evaluate our new method in a study of the putamen for the classification of patients with pre-manifest Huntington's disease and healthy controls. Our experiments show a substantial increase in classification accuracy (87.5 $\pm$ 0.5 vs. 81.3 $\pm$ 0.6) compared to the existing patch-based grading methods, and a good complement to putamen volume, which is a primary imaging-based marker for the study of Huntington's disease.

Submitted 23 January, 2020; originally announced January 2020.

Journal ref: IEEE ISBI 2020: International Symposium on Biomedical Imaging, Apr 2020, Iowa City, United States

Showing 1–11 of 11 results for author: Long, J