Search | arXiv e-print repository

Kubernetes Deployment Options for On-Prem Clusters

Authors: Lincoln Bryant, Robert W. Gardner, Feng** Hu, David Jordan, Ryan P. Taylor

Abstract: Over the last decade, the Kubernetes container orchestration platform has become essential to many scientific workflows. Despite its popularity, deploying a production-ready Kubernetes cluster on-premises can be challenging for system administrators. Many of the proprietary integrations that application developers take for granted in commercial cloud environments must be replaced with alternatives… ▽ More Over the last decade, the Kubernetes container orchestration platform has become essential to many scientific workflows. Despite its popularity, deploying a production-ready Kubernetes cluster on-premises can be challenging for system administrators. Many of the proprietary integrations that application developers take for granted in commercial cloud environments must be replaced with alternatives when deployed locally. This article will compare three popular deployment strategies for sites deploying Kubernetes on-premise: Kubeadm with Kubespray, OpenShift / OKD and Rancher via K3S/RKE2. △ Less

Submitted 28 June, 2024; originally announced July 2024.

arXiv:2405.09549 [pdf, other]

Deep-learning-based clustering of OCT images for biomarker discovery in age-related macular degeneration (Pinnacle study report 4)

Authors: Robbie Holland, Rebecca Kaye, Ahmed M. Hagag, Oliver Leingang, Thomas R. P. Taylor, Hrvoje Bogunović, Ursula Schmidt-Erfurth, Hendrik P. N. Scholl, Daniel Rueckert, Andrew J. Lotery, Sobha Sivaprasad, Martin J. Menten

Abstract: Diseases are currently managed by grading systems, where patients are stratified by grading systems into stages that indicate patient risk and guide clinical management. However, these broad categories typically lack prognostic value, and proposals for new biomarkers are currently limited to anecdotal observations. In this work, we introduce a deep-learning-based biomarker proposal system for the… ▽ More Diseases are currently managed by grading systems, where patients are stratified by grading systems into stages that indicate patient risk and guide clinical management. However, these broad categories typically lack prognostic value, and proposals for new biomarkers are currently limited to anecdotal observations. In this work, we introduce a deep-learning-based biomarker proposal system for the purpose of accelerating biomarker discovery in age-related macular degeneration (AMD). It works by first training a neural network using self-supervised contrastive learning to discover, without any clinical annotations, features relating to both known and unknown AMD biomarkers present in 46,496 retinal optical coherence tomography (OCT) images. To interpret the discovered biomarkers, we partition the images into 30 subsets, termed clusters, that contain similar features. We then conduct two parallel 1.5-hour semi-structured interviews with two independent teams of retinal specialists that describe each cluster in clinical language. Overall, both teams independently identified clearly distinct characteristics in 27 of 30 clusters, of which 23 were related to AMD. Seven were recognised as known biomarkers already used in established grading systems and 16 depicted biomarker combinations or subtypes that are either not yet used in grading systems, were only recently proposed, or were unknown. Clusters separated incomplete from complete retinal atrophy, intraretinal from subretinal fluid and thick from thin choroids, and in simulation outperformed clinically-used grading systems in prognostic value. Overall, contrastive learning enabled the automatic proposal of AMD biomarkers that go beyond the set used by clinically established grading systems. Ultimately, we envision that equip** clinicians with discovery-oriented deep-learning tools can accelerate discovery of novel prognostic biomarkers. △ Less

Submitted 12 March, 2024; originally announced May 2024.

arXiv:2403.17013 [pdf, other]

Temporal-Spatial Processing of Event Camera Data via Delay-Loop Reservoir Neural Network

Authors: Richard Lau, Anthony Tylan-Tyler, Lihan Yao, Rey de Castro Roberto, Robert Taylor, Isaiah Jones

Abstract: This paper describes a temporal-spatial model for video processing with special applications to processing event camera videos. We propose to study a conjecture motivated by our previous study of video processing with delay loop reservoir (DLR) neural network, which we call Temporal-Spatial Conjecture (TSC). The TSC postulates that there is significant information content carried in the temporal r… ▽ More This paper describes a temporal-spatial model for video processing with special applications to processing event camera videos. We propose to study a conjecture motivated by our previous study of video processing with delay loop reservoir (DLR) neural network, which we call Temporal-Spatial Conjecture (TSC). The TSC postulates that there is significant information content carried in the temporal representation of a video signal and that machine learning algorithms would benefit from separate optimization of the spatial and temporal components for intelligent processing. To verify or refute the TSC, we propose a Visual Markov Model (VMM) which decompose the video into spatial and temporal components and estimate the mutual information (MI) of these components. Since computation of video mutual information is complex and time consuming, we use a Mutual Information Neural Network to estimate the bounds of the mutual information. Our result shows that the temporal component carries significant MI compared to that of the spatial component. This finding has often been overlooked in neural network literature. In this paper, we will exploit this new finding to guide our design of a delay-loop reservoir neural network for event camera classification, which results in a 18% improvement on classification accuracy. △ Less

Submitted 12 February, 2024; originally announced March 2024.

Comments: 10 pages, 12 figures, Darpa Distribution Statement A. Approved for public release. Distribution Unlimited

arXiv:2403.08059 [pdf, other]

FluoroSAM: A Language-aligned Foundation Model for X-ray Image Segmentation

Authors: Benjamin D. Killeen, Liam J. Wang, Han Zhang, Mehran Armand, Russell H. Taylor, Dave Dreizin, Greg Osgood, Mathias Unberath

Abstract: Automated X-ray image segmentation would accelerate research and development in diagnostic and interventional precision medicine. Prior efforts have contributed task-specific models capable of solving specific image analysis problems, but the utility of these models is restricted to their particular task domain, and expanding to broader use requires additional data, labels, and retraining efforts.… ▽ More Automated X-ray image segmentation would accelerate research and development in diagnostic and interventional precision medicine. Prior efforts have contributed task-specific models capable of solving specific image analysis problems, but the utility of these models is restricted to their particular task domain, and expanding to broader use requires additional data, labels, and retraining efforts. Recently, foundation models (FMs) -- machine learning models trained on large amounts of highly variable data thus enabling broad applicability -- have emerged as promising tools for automated image analysis. Existing FMs for medical image analysis focus on scenarios and modalities where objects are clearly defined by visually apparent boundaries, such as surgical tool segmentation in endoscopy. X-ray imaging, by contrast, does not generally offer such clearly delineated boundaries or structure priors. During X-ray image formation, complex 3D structures are projected in transmission onto the imaging plane, resulting in overlap** features of varying opacity and shape. To pave the way toward an FM for comprehensive and automated analysis of arbitrary medical X-ray images, we develop FluoroSAM, a language-aligned variant of the Segment-Anything Model, trained from scratch on 1.6M synthetic X-ray images. FluoroSAM is trained on data including masks for 128 organ types and 464 non-anatomical objects, such as tools and implants. In real X-ray images of cadaveric specimens, FluoroSAM is able to segment bony anatomical structures based on text-only prompting with 0.51 and 0.79 DICE with point-based refinement, outperforming competing SAM variants for all structures. FluoroSAM is also capable of zero-shot generalization to segmenting classes beyond the training set thanks to its language alignment, which we demonstrate for full lung segmentation on real chest X-rays. △ Less

Submitted 27 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

arXiv:2402.18088 [pdf, other]

Bimanual Manipulation of Steady Hand Eye Robots with Adaptive Sclera Force Control: Cooperative vs. Teleoperation Strategies

Authors: Mojtaba Esfandiari, Golchehr Amirkhani, Peter Gehlbach, Russell H. Taylor, Iulian Iordachita

Abstract: Performing intricate eye microsurgery, such as retinal vein cannulation (RVC), as a potential treatment for retinal vein occlusion (RVO), without the assistance of a surgical robotic system is very challenging to do safely. The main limitation has to do with the physiological hand tremor of surgeons. Robot-assisted eye surgery technology may resolve the problems of hand tremors and fatigue and imp… ▽ More Performing intricate eye microsurgery, such as retinal vein cannulation (RVC), as a potential treatment for retinal vein occlusion (RVO), without the assistance of a surgical robotic system is very challenging to do safely. The main limitation has to do with the physiological hand tremor of surgeons. Robot-assisted eye surgery technology may resolve the problems of hand tremors and fatigue and improve the safety and precision of RVC. The Steady-Hand Eye Robot (SHER) is an admittance-based robotic system that can filter out hand tremors and enables ophthalmologists to manipulate a surgical instrument inside the eye cooperatively. However, the admittance-based cooperative control mode does not address crucial safety considerations, such as minimizing contact force between the surgical instrument and the sclera surface to prevent tissue damage. An adaptive sclera force control algorithm was proposed to address this limitation using an FBG-based force-sensing tool to measure and minimize the tool-sclera interaction force. Additionally, features like haptic feedback or hand motion scaling, which can improve the safety and precision of surgery, require a teleoperation control framework. We implemented a bimanual adaptive teleoperation (BMAT) control mode using SHER 2.0 and SHER 2.1 and compared its performance with a bimanual adaptive cooperative (BMAC) mode. Both BMAT and BMAC modes were tested in sitting and standing postures during a vessel-following experiment under a surgical microscope. It is shown, for the first time to the best of our knowledge in robot-assisted retinal surgery, that integrating the adaptive sclera force control algorithm with the bimanual teleoperation framework enables surgeons to safely perform bimanual telemanipulation of the eye without over-stretching it, even in the absence of registration between the two robots. △ Less

Submitted 28 February, 2024; originally announced February 2024.

arXiv:2402.11840 [pdf, other]

An Endoscopic Chisel: Intraoperative Imaging Carves 3D Anatomical Models

Authors: Jan Emily Mangulabnan, Roger D. Soberanis-Mukul, Timo Teufel, Manish Sahu, Jose L. Porras, S. Swaroop Vedula, Masaru Ishii, Gregory Hager, Russell H. Taylor, Mathias Unberath

Abstract: Purpose: Preoperative imaging plays a pivotal role in sinus surgery where CTs offer patient-specific insights of complex anatomy, enabling real-time intraoperative navigation to complement endoscopy imaging. However, surgery elicits anatomical changes not represented in the preoperative model, generating an inaccurate basis for navigation during surgery progression. Methods: We propose a first v… ▽ More Purpose: Preoperative imaging plays a pivotal role in sinus surgery where CTs offer patient-specific insights of complex anatomy, enabling real-time intraoperative navigation to complement endoscopy imaging. However, surgery elicits anatomical changes not represented in the preoperative model, generating an inaccurate basis for navigation during surgery progression. Methods: We propose a first vision-based approach to update the preoperative 3D anatomical model leveraging intraoperative endoscopic video for navigated sinus surgery where relative camera poses are known. We rely on comparisons of intraoperative monocular depth estimates and preoperative depth renders to identify modified regions. The new depths are integrated in these regions through volumetric fusion in a truncated signed distance function representation to generate an intraoperative 3D model that reflects tissue manipulation. Results: We quantitatively evaluate our approach by sequentially updating models for a five-step surgical progression in an ex vivo specimen. We compute the error between correspondences from the updated model and ground-truth intraoperative CT in the region of anatomical modification. The resulting models show a decrease in error during surgical progression as opposed to increasing when no update is employed. Conclusion: Our findings suggest that preoperative 3D anatomical models can be updated using intraoperative endoscopy video in navigated sinus surgery. Future work will investigate improvements to monocular depth estimation as well as removing the need for external navigation systems. The resulting ability to continuously update the patient model may provide surgeons with a more precise understanding of the current anatomical state and paves the way toward a digital twin paradigm for sinus surgery. △ Less

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2401.11721 [pdf, other]

Beyond the Manual Touch: Situational-aware Force Control for Increased Safety in Robot-assisted Skullbase Surgery

Authors: Hisashi Ishida, Deepa Galaiya, Nimesh Nagururu, Francis Creighton, Peter Kazanzides, Russell Taylor, Manish Sahu

Abstract: Purpose - Skullbase surgery demands exceptional precision when removing bone in the lateral skull base. Robotic assistance can alleviate the effect of human sensory-motor limitations. However, the stiffness and inertia of the robot can significantly impact the surgeon's perception and control of the tool-to-tissue interaction forces. Methods - We present a situational-aware, force control techniqu… ▽ More Purpose - Skullbase surgery demands exceptional precision when removing bone in the lateral skull base. Robotic assistance can alleviate the effect of human sensory-motor limitations. However, the stiffness and inertia of the robot can significantly impact the surgeon's perception and control of the tool-to-tissue interaction forces. Methods - We present a situational-aware, force control technique aimed at regulating interaction forces during robot-assisted skullbase drilling. The contextual interaction information derived from the digital twin environment is used to enhance sensory perception and suppress undesired high forces. Results - To validate our approach, we conducted initial feasibility experiments involving a medical and two engineering students. The experiment focused on further drilling around critical structures following cortical mastoidectomy. The experiment results demonstrate that robotic assistance coupled with our proposed control scheme effectively limited undesired interaction forces when compared to robotic assistance without the proposed force control. Conclusions - The proposed force control techniques show promise in significantly reducing undesired interaction forces during robot-assisted skullbase surgery. These findings contribute to the ongoing efforts to enhance surgical precision and safety in complex procedures involving the lateral skull base. △ Less

Submitted 22 January, 2024; originally announced January 2024.

Comments: *These authors contributed equally to this work

arXiv:2401.11715 [pdf, other]

Integrating 3D Slicer with a Dynamic Simulator for Situational Aware Robotic Interventions

Authors: Manish Sahu, Hisashi Ishida, Laura Connolly, Hongyi Fan, Anton Deguet, Peter Kazanzides, Francis X. Creighton, Russell H. Taylor, Adnan Munawar

Abstract: Image-guided robotic interventions represent a transformative frontier in surgery, blending advanced imaging and robotics for improved precision and outcomes. This paper addresses the critical need for integrating open-source platforms to enhance situational awareness in image-guided robotic research. We present an open-source toolset that seamlessly combines a physics-based constraint formulation… ▽ More Image-guided robotic interventions represent a transformative frontier in surgery, blending advanced imaging and robotics for improved precision and outcomes. This paper addresses the critical need for integrating open-source platforms to enhance situational awareness in image-guided robotic research. We present an open-source toolset that seamlessly combines a physics-based constraint formulation framework, AMBF, with a state-of-the-art imaging platform application, 3D Slicer. Our toolset facilitates the creation of highly customizable interactive digital twins, that incorporates processing and visualization of medical imaging, robot kinematics, and scene dynamics for real-time robot control. Through a feasibility study, we showcase real-time synchronization of a physical robotic interventional environment in both 3D Slicer and AMBF, highlighting low-latency updates and improved visualization. △ Less

Submitted 22 January, 2024; originally announced January 2024.

Comments: *These authors contributed equally

arXiv:2401.11709 [pdf, other]

Haptic-Assisted Collaborative Robot Framework for Improved Situational Awareness in Skull Base Surgery

Authors: Hisashi Ishida, Manish Sahu, Adnan Munawar, Nimesh Nagururu, Deepa Galaiya, Peter Kazanzides, Francis X. Creighton, Russell H. Taylor

Abstract: Skull base surgery is a demanding field in which surgeons operate in and around the skull while avoiding critical anatomical structures including nerves and vasculature. While image-guided surgical navigation is the prevailing standard, limitation still exists requiring personalized planning and recognizing the irreplaceable role of a skilled surgeon. This paper presents a collaboratively controll… ▽ More Skull base surgery is a demanding field in which surgeons operate in and around the skull while avoiding critical anatomical structures including nerves and vasculature. While image-guided surgical navigation is the prevailing standard, limitation still exists requiring personalized planning and recognizing the irreplaceable role of a skilled surgeon. This paper presents a collaboratively controlled robotic system tailored for assisted drilling in skull base surgery. Our central hypothesis posits that this collaborative system, enriched with haptic assistive modes to enforce virtual fixtures, holds the potential to significantly enhance surgical safety, streamline efficiency, and alleviate the physical demands on the surgeon. The paper describes the intricate system development work required to enable these virtual fixtures through haptic assistive modes. To validate our system's performance and effectiveness, we conducted initial feasibility experiments involving a medical student and two experienced surgeons. The experiment focused on drilling around critical structures following cortical mastoidectomy, utilizing dental stone phantom and cadaveric models. Our experimental results demonstrate that our proposed haptic feedback mechanism enhances the safety of drilling around critical structures compared to systems lacking haptic assistance. With the aid of our system, surgeons were able to safely skeletonize the critical structures without breaching any critical structure even under obstructed view of the surgical site. △ Less

Submitted 22 January, 2024; originally announced January 2024.

Comments: *These authors contributed equally

arXiv:2312.10309 [pdf, other]

Enabling Mammography with Co-Robotic Ultrasound

Authors: Yuxin Chen, Yifan Yin, Julian Brown, Kevin Wang, Yi Wang, Ziyi Wang, Russell H. Taylor, Yixuan Wu, Emad M. Boctor

Abstract: Ultrasound (US) imaging is a vital adjunct to mammography in breast cancer screening and diagnosis, but its reliance on hand-held transducers often lacks repeatability and heavily depends on sonographers' skills. Integrating US systems from different vendors further complicates clinical standards and workflows. This research introduces a co-robotic US platform for repeatable, accurate, and vendor-… ▽ More Ultrasound (US) imaging is a vital adjunct to mammography in breast cancer screening and diagnosis, but its reliance on hand-held transducers often lacks repeatability and heavily depends on sonographers' skills. Integrating US systems from different vendors further complicates clinical standards and workflows. This research introduces a co-robotic US platform for repeatable, accurate, and vendor-independent breast US image acquisition. The platform can autonomously perform 3D volume scans or swiftly acquire real-time 2D images of suspicious lesions. Utilizing a Universal Robot UR5 with an RGB camera, a force sensor, and an L7-4 linear array transducer, the system achieves autonomous navigation, motion control, and image acquisition. The calibrations, including camera-mammogram, robot-camera, and robot-US, were rigorously conducted and validated. Governed by a PID force control, the robot-held transducer maintains a constant contact force with the compression plate during the scan for safety and patient comfort. The framework was validated on a lesion-mimicking phantom. Our results indicate that the developed co-robotic US platform promises to enhance the precision and repeatability of breast cancer screening and diagnosis. Additionally, the platform offers straightforward integration into most mammographic devices to ensure vendor-independence. △ Less

Submitted 15 December, 2023; originally announced December 2023.

arXiv:2312.01631 [pdf, other]

Cooperative vs. Teleoperation Control of the Steady Hand Eye Robot with Adaptive Sclera Force Control: A Comparative Study

Authors: Mojtaba Esfandiari, Ji Woong Kim, Botao Zhao, Golchehr Amirkhani, Muhammad Hadi, Peter Gehlbach, Russell H. Taylor, Iulian Iordachita

Abstract: A surgeon's physiological hand tremor can significantly impact the outcome of delicate and precise retinal surgery, such as retinal vein cannulation (RVC) and epiretinal membrane peeling. Robot-assisted eye surgery technology provides ophthalmologists with advanced capabilities such as hand tremor cancellation, hand motion scaling, and safety constraints that enable them to perform these otherwise… ▽ More A surgeon's physiological hand tremor can significantly impact the outcome of delicate and precise retinal surgery, such as retinal vein cannulation (RVC) and epiretinal membrane peeling. Robot-assisted eye surgery technology provides ophthalmologists with advanced capabilities such as hand tremor cancellation, hand motion scaling, and safety constraints that enable them to perform these otherwise challenging and high-risk surgeries with high precision and safety. Steady-Hand Eye Robot (SHER) with cooperative control mode can filter out surgeon's hand tremor, yet another important safety feature, that is, minimizing the contact force between the surgical instrument and sclera surface for avoiding tissue damage cannot be met in this control mode. Also, other capabilities, such as hand motion scaling and haptic feedback, require a teleoperation control framework. In this work, for the first time, we implemented a teleoperation control mode incorporated with an adaptive sclera force control algorithm using a PHANTOM Omni haptic device and a force-sensing surgical instrument equipped with Fiber Bragg Grating (FBG) sensors attached to the SHER 2.1 end-effector. This adaptive sclera force control algorithm allows the robot to dynamically minimize the tool-sclera contact force. Moreover, for the first time, we compared the performance of the proposed adaptive teleoperation mode with the cooperative mode by conducting a vessel-following experiment inside an eye phantom under a microscope. △ Less

Submitted 4 December, 2023; originally announced December 2023.

arXiv:2310.14364 [pdf, other]

A Quantitative Evaluation of Dense 3D Reconstruction of Sinus Anatomy from Monocular Endoscopic Video

Authors: Jan Emily Mangulabnan, Roger D. Soberanis-Mukul, Timo Teufel, Isabela Hernández, Jonas Winter, Manish Sahu, Jose L. Porras, S. Swaroop Vedula, Masaru Ishii, Gregory Hager, Russell H. Taylor, Mathias Unberath

Abstract: Generating accurate 3D reconstructions from endoscopic video is a promising avenue for longitudinal radiation-free analysis of sinus anatomy and surgical outcomes. Several methods for monocular reconstruction have been proposed, yielding visually pleasant 3D anatomical structures by retrieving relative camera poses with structure-from-motion-type algorithms and fusion of monocular depth estimates.… ▽ More Generating accurate 3D reconstructions from endoscopic video is a promising avenue for longitudinal radiation-free analysis of sinus anatomy and surgical outcomes. Several methods for monocular reconstruction have been proposed, yielding visually pleasant 3D anatomical structures by retrieving relative camera poses with structure-from-motion-type algorithms and fusion of monocular depth estimates. However, due to the complex properties of the underlying algorithms and endoscopic scenes, the reconstruction pipeline may perform poorly or fail unexpectedly. Further, acquiring medical data conveys additional challenges, presenting difficulties in quantitatively benchmarking these models, understanding failure cases, and identifying critical components that contribute to their precision. In this work, we perform a quantitative analysis of a self-supervised approach for sinus reconstruction using endoscopic sequences paired with optical tracking and high-resolution computed tomography acquired from nine ex-vivo specimens. Our results show that the generated reconstructions are in high agreement with the anatomy, yielding an average point-to-mesh error of 0.91 mm between reconstructions and CT segmentations. However, in a point-to-point matching scenario, relevant for endoscope tracking and navigation, we found average target registration errors of 6.58 mm. We identified that pose and depth estimation inaccuracies contribute equally to this error and that locally consistent sequences with shorter trajectories generate more accurate reconstructions. These results suggest that achieving global consistency between relative camera poses and estimated depths with the anatomy is essential. In doing so, we can ensure proper synergy between all components of the pipeline for improved reconstructions that will facilitate clinical application of this innovative technology. △ Less

Submitted 22 October, 2023; originally announced October 2023.

arXiv:2310.12209 [pdf, other]

Fast Parameter Inference on Pulsar Timing Arrays with Normalizing Flows

Authors: David Shih, Marat Freytsis, Stephen R. Taylor, Jeff A. Dror, Nolan Smyth

Abstract: Pulsar timing arrays (PTAs) perform Bayesian posterior inference with expensive MCMC methods. Given a dataset of ~10-100 pulsars and O(10^3) timing residuals each, producing a posterior distribution for the stochastic gravitational wave background (SGWB) can take days to a week. The computational bottleneck arises because the likelihood evaluation required for MCMC is extremely costly when conside… ▽ More Pulsar timing arrays (PTAs) perform Bayesian posterior inference with expensive MCMC methods. Given a dataset of ~10-100 pulsars and O(10^3) timing residuals each, producing a posterior distribution for the stochastic gravitational wave background (SGWB) can take days to a week. The computational bottleneck arises because the likelihood evaluation required for MCMC is extremely costly when considering the dimensionality of the search space. Fortunately, generating simulated data is fast, so modern simulation-based inference techniques can be brought to bear on the problem. In this paper, we demonstrate how conditional normalizing flows trained on simulated data can be used for extremely fast and accurate estimation of the SGWB posteriors, reducing the sampling time from weeks to a matter of seconds. △ Less

Submitted 18 October, 2023; originally announced October 2023.

Comments: 8 pages, 3 figures

arXiv:2309.03356 [pdf, other]

doi 10.1109/TMRB.2023.3336975

Steady-Hand Eye Robot 3.0: Optimization and Benchtop Evaluation for Subretinal Injection

Authors: Alireza Alamdar, David E. Usevitch, Jiahao Wu, Russell H. Taylor, Peter Gehlbach, Iulian Iordachita

Abstract: Subretinal injection methods and other procedures for treating retinal conditions and diseases (many considered incurable) have been limited in scope due to limited human motor control. This study demonstrates the next generation, cooperatively controlled Steady-Hand Eye Robot (SHER 3.0), a precise and intuitive-to-use robotic platform achieving clinical standards for targeting accuracy and resolu… ▽ More Subretinal injection methods and other procedures for treating retinal conditions and diseases (many considered incurable) have been limited in scope due to limited human motor control. This study demonstrates the next generation, cooperatively controlled Steady-Hand Eye Robot (SHER 3.0), a precise and intuitive-to-use robotic platform achieving clinical standards for targeting accuracy and resolution for subretinal injections. The system design and basic kinematics are reported and a deflection model for the incorporated delta stage and validation experiments are presented. This model optimizes the delta stage parameters, maximizing the global conditioning index and minimizing torsional compliance. Five tests measuring accuracy, repeatability, and deflection show the optimized stage design achieves a tip accuracy of <30 $μ$m, tip repeatability of 9.3 $μ$m and 0.02°, and deflections between 20-350 $μ$m/N. Future work will use updated control models to refine tip positioning outcomes and will be tested on in vivo animal models. △ Less

Submitted 6 September, 2023; originally announced September 2023.

arXiv:2307.09288 [pdf, other]

Llama 2: Open Foundation and Fine-Tuned Chat Models

Authors: Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini , et al. (43 additional authors not shown)

Abstract: In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be… ▽ More In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed-source models. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama 2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs. △ Less

Submitted 19 July, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

arXiv:2306.12590 [pdf]

Arc-to-line frame registration method for ultrasound and photoacoustic image-guided intraoperative robot-assisted laparoscopic prostatectomy

Authors: Hyunwoo Song, Shuojue Yang, Zijian Wu, Hamid Moradi, Russell H. Taylor, ** U. Kang, Septimiu E. Salcudean, Emad M. Boctor

Abstract: Purpose: To achieve effective robot-assisted laparoscopic prostatectomy, the integration of transrectal ultrasound (TRUS) imaging system which is the most widely used imaging modelity in prostate imaging is essential. However, manual manipulation of the ultrasound transducer during the procedure will significantly interfere with the surgery. Therefore, we propose an image co-registration algorithm… ▽ More Purpose: To achieve effective robot-assisted laparoscopic prostatectomy, the integration of transrectal ultrasound (TRUS) imaging system which is the most widely used imaging modelity in prostate imaging is essential. However, manual manipulation of the ultrasound transducer during the procedure will significantly interfere with the surgery. Therefore, we propose an image co-registration algorithm based on a photoacoustic marker method, where the ultrasound / photoacoustic (US/PA) images can be registered to the endoscopic camera images to ultimately enable the TRUS transducer to automatically track the surgical instrument Methods: An optimization-based algorithm is proposed to co-register the images from the two different imaging modalities. The principles of light propagation and an uncertainty in PM detection were assumed in this algorithm to improve the stability and accuracy of the algorithm. The algorithm is validated using the previously developed US/PA image-guided system with a da Vinci surgical robot. Results: The target-registration-error (TRE) is measured to evaluate the proposed algorithm. In both simulation and experimental demonstration, the proposed algorithm achieved a sub-centimeter accuracy which is acceptable in practical clinics. The result is also comparable with our previous approach, and the proposed method can be implemented with a normal white light stereo camera and doesn't require highly accurate localization of the PM. Conclusion: The proposed frame registration algorithm enabled a simple yet efficient integration of commercial US/PA imaging system into laparoscopic surgical setting by leveraging the characteristic properties of acoustic wave propagation and laser excitation, contributing to automated US/PA image-guided surgical intervention applications. △ Less

Submitted 21 June, 2023; originally announced June 2023.

Comments: 12 pages, 9 figures

arXiv:2306.03092 [pdf, other]

Neuralangelo: High-Fidelity Neural Surface Reconstruction

Authors: Zhaoshuo Li, Thomas Müller, Alex Evans, Russell H. Taylor, Mathias Unberath, Ming-Yu Liu, Chen-Hsuan Lin

Abstract: Neural surface reconstruction has been shown to be powerful for recovering dense 3D surfaces via image-based neural rendering. However, current methods struggle to recover detailed structures of real-world scenes. To address the issue, we present Neuralangelo, which combines the representation power of multi-resolution 3D hash grids with neural surface rendering. Two key ingredients enable our app… ▽ More Neural surface reconstruction has been shown to be powerful for recovering dense 3D surfaces via image-based neural rendering. However, current methods struggle to recover detailed structures of real-world scenes. To address the issue, we present Neuralangelo, which combines the representation power of multi-resolution 3D hash grids with neural surface rendering. Two key ingredients enable our approach: (1) numerical gradients for computing higher-order derivatives as a smoothing operation and (2) coarse-to-fine optimization on the hash grids controlling different levels of details. Even without auxiliary inputs such as depth, Neuralangelo can effectively recover dense 3D surface structures from multi-view images with fidelity significantly surpassing previous methods, enabling detailed large-scale scene reconstruction from RGB video captures. △ Less

Submitted 12 June, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

Comments: CVPR 2023, project page: https://research.nvidia.com/labs/dir/neuralangelo

arXiv:2304.09285 [pdf, other]

Pelphix: Surgical Phase Recognition from X-ray Images in Percutaneous Pelvic Fixation

Authors: Benjamin D. Killeen, Han Zhang, Jan Mangulabnan, Mehran Armand, Russel H. Taylor, Greg Osgood, Mathias Unberath

Abstract: Surgical phase recognition (SPR) is a crucial element in the digital transformation of the modern operating theater. While SPR based on video sources is well-established, incorporation of interventional X-ray sequences has not yet been explored. This paper presents Pelphix, a first approach to SPR for X-ray-guided percutaneous pelvic fracture fixation, which models the procedure at four levels of… ▽ More Surgical phase recognition (SPR) is a crucial element in the digital transformation of the modern operating theater. While SPR based on video sources is well-established, incorporation of interventional X-ray sequences has not yet been explored. This paper presents Pelphix, a first approach to SPR for X-ray-guided percutaneous pelvic fracture fixation, which models the procedure at four levels of granularity -- corridor, activity, view, and frame value -- simulating the pelvic fracture fixation workflow as a Markov process to provide fully annotated training data. Using added supervision from detection of bony corridors, tools, and anatomy, we learn image representations that are fed into a transformer model to regress surgical phases at the four granularity levels. Our approach demonstrates the feasibility of X-ray-based SPR, achieving an average accuracy of 93.8% on simulated sequences and 67.57% in cadaver across all granularity levels, with up to 88% accuracy for the target corridor in real data. This work constitutes the first step toward SPR for the X-ray domain, establishing an approach to categorizing phases in X-ray-guided surgery, simulating realistic image sequences to enable machine learning model development, and demonstrating that this approach is feasible for the analysis of real procedures. As X-ray-based SPR continues to mature, it will benefit procedures in orthopedic surgery, angiography, and interventional radiology by equip** intelligent surgical systems with situational awareness in the operating room. △ Less

Submitted 18 April, 2023; originally announced April 2023.

arXiv:2304.08464 [pdf, other]

Applications of Uncalibrated Image Based Visual Servoing in Micro- and Macroscale Robotics

Authors: Yifan Yin, Yutai Wang, Yunpu Zhang, Russell H. Taylor, Balazs P. Vagvolgyi

Abstract: We present a robust markerless image based visual servoing method that enables precision robot control without hand-eye and camera calibrations in 1, 3, and 5 degrees-of-freedom. The system uses two cameras for observing the workspace and a combination of classical image processing algorithms and deep learning based methods to detect features on camera images. The only restriction on the placement… ▽ More We present a robust markerless image based visual servoing method that enables precision robot control without hand-eye and camera calibrations in 1, 3, and 5 degrees-of-freedom. The system uses two cameras for observing the workspace and a combination of classical image processing algorithms and deep learning based methods to detect features on camera images. The only restriction on the placement of the two cameras is that relevant image features must be visible in both views. The system enables precise robot-tool to workspace interactions even when the physical setup is disturbed, for example if cameras are moved or the workspace shifts during manipulation. The usefulness of the visual servoing method is demonstrated and evaluated in two applications: in the calibration of a micro-robotic system that dissects mosquitoes for the automated production of a malaria vaccine, and a macro-scale manipulation system for fastening screws using a UR10 robot. Evaluation results indicate that our image based visual servoing method achieves human-like manipulation accuracy in challenging setups even without camera calibration. △ Less

Submitted 17 April, 2023; originally announced April 2023.

arXiv:2304.02583 [pdf, other]

A force-sensing surgical drill for real-time force feedback in robotic mastoidectomy

Authors: Yuxin Chen, Anna Goodridge, Manish Sahu, Aditi Kishore, Seena Vafaee, Harsha Mohan, Katherina Sapozhnikov, Francis Creighton, Russell Taylor, Deepa Galaiya

Abstract: Purpose: Robotic assistance in otologic surgery can reduce the task load of operating surgeons during the removal of bone around the critical structures in the lateral skull base. However, safe deployment into the anatomical passageways necessitates the development of advanced sensing capabilities to actively limit the interaction forces between the surgical tools and critical anatomy. Methods:… ▽ More Purpose: Robotic assistance in otologic surgery can reduce the task load of operating surgeons during the removal of bone around the critical structures in the lateral skull base. However, safe deployment into the anatomical passageways necessitates the development of advanced sensing capabilities to actively limit the interaction forces between the surgical tools and critical anatomy. Methods: We introduce a surgical drill equipped with a force sensor that is capable of measuring accurate tool-tissue interaction forces to enable force control and feedback to surgeons. The design, calibration and validation of the force-sensing surgical drill mounted on a cooperatively controlled surgical robot are described in this work. Results: The force measurements on the tip of the surgical drill are validated with raw-egg drilling experiments, where a force sensor mounted below the egg serves as ground truth. The average root mean square error (RMSE) for points and path drilling experiments are 41.7 (pm 12.2) mN and 48.3 (pm 13.7) mN respectively. Conclusions: The force-sensing prototype measures forces with sub-millinewton resolution and the results demonstrate that the calibrated force-sensing drill generates accurate force measurements with minimal error compared to the measured drill forces. The development of such sensing capabilities is crucial for the safe use of robotic systems in a clinical context. △ Less

Submitted 5 April, 2023; originally announced April 2023.

Comments: Accepted at IPCAI2023

arXiv:2303.05704 [pdf, other]

A Data-Driven Model with Hysteresis Compensation for I2RIS Robot

Authors: Mojtaba Esfandiari, Yanlin Zhou, Shervin Dehghani, Muhammad Hadi, Adnan Munawar, Henry Phalen, Peter Gehlbach, Russell H. Taylor, Iulian Iordachita

Abstract: Retinal microsurgery is a high-precision surgery performed on an exceedingly delicate tissue. It now requires extensively trained and highly skilled surgeons. Given the restricted range of instrument motion in the confined intraocular space, and also potentially restricting instrument contact with the sclera, snake-like robots may prove to be a promising technology to provide surgeons with greater… ▽ More Retinal microsurgery is a high-precision surgery performed on an exceedingly delicate tissue. It now requires extensively trained and highly skilled surgeons. Given the restricted range of instrument motion in the confined intraocular space, and also potentially restricting instrument contact with the sclera, snake-like robots may prove to be a promising technology to provide surgeons with greater flexibility, dexterity, space access, and positioning accuracy during retinal procedures requiring high precision and advantageous tooltip approach angles, such as retinal vein cannulation and epiretinal membrane peeling. Kinematics modeling of these robots is an essential step toward accurate position control, however, as opposed to conventional manipulators, modeling of these robots does not follow a straightforward method due to their complex mechanical structure and actuation mechanisms. Especially, in wire-driven snake-like robots, the hysteresis problem due to the wire tension condition can have a significant impact on the positioning accuracy of these robots. In this paper, we proposed an experimental kinematics model with a hysteresis compensation algorithm using the probabilistic Gaussian mixture models (GMM) Gaussian mixture regression (GMR) approach. Experimental results on the two-degree-of-freedom (DOF) integrated robotic intraocular snake (I2RIS) show that the proposed model provides 0.4 deg accuracy, which is an overall 60% and 70% of improvement for yaw and pitch degrees of freedom, respectively, compared to a previous model of this robot. △ Less

Submitted 10 March, 2023; originally announced March 2023.

arXiv:2303.01733 [pdf, other]

Improving Surgical Situational Awareness with Signed Distance Field: A Pilot Study in Virtual Reality

Authors: Hisashi Ishida, Juan Antonio Barragan, Adnan Munawar, Zhaoshuo Li, Andy Ding, Peter Kazanzides, Danielle Trakimas, Francis X. Creighton, Russell H. Taylor

Abstract: The introduction of image-guided surgical navigation (IGSN) has greatly benefited technically demanding surgical procedures by providing real-time support and guidance to the surgeon during surgery. \hi{To develop effective IGSN, a careful selection of the surgical information and the medium to present this information to the surgeon is needed. However, this is not a trivial task due to the broad… ▽ More The introduction of image-guided surgical navigation (IGSN) has greatly benefited technically demanding surgical procedures by providing real-time support and guidance to the surgeon during surgery. \hi{To develop effective IGSN, a careful selection of the surgical information and the medium to present this information to the surgeon is needed. However, this is not a trivial task due to the broad array of available options.} To address this problem, we have developed an open-source library that facilitates the development of multimodal navigation systems in a wide range of surgical procedures relying on medical imaging data. To provide guidance, our system calculates the minimum distance between the surgical instrument and the anatomy and then presents this information to the user through different mechanisms. The real-time performance of our approach is achieved by calculating Signed Distance Fields at initialization from segmented anatomical volumes. Using this framework, we developed a multimodal surgical navigation system to help surgeons navigate anatomical variability in a skull base surgery simulation environment. Three different feedback modalities were explored: visual, auditory, and haptic. To evaluate the proposed system, a pilot user study was conducted in which four clinicians performed mastoidectomy procedures with and without guidance. Each condition was assessed using objective performance and subjective workload metrics. This pilot user study showed improvements in procedural safety without additional time or workload. These results demonstrate our pipeline's successful use case in the context of mastoidectomy. △ Less

Submitted 1 August, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

Comments: First two authors contributed equally. 6 pages

Journal ref: International Conference on Intelligent Robots and Systems (IROS) 2023

arXiv:2302.13878 [pdf, other]

Fully Immersive Virtual Reality for Skull-base Surgery: Surgical Training and Beyond

Authors: Adnan Munawar, Zhaoshuo Li, Nimesh Nagururu, Danielle Trakimas, Peter Kazanzides, Russell H. Taylor, Francis X. Creighton

Abstract: Purpose: A virtual reality (VR) system, where surgeons can practice procedures on virtual anatomies, is a scalable and cost-effective alternative to cadaveric training. The fully digitized virtual surgeries can also be used to assess the surgeon's skills using measurements that are otherwise hard to collect in reality. Thus, we present the Fully Immersive Virtual Reality System (FIVRS) for skull-b… ▽ More Purpose: A virtual reality (VR) system, where surgeons can practice procedures on virtual anatomies, is a scalable and cost-effective alternative to cadaveric training. The fully digitized virtual surgeries can also be used to assess the surgeon's skills using measurements that are otherwise hard to collect in reality. Thus, we present the Fully Immersive Virtual Reality System (FIVRS) for skull-base surgery, which combines surgical simulation software with a high-fidelity hardware setup. Methods: FIVRS allows surgeons to follow normal clinical workflows inside the VR environment. FIVRS uses advanced rendering designs and drilling algorithms for realistic bone ablation. A head-mounted display with ergonomics similar to that of surgical microscopes is used to improve immersiveness. Extensive multi-modal data is recorded for post-analysis, including eye gaze, motion, force, and video of the surgery. A user-friendly interface is also designed to ease the learning curve of using FIVRS. Results: We present results from a user study involving surgeons with various levels of expertise. The preliminary data recorded by FIVRS differentiates between participants with different levels of expertise, promising future research on automatic skill assessment. Furthermore, informal feedback from the study participants about the system's intuitiveness and immersiveness was positive. Conclusion: We present FIVRS, a fully immersive VR system for skull-base surgery. FIVRS features a realistic software simulation coupled with modern hardware for improved realism. The system is completely open-source and provides feature-rich data in an industry-standard format. △ Less

Submitted 31 May, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

Comments: IPCAI/IJCARS 2023

arXiv:2212.14131 [pdf, other]

TAToo: Vision-based Joint Tracking of Anatomy and Tool for Skull-base Surgery

Authors: Zhaoshuo Li, Hongchao Shu, Ruixing Liang, Anna Goodridge, Manish Sahu, Francis X. Creighton, Russell H. Taylor, Mathias Unberath

Abstract: Purpose: Tracking the 3D motion of the surgical tool and the patient anatomy is a fundamental requirement for computer-assisted skull-base surgery. The estimated motion can be used both for intra-operative guidance and for downstream skill analysis. Recovering such motion solely from surgical videos is desirable, as it is compliant with current clinical workflows and instrumentation. Methods: We… ▽ More Purpose: Tracking the 3D motion of the surgical tool and the patient anatomy is a fundamental requirement for computer-assisted skull-base surgery. The estimated motion can be used both for intra-operative guidance and for downstream skill analysis. Recovering such motion solely from surgical videos is desirable, as it is compliant with current clinical workflows and instrumentation. Methods: We present Tracker of Anatomy and Tool (TAToo). TAToo jointly tracks the rigid 3D motion of patient skull and surgical drill from stereo microscopic videos. TAToo estimates motion via an iterative optimization process in an end-to-end differentiable form. For robust tracking performance, TAToo adopts a probabilistic formulation and enforces geometric constraints on the object level. Results: We validate TAToo on both simulation data, where ground truth motion is available, as well as on anthropomorphic phantom data, where optical tracking provides a strong baseline. We report sub-millimeter and millimeter inter-frame tracking accuracy for skull and drill, respectively, with rotation errors below 1°. We further illustrate how TAToo may be used in a surgical navigation setting. Conclusion: We present TAToo, which simultaneously tracks the surgical tool and the patient anatomy in skull-base surgery. TAToo directly predicts the motion from surgical videos, without the need of any markers. Our results show that the performance of TAToo compares favorably to competing approaches. Future work will include fine-tuning of our depth network to reach a 1 mm clinical accuracy goal desired for surgical applications in the skull base. △ Less

Submitted 16 May, 2023; v1 submitted 28 December, 2022; originally announced December 2022.

Comments: IPCAI/IJCARS 2023, code available at: https://github.com/mli0603/TAToo

arXiv:2211.11863 [pdf, other]

Twin-S: A Digital Twin for Skull-base Surgery

Authors: Hongchao Shu, Ruixing Liang, Zhaoshuo Li, Anna Goodridge, Xiangyu Zhang, Hao Ding, Nimesh Nagururu, Manish Sahu, Francis X. Creighton, Russell H. Taylor, Adnan Munawar, Mathias Unberath

Abstract: Purpose: Digital twins are virtual interactive models of the real world, exhibiting identical behavior and properties. In surgical applications, computational analysis from digital twins can be used, for example, to enhance situational awareness. Methods: We present a digital twin framework for skull-base surgeries, named Twin-S, which can be integrated within various image-guided interventions se… ▽ More Purpose: Digital twins are virtual interactive models of the real world, exhibiting identical behavior and properties. In surgical applications, computational analysis from digital twins can be used, for example, to enhance situational awareness. Methods: We present a digital twin framework for skull-base surgeries, named Twin-S, which can be integrated within various image-guided interventions seamlessly. Twin-S combines high-precision optical tracking and real-time simulation. We rely on rigorous calibration routines to ensure that the digital twin representation precisely mimics all real-world processes. Twin-S models and tracks the critical components of skull-base surgery, including the surgical tool, patient anatomy, and surgical camera. Significantly, Twin-S updates and reflects real-world drilling of the anatomical model in frame rate. Results: We extensively evaluate the accuracy of Twin-S, which achieves an average 1.39 mm error during the drilling process. We further illustrate how segmentation masks derived from the continuously updated digital twin can augment the surgical microscope view in a mixed reality setting, where bone requiring ablation is highlighted to provide surgeons additional situational awareness. Conclusion: We present Twin-S, a digital twin environment for skull-base surgery. Twin-S tracks and updates the virtual model in real-time given measurements from modern tracking technologies. Future research on complementing optical tracking with higher-precision vision-based approaches may further increase the accuracy of Twin-S. △ Less

Submitted 6 May, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

arXiv:2211.10441 [pdf]

doi 10.3389/fbioe.2022.1010275

Rastreo muscular móvil usando magnetomicrometría -- traducción al español del articulo "Untethered Muscle Tracking Using Magnetomicrometry" por el autor Cameron R. Taylor

Authors: Cameron R. Taylor, Seong Ho Yeon, William H. Clark, Ellen G. Clarrissimeaux, Mary Kate O'Donnell, Thomas J. Roberts, Hugh M. Herr

Abstract: Muscle tissue drives nearly all movement in the animal kingdom, providing power, mobility, and dexterity. Technologies for measuring muscle tissue motion, such as sonomicrometry, fluoromicrometry, and ultrasound, have significantly advanced our understanding of biomechanics. Yet, the field lacks the ability to monitor muscle tissue motion for animal behavior outside the lab. Towards addressing thi… ▽ More Muscle tissue drives nearly all movement in the animal kingdom, providing power, mobility, and dexterity. Technologies for measuring muscle tissue motion, such as sonomicrometry, fluoromicrometry, and ultrasound, have significantly advanced our understanding of biomechanics. Yet, the field lacks the ability to monitor muscle tissue motion for animal behavior outside the lab. Towards addressing this issue, we previously introduced magnetomicrometry, a method that uses magnetic beads to wirelessly monitor muscle tissue length changes, and we validated magnetomicrometry via tightly-controlled in situ testing. In this study we validate the accuracy of magnetomicrometry against fluoromicrometry during untethered running in an in vivo turkey model. We demonstrate real-time muscle tissue length tracking of the freely-moving turkeys executing various motor activities, including ramp ascent and descent, vertical ascent and descent, and free roaming movement. Given the demonstrated capacity of magnetomicrometry to track muscle movement in untethered animals, we feel that this technique will enable new scientific explorations and an improved understanding of muscle function. -- -- El tejido muscular es el motor de casi todos los movimientos del reino animal, ya que proporciona fuerza, movilidad y destreza. Las tecnologías para medir el movimiento del tejido muscular, como la sonomicrometría, la fluoromicrometría y el ultrasonido, han avanzado considerablemente la comprensión de la biomecánica. Sin embargo, este campo carece de la capacidad de rastrear el movimiento del tejido muscular en el comportamiento animal fuera del laboratorio. Para abordar este problema, presentamos previamente la magnetomicrometría, un método que utiliza pequeños imanes para rastrear de forma inalámbrica los cambios de longitud del tejido muscular, y validamos la magnetomicrometría mediante pruebas estrechamente controladas in situ. En este estudio validamos la precisión de la magnetomicrometría en comparación con la fluoromicrometría usando un modelo de pavo in vivo mientras corre libremente. Demostramos el rastreo en tiempo real de la longitud del tejido muscular de los pavos que se mueven libremente ejecutando varias actividades motoras, incluyendo el ascenso y el descenso en rampa, el ascenso y el descenso vertical, y el movimiento libre. Dada la capacidad demostrada de la magnetomicrometría para rastrear el movimiento muscular en animales en un contexto móvil, creemos que esta técnica permitirá nuevas exploraciones científicas y una mejor comprensión de la función muscular. △ Less

Submitted 19 November, 2022; originally announced November 2022.

Comments: in Spanish language. Translation of the postprint, with the published version in English appended to the end of the PDF. Shared First Authorship: Cameron R. Taylor and Seong Ho Yeon; Shared Senior and Corresponding Authorship: Thomas J. Roberts and Hugh M. Herr

Journal ref: Front. Bioeng. Biotechnol. 10:1010275 (2022)

arXiv:2211.09085 [pdf, other]

Galactica: A Large Language Model for Science

Authors: Ross Taylor, Marcin Kardas, Guillem Cucurull, Thomas Scialom, Anthony Hartshorn, Elvis Saravia, Andrew Poulton, Viktor Kerkez, Robert Stojnic

Abstract: Information overload is a major obstacle to scientific progress. The explosive growth in scientific literature and data has made it ever harder to discover useful insights in a large mass of information. Today scientific knowledge is accessed through search engines, but they are unable to organize scientific knowledge alone. In this paper we introduce Galactica: a large language model that can sto… ▽ More Information overload is a major obstacle to scientific progress. The explosive growth in scientific literature and data has made it ever harder to discover useful insights in a large mass of information. Today scientific knowledge is accessed through search engines, but they are unable to organize scientific knowledge alone. In this paper we introduce Galactica: a large language model that can store, combine and reason about scientific knowledge. We train on a large scientific corpus of papers, reference material, knowledge bases and many other sources. We outperform existing models on a range of scientific tasks. On technical knowledge probes such as LaTeX equations, Galactica outperforms the latest GPT-3 by 68.2% versus 49.0%. Galactica also performs well on reasoning, outperforming Chinchilla on mathematical MMLU by 41.3% to 35.7%, and PaLM 540B on MATH with a score of 20.4% versus 8.8%. It also sets a new state-of-the-art on downstream tasks such as PubMedQA and MedMCQA dev of 77.6% and 52.9%. And despite not being trained on a general corpus, Galactica outperforms BLOOM and OPT-175B on BIG-bench. We believe these results demonstrate the potential for language models as a new interface for science. We open source the model for the benefit of the scientific community. △ Less

Submitted 16 November, 2022; originally announced November 2022.

arXiv:2210.11719 [pdf, other]

Context-Enhanced Stereo Transformer

Authors: Weiyu Guo, Zhaoshuo Li, Yongkui Yang, Zheng Wang, Russell H. Taylor, Mathias Unberath, Alan Yuille, Yingwei Li

Abstract: Stereo depth estimation is of great interest for computer vision research. However, existing methods struggles to generalize and predict reliably in hazardous regions, such as large uniform regions. To overcome these limitations, we propose Context Enhanced Path (CEP). CEP improves the generalization and robustness against common failure cases in existing solutions by capturing the long-range glob… ▽ More Stereo depth estimation is of great interest for computer vision research. However, existing methods struggles to generalize and predict reliably in hazardous regions, such as large uniform regions. To overcome these limitations, we propose Context Enhanced Path (CEP). CEP improves the generalization and robustness against common failure cases in existing solutions by capturing the long-range global information. We construct our stereo depth estimation model, Context Enhanced Stereo Transformer (CSTR), by plugging CEP into the state-of-the-art stereo depth estimation method Stereo Transformer. CSTR is examined on distinct public datasets, such as Scene Flow, Middlebury-2014, KITTI-2015, and MPI-Sintel. We find CSTR outperforms prior approaches by a large margin. For example, in the zero-shot synthetic-to-real setting, CSTR outperforms the best competing approaches on Middlebury-2014 dataset by 11%. Our extensive experiments demonstrate that the long-range information is critical for stereo matching task and CEP successfully captures such information. △ Less

Submitted 21 October, 2022; originally announced October 2022.

Comments: Accepted by ECCV2022

arXiv:2210.07929 [pdf, other]

Object Storage, Persistent Memory, and Data Infrastructure for HPC Materials Informatics

Authors: Stephanie R Taylor

Abstract: Speculation is provided on how infrastructure choices fit into the materials data ecosystem. Special attention is paid to object storage, the Intel DAOS API, storage-class memory (SCM), and the prospect of non-von Neumann computing. Lastly, the hypothesized implications of data infrastructure choices on a sample materials informatics problem is discussed: computational materials discovery of phase… ▽ More Speculation is provided on how infrastructure choices fit into the materials data ecosystem. Special attention is paid to object storage, the Intel DAOS API, storage-class memory (SCM), and the prospect of non-von Neumann computing. Lastly, the hypothesized implications of data infrastructure choices on a sample materials informatics problem is discussed: computational materials discovery of phase-change materials with properties tailored for phase-change memory (PCM). The motivation for selecting PCM as a sample materials informatics case study comes from its relevance to emerging SCM hardware. △ Less

Submitted 6 October, 2022; originally announced October 2022.

Comments: 27 pages, 51 figures

arXiv:2208.11940 [pdf]

Rail break and derailment prediction using Probabilistic Graphical Modelling

Authors: Rebecca M. C. Taylor, Johan A. du Preez

Abstract: Rail breaks are one of the most common causes of derailments internationally. This is no different for the South African Iron Ore line. Many rail breaks occur as a heavy-haul train passes over a crack, large defect or defective weld. In such cases, it is usually too late for the train to slow down in time to prevent a de-railment. Knowing the risk of a rail break occurring associated with a train… ▽ More Rail breaks are one of the most common causes of derailments internationally. This is no different for the South African Iron Ore line. Many rail breaks occur as a heavy-haul train passes over a crack, large defect or defective weld. In such cases, it is usually too late for the train to slow down in time to prevent a de-railment. Knowing the risk of a rail break occurring associated with a train passing over a section of rail allows for better implementation of maintenance initiatives and mitigating measures. In this paper the Ore Line's specific challenges are discussed and the currently available data that can be used to create a rail break risk prediction model is reviewed. The development of a basic rail break risk prediction model for the Ore Line is then presented. Finally the insight gained from the model is demonstrated by means of discussing various scenarios of various rail break risk. In future work, we are planning on extending this basic model to allow input from live monitoring systems such as the ultrasonic broken rail detection system. △ Less

Submitted 25 August, 2022; originally announced August 2022.

Comments: Proceedings of the 11'th International Heavy Haul Association Conference 2017

Journal ref: Proceedings of the 11'th International Heavy Haul Association Conference (IHHA 2017), pages 799-805 Cape Town, South Africa

arXiv:2208.09299 [pdf, other]

SimLDA: A tool for topic model evaluation

Authors: Rebecca M. C. Taylor, Johan A. du Preez

Abstract: Variational Bayes (VB) applied to latent Dirichlet allocation (LDA) has become the most popular algorithm for aspect modeling. While sufficiently successful in text topic extraction from large corpora, VB is less successful in identifying aspects in the presence of limited data. We present a novel variational message passing algorithm as applied to Latent Dirichlet Allocation (LDA) and compare it… ▽ More Variational Bayes (VB) applied to latent Dirichlet allocation (LDA) has become the most popular algorithm for aspect modeling. While sufficiently successful in text topic extraction from large corpora, VB is less successful in identifying aspects in the presence of limited data. We present a novel variational message passing algorithm as applied to Latent Dirichlet Allocation (LDA) and compare it with the gold standard VB and collapsed Gibbs sampling. In situations where marginalisation leads to non-conjugate messages, we use ideas from sampling to derive approximate update equations. In cases where conjugacy holds, Loopy Belief update (LBU) (also known as Lauritzen-Spiegelhalter) is used. Our algorithm, ALBU (approximate LBU), has strong similarities with Variational Message Passing (VMP) (which is the message passing variant of VB). To compare the performance of the algorithms in the presence of limited data, we use data sets consisting of tweets and news groups. Using coherence measures we show that ALBU learns latent distributions more accurately than does VB, especially for smaller data sets. △ Less

Submitted 19 August, 2022; originally announced August 2022.

Comments: Conference Proceedings

arXiv:2208.00565 [pdf, other]

Modeling Human Response to Robot Errors for Timely Error Detection

Authors: Maia Stiber, Russell Taylor, Chien-Ming Huang

Abstract: In human-robot collaboration, robot errors are inevitable -- damaging user trust, willingness to work together, and task performance. Prior work has shown that people naturally respond to robot errors socially and that in social interactions it is possible to use human responses to detect errors. However, there is little exploration in the domain of non-social, physical human-robot collaboration s… ▽ More In human-robot collaboration, robot errors are inevitable -- damaging user trust, willingness to work together, and task performance. Prior work has shown that people naturally respond to robot errors socially and that in social interactions it is possible to use human responses to detect errors. However, there is little exploration in the domain of non-social, physical human-robot collaboration such as assembly and tool retrieval. In this work, we investigate how people's organic, social responses to robot errors may be used to enable timely automatic detection of errors in physical human-robot interactions. We conducted a data collection study to obtain facial responses to train a real-time detection algorithm and a case study to explore the generalizability of our method with different task settings and errors. Our results show that natural social responses are effective signals for timely detection and localization of robot errors even in non-social contexts and that our method is robust across a variety of task contexts, robot errors, and user responses. This work contributes to robust error detection without detailed task specifications. △ Less

Submitted 31 July, 2022; originally announced August 2022.

Comments: Accepted to 2022 International Conference on Intelligent Robots and Systems (IROS), 8 pages, 6 figures

arXiv:2206.06127 [pdf, other]

SyntheX: Scaling Up Learning-based X-ray Image Analysis Through In Silico Experiments

Authors: Cong Gao, Benjamin D. Killeen, Yicheng Hu, Robert B. Grupp, Russell H. Taylor, Mehran Armand, Mathias Unberath

Abstract: Artificial intelligence (AI) now enables automated interpretation of medical images for clinical use. However, AI's potential use for interventional images (versus those involved in triage or diagnosis), such as for guidance during surgery, remains largely untapped. This is because surgical AI systems are currently trained using post hoc analysis of data collected during live surgeries, which has… ▽ More Artificial intelligence (AI) now enables automated interpretation of medical images for clinical use. However, AI's potential use for interventional images (versus those involved in triage or diagnosis), such as for guidance during surgery, remains largely untapped. This is because surgical AI systems are currently trained using post hoc analysis of data collected during live surgeries, which has fundamental and practical limitations, including ethical considerations, expense, scalability, data integrity, and a lack of ground truth. Here, we demonstrate that creating realistic simulated images from human models is a viable alternative and complement to large-scale in situ data collection. We show that training AI image analysis models on realistically synthesized data, combined with contemporary domain generalization or adaptation techniques, results in models that on real data perform comparably to models trained on a precisely matched real data training set. Because synthetic generation of training data from human-based models scales easily, we find that our model transfer paradigm for X-ray image analysis, which we refer to as SyntheX, can even outperform real data-trained models due to the effectiveness of training on a larger dataset. We demonstrate the potential of SyntheX on three clinical tasks: Hip image analysis, surgical robotic tool detection, and COVID-19 lung lesion segmentation. SyntheX provides an opportunity to drastically accelerate the conception, design, and evaluation of intelligent systems for X-ray-based medicine. In addition, simulated image environments provide the opportunity to test novel instrumentation, design complementary surgical approaches, and envision novel techniques that improve outcomes, save time, or mitigate human error, freed from the ethical and practical considerations of live human data collection. △ Less

Submitted 13 June, 2022; originally announced June 2022.

arXiv:2205.12332 [pdf, other]

Constant Curvature Curve Tube Codes for Low-Latency Analog Error Correction

Authors: Anders M. Buvarp, Robert M. Taylor Jr., Kumar Vijay Mishra, Lamine M. Mili, Amir I. Zaghloul

Abstract: Recent research in ultra-reliable and low latency communications (URLLC) for future wireless systems has spurred interest in short block-length codes. In this context, we analyze arbitrary harmonic bandwidth (BW) expansions for a class of high-dimension constant curvature curve codes for analog error correction of independent continuous-alphabet uniform sources. In particular, we employ the circum… ▽ More Recent research in ultra-reliable and low latency communications (URLLC) for future wireless systems has spurred interest in short block-length codes. In this context, we analyze arbitrary harmonic bandwidth (BW) expansions for a class of high-dimension constant curvature curve codes for analog error correction of independent continuous-alphabet uniform sources. In particular, we employ the circumradius function from knot theory to prescribe insulating tubes about the centerline of constant curvature curves. We then use tube packing density within a hypersphere to optimize the curve parameters. The resulting constant curvature curve tube (C3T) codes possess the smallest possible latency, i.e., block-length is unity under BW expansion map**. Further, the codes perform within $5$ dB signal-to-distortion ratio of the optimal performance theoretically achievable at a signal-to-noise ratio (SNR) $< -5$ dB for BW expansion factor $n \leq 10$. Furthermore, we propose a neural-network-based method to decode C3T codes. We show that, at low SNR, the neural-network-based C3T decoder outperforms the maximum likelihood and minimum mean-squared error decoders for all $n$. The best possible digital codes require two to three orders of magnitude higher latency compared to C3T codes, thereby demonstrating the latter's utility for URLLC. △ Less

Submitted 2 August, 2023; v1 submitted 24 May, 2022; originally announced May 2022.

Comments: 15 pages, 4 tables, 11 figures

arXiv:2202.09487 [pdf, other]

SAGE: SLAM with Appearance and Geometry Prior for Endoscopy

Authors: Xingtong Liu, Zhaoshuo Li, Masaru Ishii, Gregory D. Hager, Russell H. Taylor, Mathias Unberath

Abstract: In endoscopy, many applications (e.g., surgical navigation) would benefit from a real-time method that can simultaneously track the endoscope and reconstruct the dense 3D geometry of the observed anatomy from a monocular endoscopic video. To this end, we develop a Simultaneous Localization and Map** system by combining the learning-based appearance and optimizable geometry priors and factor grap… ▽ More In endoscopy, many applications (e.g., surgical navigation) would benefit from a real-time method that can simultaneously track the endoscope and reconstruct the dense 3D geometry of the observed anatomy from a monocular endoscopic video. To this end, we develop a Simultaneous Localization and Map** system by combining the learning-based appearance and optimizable geometry priors and factor graph optimization. The appearance and geometry priors are explicitly learned in an end-to-end differentiable training pipeline to master the task of pair-wise image alignment, one of the core components of the SLAM system. In our experiments, the proposed SLAM system is shown to robustly handle the challenges of texture scarceness and illumination variation that are commonly seen in endoscopy. The system generalizes well to unseen endoscopes and subjects and performs favorably compared with a state-of-the-art feature-based SLAM system. The code repository is available at https://github.com/lppllppl920/SAGE-SLAM.git. △ Less

Submitted 22 February, 2022; v1 submitted 18 February, 2022; originally announced February 2022.

Comments: Accepted to ICRA 2022

arXiv:2201.00383 [pdf, other]

Integrating Artificial Intelligence and Augmented Reality in Robotic Surgery: An Initial dVRK Study Using a Surgical Education Scenario

Authors: Yonghao Long, Jianfeng Cao, Anton Deguet, Russell H. Taylor, Qi Dou

Abstract: Robot-assisted surgery has become progressively more and more popular due to its clinical advantages. In the meanwhile, the artificial intelligence and augmented reality in robotic surgery are develo** rapidly and receive lots of attention. However, current methods have not discussed the coherent integration of AI and AR in robotic surgery. In this paper, we develop a novel system by seamlessly… ▽ More Robot-assisted surgery has become progressively more and more popular due to its clinical advantages. In the meanwhile, the artificial intelligence and augmented reality in robotic surgery are develo** rapidly and receive lots of attention. However, current methods have not discussed the coherent integration of AI and AR in robotic surgery. In this paper, we develop a novel system by seamlessly merging artificial intelligence module and augmented reality visualization to automatically generate the surgical guidance for robotic surgery education. Specifically, we first leverage reinforcement leaning to learn from expert demonstration and then generate 3D guidance trajectory, providing prior context information of the surgical procedure. Along with other information such as text hint, the 3D trajectory is then overlaid in the stereo view of dVRK, where the user can perceive the 3D guidance and learn the procedure. The proposed system is evaluated through a preliminary experiment on surgical education task peg-transfer, which proves its feasibility and potential as the next generation of robot-assisted surgery education solution. △ Less

Submitted 3 March, 2022; v1 submitted 2 January, 2022; originally announced January 2022.

arXiv:2111.09337 [pdf, other]

Temporally Consistent Online Depth Estimation in Dynamic Scenes

Authors: Zhaoshuo Li, Wei Ye, Dilin Wang, Francis X. Creighton, Russell H. Taylor, Ganesh Venkatesh, Mathias Unberath

Abstract: Temporally consistent depth estimation is crucial for online applications such as augmented reality. While stereo depth estimation has received substantial attention as a promising way to generate 3D information, there is relatively little work focused on maintaining temporal stability. Indeed, based on our analysis, current techniques still suffer from poor temporal consistency. Stabilizing depth… ▽ More Temporally consistent depth estimation is crucial for online applications such as augmented reality. While stereo depth estimation has received substantial attention as a promising way to generate 3D information, there is relatively little work focused on maintaining temporal stability. Indeed, based on our analysis, current techniques still suffer from poor temporal consistency. Stabilizing depth temporally in dynamic scenes is challenging due to concurrent object and camera motion. In an online setting, this process is further aggravated because only past frames are available. We present a framework named Consistent Online Dynamic Depth (CODD) to produce temporally consistent depth estimates in dynamic scenes in an online setting. CODD augments per-frame stereo networks with novel motion and fusion networks. The motion network accounts for dynamics by predicting a per-pixel SE3 transformation and aligning the observations. The fusion network improves temporal depth consistency by aggregating the current and past estimates. We conduct extensive experiments and demonstrate quantitatively and qualitatively that CODD outperforms competing methods in terms of temporal consistency and performs on par in terms of per-frame accuracy. △ Less

Submitted 8 December, 2022; v1 submitted 17 November, 2021; originally announced November 2021.

Comments: WACV 2023, project page: https://mli0603.github.io/codd/

arXiv:2111.08097 [pdf, other]

doi 10.1080/21681163.2021.1999331

Virtual Reality for Synergistic Surgical Training and Data Generation

Authors: Adnan Munawar, Zhaoshuo Li, Punit Kunjam, Nimesh Nagururu, Andy S. Ding, Peter Kazanzides, Thomas Looi, Francis X. Creighton, Russell H. Taylor, Mathias Unberath

Abstract: Surgical simulators not only allow planning and training of complex procedures, but also offer the ability to generate structured data for algorithm development, which may be applied in image-guided computer assisted interventions. While there have been efforts on either develo** training platforms for surgeons or data generation engines, these two features, to our knowledge, have not been offer… ▽ More Surgical simulators not only allow planning and training of complex procedures, but also offer the ability to generate structured data for algorithm development, which may be applied in image-guided computer assisted interventions. While there have been efforts on either develo** training platforms for surgeons or data generation engines, these two features, to our knowledge, have not been offered together. We present our developments of a cost-effective and synergistic framework, named Asynchronous Multibody Framework Plus (AMBF+), which generates data for downstream algorithm development simultaneously with users practicing their surgical skills. AMBF+ offers stereoscopic display on a virtual reality (VR) device and haptic feedback for immersive surgical simulation. It can also generate diverse data such as object poses and segmentation maps. AMBF+ is designed with a flexible plugin setup which allows for unobtrusive extension for simulation of different surgical procedures. We show one use case of AMBF+ as a virtual drilling simulator for lateral skull-base surgery, where users can actively modify the patient anatomy using a virtual surgical drill. We further demonstrate how the data generated can be used for validating and training downstream computer vision algorithms △ Less

Submitted 15 November, 2021; originally announced November 2021.

Comments: MICCAI 2021 AE-CAI "Outstanding Paper Award" Code: https://github.com/LCSR-SICKKIDS/volumetric_drilling

arXiv:2111.01480 [pdf, other]

A derivation of variational message passing (VMP) for latent Dirichlet allocation (LDA)

Authors: Rebecca M. C. Taylor, Dirko Coetsee, Johan A. du Preez

Abstract: Latent Dirichlet Allocation (LDA) is a probabilistic model used to uncover latent topics in a corpus of documents. Inference is often performed using variational Bayes (VB) algorithms, which calculate a lower bound to the posterior distribution over the parameters. Deriving the variational update equations for new models requires considerable manual effort; variational message passing (VMP) has em… ▽ More Latent Dirichlet Allocation (LDA) is a probabilistic model used to uncover latent topics in a corpus of documents. Inference is often performed using variational Bayes (VB) algorithms, which calculate a lower bound to the posterior distribution over the parameters. Deriving the variational update equations for new models requires considerable manual effort; variational message passing (VMP) has emerged as a "black-box" tool to expedite the process of variational inference. But applying VMP in practice still presents subtle challenges, and the existing literature does not contain the steps that are necessary to implement VMP for the standard smoothed LDA model, nor are available black-box probabilistic graphical modelling software able to do the word-topic updates necessary to implement LDA. In this paper, we therefore present a detailed derivation of the VMP update equations for LDA. We see this as a first step to enabling other researchers to calculate the VMP updates for similar graphical models. △ Less

Submitted 25 August, 2022; v1 submitted 2 November, 2021; originally announced November 2021.

Comments: 24 pages,not yet submitted anywhere

MSC Class: G.3

arXiv:2110.00635 [pdf, other]

ALBU: An approximate Loopy Belief message passing algorithm for LDA to improve performance on small data sets

Authors: Rebecca M. C. Taylor, Johan A. du Preez

Abstract: Variational Bayes (VB) applied to latent Dirichlet allocation (LDA) has become the most popular algorithm for aspect modeling. While sufficiently successful in text topic extraction from large corpora, VB is less successful in identifying aspects in the presence of limited data. We present a novel variational message passing algorithm as applied to Latent Dirichlet Allocation (LDA) and compare it… ▽ More Variational Bayes (VB) applied to latent Dirichlet allocation (LDA) has become the most popular algorithm for aspect modeling. While sufficiently successful in text topic extraction from large corpora, VB is less successful in identifying aspects in the presence of limited data. We present a novel variational message passing algorithm as applied to Latent Dirichlet Allocation (LDA) and compare it with the gold standard VB and collapsed Gibbs sampling. In situations where marginalisation leads to non-conjugate messages, we use ideas from sampling to derive approximate update equations. In cases where conjugacy holds, Loopy Belief update (LBU) (also known as Lauritzen-Spiegelhalter) is used. Our algorithm, ALBU (approximate LBU), has strong similarities with Variational Message Passing (VMP) (which is the message passing variant of VB). To compare the performance of the algorithms in the presence of limited data, we use data sets consisting of tweets and news groups. Additionally, to perform more fine grained evaluations and comparisons, we use simulations that enable comparisons with the ground truth via Kullback-Leibler divergence (KLD). Using coherence measures for the text corpora and KLD with the simulations we show that ALBU learns latent distributions more accurately than does VB, especially for smaller data sets. △ Less

Submitted 19 August, 2022; v1 submitted 1 October, 2021; originally announced October 2021.

Comments: Springer has accepted to publish the Proceedings of 2022 Computing Conference in the series "Lecture Notes in Networks and Systems"

MSC Class: 62F15 ACM Class: G.3

Journal ref: In Science and Information Conference (pp. 723-746). Springer, Cham (2022)

arXiv:2109.06163 [pdf, other]

On the Sins of Image Synthesis Loss for Self-supervised Depth Estimation

Authors: Zhaoshuo Li, Nathan Drenkow, Hao Ding, Andy S. Ding, Alexander Lu, Francis X. Creighton, Russell H. Taylor, Mathias Unberath

Abstract: Scene depth estimation from stereo and monocular imagery is critical for extracting 3D information for downstream tasks such as scene understanding. Recently, learning-based methods for depth estimation have received much attention due to their high performance and flexibility in hardware choice. However, collecting ground truth data for supervised training of these algorithms is costly or outrigh… ▽ More Scene depth estimation from stereo and monocular imagery is critical for extracting 3D information for downstream tasks such as scene understanding. Recently, learning-based methods for depth estimation have received much attention due to their high performance and flexibility in hardware choice. However, collecting ground truth data for supervised training of these algorithms is costly or outright impossible. This circumstance suggests a need for alternative learning approaches that do not require corresponding depth measurements. Indeed, self-supervised learning of depth estimation provides an increasingly popular alternative. It is based on the idea that observed frames can be synthesized from neighboring frames if accurate depth of the scene is known - or in this case, estimated. We show empirically that - contrary to common belief - improvements in image synthesis do not necessitate improvement in depth estimation. Rather, optimizing for image synthesis can result in diverging performance with respect to the main prediction objective - depth. We attribute this diverging phenomenon to aleatoric uncertainties, which originate from data. Based on our experiments on four datasets (spanning street, indoor, and medical) and five architectures (monocular and stereo), we conclude that this diverging phenomenon is independent of the dataset domain and not mitigated by commonly used regularization techniques. To underscore the importance of this finding, we include a survey of methods which use image synthesis, totaling 127 papers over the last six years. This observed divergence has not been previously reported or studied in depth, suggesting room for future improvement of self-supervised approaches which might be impacted the finding. △ Less

Submitted 10 October, 2021; v1 submitted 13 September, 2021; originally announced September 2021.

Comments: preprint

arXiv:2108.02238 [pdf, other]

The Impact of Machine Learning on 2D/3D Registration for Image-guided Interventions: A Systematic Review and Perspective

Authors: Mathias Unberath, Cong Gao, Yicheng Hu, Max Judish, Russell H Taylor, Mehran Armand, Robert Grupp

Abstract: Image-based navigation is widely considered the next frontier of minimally invasive surgery. It is believed that image-based navigation will increase the access to reproducible, safe, and high-precision surgery as it may then be performed at acceptable costs and effort. This is because image-based techniques avoid the need of specialized equipment and seamlessly integrate with contemporary workflo… ▽ More Image-based navigation is widely considered the next frontier of minimally invasive surgery. It is believed that image-based navigation will increase the access to reproducible, safe, and high-precision surgery as it may then be performed at acceptable costs and effort. This is because image-based techniques avoid the need of specialized equipment and seamlessly integrate with contemporary workflows. Further, it is expected that image-based navigation will play a major role in enabling mixed reality environments and autonomous, robotic workflows. A critical component of image guidance is 2D/3D registration, a technique to estimate the spatial relationships between 3D structures, e.g., volumetric imagery or tool models, and 2D images thereof, such as fluoroscopy or endoscopy. While image-based 2D/3D registration is a mature technique, its transition from the bench to the bedside has been restrained by well-known challenges, including brittleness of the optimization objective, hyperparameter selection, and initialization, difficulties around inconsistencies or multiple objects, and limited single-view performance. One reason these challenges persist today is that analytical solutions are likely inadequate considering the complexity, variability, and high-dimensionality of generic 2D/3D registration problems. The recent advent of machine learning-based approaches to imaging problems that, rather than specifying the desired functional map**, approximate it using highly expressive parametric models holds promise for solving some of the notorious challenges in 2D/3D registration. In this manuscript, we review the impact of machine learning on 2D/3D registration to systematically summarize the recent advances made by introduction of this novel technology. Grounded in these insights, we then offer our perspective on the most pressing needs, significant open problems, and possible next steps. △ Less

Submitted 4 August, 2021; originally announced August 2021.

arXiv:2107.02975 [pdf, other]

Neural Natural Language Processing for Unstructured Data in Electronic Health Records: a Review

Authors: Irene Li, Jessica Pan, Jeremy Goldwasser, Neha Verma, Wai Pan Wong, Muhammed Yavuz Nuzumlalı, Benjamin Rosand, Yixin Li, Matthew Zhang, David Chang, R. Andrew Taylor, Harlan M. Krumholz, Dragomir Radev

Abstract: Electronic health records (EHRs), digital collections of patient healthcare events and observations, are ubiquitous in medicine and critical to healthcare delivery, operations, and research. Despite this central role, EHRs are notoriously difficult to process automatically. Well over half of the information stored within EHRs is in the form of unstructured text (e.g. provider notes, operation repo… ▽ More Electronic health records (EHRs), digital collections of patient healthcare events and observations, are ubiquitous in medicine and critical to healthcare delivery, operations, and research. Despite this central role, EHRs are notoriously difficult to process automatically. Well over half of the information stored within EHRs is in the form of unstructured text (e.g. provider notes, operation reports) and remains largely untapped for secondary use. Recently, however, newer neural network and deep learning approaches to Natural Language Processing (NLP) have made considerable advances, outperforming traditional statistical and rule-based systems on a variety of tasks. In this survey paper, we summarize current neural NLP methods for EHR applications. We focus on a broad scope of tasks, namely, classification and prediction, word embeddings, extraction, generation, and other topics such as question answering, phenoty**, knowledge graphs, medical dialogue, multilinguality, interpretability, etc. △ Less

Submitted 6 July, 2021; originally announced July 2021.

Comments: 33 pages, 11 figures

MSC Class: 68T50 ACM Class: I.2.7

arXiv:2107.00229 [pdf, other]

E-DSSR: Efficient Dynamic Surgical Scene Reconstruction with Transformer-based Stereoscopic Depth Perception

Authors: Yonghao Long, Zhaoshuo Li, Chi Hang Yee, Chi Fai Ng, Russell H. Taylor, Mathias Unberath, Qi Dou

Abstract: Reconstructing the scene of robotic surgery from the stereo endoscopic video is an important and promising topic in surgical data science, which potentially supports many applications such as surgical visual perception, robotic surgery education and intra-operative context awareness. However, current methods are mostly restricted to reconstructing static anatomy assuming no tissue deformation, too… ▽ More Reconstructing the scene of robotic surgery from the stereo endoscopic video is an important and promising topic in surgical data science, which potentially supports many applications such as surgical visual perception, robotic surgery education and intra-operative context awareness. However, current methods are mostly restricted to reconstructing static anatomy assuming no tissue deformation, tool occlusion and de-occlusion, and camera movement. However, these assumptions are not always satisfied in minimal invasive robotic surgeries. In this work, we present an efficient reconstruction pipeline for highly dynamic surgical scenes that runs at 28 fps. Specifically, we design a transformer-based stereoscopic depth perception for efficient depth estimation and a light-weight tool segmentor to handle tool occlusion. After that, a dynamic reconstruction algorithm which can estimate the tissue deformation and camera movement, and aggregate the information over time is proposed for surgical scene reconstruction. We evaluate the proposed pipeline on two datasets, the public Hamlyn Centre Endoscopic Video Dataset and our in-house DaVinci robotic surgery dataset. The results demonstrate that our method can recover the scene obstructed by the surgical tool and handle the movement of camera in realistic surgical scenarios effectively at real-time speed. △ Less

Submitted 1 July, 2021; originally announced July 2021.

Comments: Accepted to MICCAI 2021

arXiv:2104.09869 [pdf]

doi 10.1109/MRA.2021.3101646

Accelerating Surgical Robotics Research: A Review of 10 Years With the da Vinci Research Kit

Authors: Claudia D'Ettorre, Andrea Mariani, Agostino Stilli, Ferdinando Rodriguez y Baena, Pietro Valdastri, Anton Deguet, Peter Kazanzides, Russell H. Taylor, Gregory S. Fischer, Simon P. DiMaio, Arianna Menciassi, Danail Stoyanov

Abstract: Robotic-assisted surgery is now well-established in clinical practice and has become the gold standard clinical treatment option for several clinical indications. The field of robotic-assisted surgery is expected to grow substantially in the next decade with a range of new robotic devices emerging to address unmet clinical needs across different specialities. A vibrant surgical robotics research c… ▽ More Robotic-assisted surgery is now well-established in clinical practice and has become the gold standard clinical treatment option for several clinical indications. The field of robotic-assisted surgery is expected to grow substantially in the next decade with a range of new robotic devices emerging to address unmet clinical needs across different specialities. A vibrant surgical robotics research community is pivotal for conceptualizing such new systems as well as for develo** and training the engineers and scientists to translate them into practice. The da Vinci Research Kit (dVRK), an academic and industry collaborative effort to re-purpose decommissioned da Vinci surgical systems (Intuitive Surgical Inc, CA, USA) as a research platform for surgical robotics research, has been a key initiative for addressing a barrier to entry for new research groups in surgical robotics. In this paper, we present an extensive review of the publications that have been facilitated by the dVRK over the past decade. We classify research efforts into different categories and outline some of the major challenges and needs for the robotics community to maintain this initiative and build upon it. △ Less

Submitted 17 November, 2021; v1 submitted 20 April, 2021; originally announced April 2021.

arXiv:2101.12719 [pdf, other]

Predicting Nanorobot Shapes via Generative Models

Authors: Emma Benjaminson, Rebecca E. Taylor, Matthew Travers

Abstract: The field of DNA nanotechnology has made it possible to assemble, with high yields, different structures that have actionable properties. For example, researchers have created components that can be actuated. An exciting next step is to combine these components into multifunctional nanorobots that could, potentially, perform complex tasks like swimming to a target location in the human body, detec… ▽ More The field of DNA nanotechnology has made it possible to assemble, with high yields, different structures that have actionable properties. For example, researchers have created components that can be actuated. An exciting next step is to combine these components into multifunctional nanorobots that could, potentially, perform complex tasks like swimming to a target location in the human body, detect an adverse reaction and then release a drug load to stop it. However, as we start to assemble more complex nanorobots, the yield of the desired nanorobot begins to decrease as the number of possible component combinations increases. Therefore, the ultimate goal of this work is to develop a predictive model to maximize yield. However, training predictive models typically requires a large dataset. For the nanorobots we are interested in assembling, this will be difficult to collect. This is because high-fidelity data, which allows us to characterize the shape and size of individual structures, is very time-consuming to collect, whereas low-fidelity data is readily available but only captures bulk statistics for different processes. Therefore, this work combines low- and high-fidelity data to train a generative model using a two-step process. We first use a relatively small, high-fidelity dataset to train a generative model. At run time, the model takes low-fidelity data and uses it to approximate the high-fidelity content. We do this by biasing the model towards samples with specific properties as measured by low-fidelity data. In this work we bias our distribution towards a desired node degree of a graphical model that we take as a surrogate representation of the nanorobots that this work will ultimately focus on. We have not yet accumulated a high-fidelity dataset of nanorobots, so we leverage the MolGAN architecture [1] and the QM9 small molecule dataset [2-3] to demonstrate our approach. △ Less

Submitted 29 January, 2021; originally announced January 2021.

Comments: 8 pages, 2 figures, accepted to Machine Learning for Engineering Modeling, Simulation, and Design Workshop at Neural Information Processing Systems 2020, December 12, 2020

arXiv:2101.07339 [pdf, other]

MONAH: Multi-Modal Narratives for Humans to analyze conversations

Authors: Joshua Y. Kim, Greyson Y. Kim, Chunfeng Liu, Rafael A. Calvo, Silas C. R. Taylor, Kalina Yacef

Abstract: In conversational analyses, humans manually weave multimodal information into the transcripts, which is significantly time-consuming. We introduce a system that automatically expands the verbatim transcripts of video-recorded conversations using multimodal data streams. This system uses a set of preprocessing rules to weave multimodal annotations into the verbatim transcripts and promote interpret… ▽ More In conversational analyses, humans manually weave multimodal information into the transcripts, which is significantly time-consuming. We introduce a system that automatically expands the verbatim transcripts of video-recorded conversations using multimodal data streams. This system uses a set of preprocessing rules to weave multimodal annotations into the verbatim transcripts and promote interpretability. Our feature engineering contributions are two-fold: firstly, we identify the range of multimodal features relevant to detect rapport-building; secondly, we expand the range of multimodal annotations and show that the expansion leads to statistically significant improvements in detecting rapport-building. △ Less

Submitted 19 January, 2021; v1 submitted 18 January, 2021; originally announced January 2021.

Comments: 14 pages

ACM Class: I.7.2

arXiv:2012.07756 [pdf]

Medical Robots for Infectious Diseases: Lessons and Challenges from the COVID-19 Pandemic

Authors: Antonio Di Lallo, Robin R. Murphy, Axel Krieger, Junxi Zhu, Russell H. Taylor, Hao Su

Abstract: Medical robots can play an important role in mitigating the spread of infectious diseases and delivering quality care to patients during the COVID-19 pandemic. Methods and procedures involving medical robots in the continuum of care, ranging from disease prevention, screening, diagnosis, treatment, and homecare have been extensively deployed and also present incredible opportunities for future dev… ▽ More Medical robots can play an important role in mitigating the spread of infectious diseases and delivering quality care to patients during the COVID-19 pandemic. Methods and procedures involving medical robots in the continuum of care, ranging from disease prevention, screening, diagnosis, treatment, and homecare have been extensively deployed and also present incredible opportunities for future development. This paper provides an overview of the current state-of-the-art, highlighting the enabling technologies and unmet needs for prospective technological advances within the next 5-10 years. We also identify key research and knowledge barriers that need to be addressed in develo** effective and flexible solutions to ensure preparedness for rapid and scalable deployment to combat infectious diseases. △ Less

Submitted 14 December, 2020; originally announced December 2020.

Comments: 8 pages, 9 figures, publish in RA-magazine

Report number: 20201214

arXiv:2012.06292 [pdf, other]

Spotlight-based 3D Instrument Guidance for Retinal Surgery

Authors: Mingchuan Zhou, Jiahao Wu, Ali Ebrahimi, Niravkumar Patel, Changyan He, Peter Gehlbach, Russell H Taylor, Alois Knoll, M Ali Nasseri, Iulian I Iordachita

Abstract: Retinal surgery is a complex activity that can be challenging for a surgeon to perform effectively and safely. Image guided robot-assisted surgery is one of the promising solutions that bring significant surgical enhancement in treatment outcome and reduce the physical limitations of human surgeons. In this paper, we demonstrate a novel method for 3D guidance of the instrument based on the project… ▽ More Retinal surgery is a complex activity that can be challenging for a surgeon to perform effectively and safely. Image guided robot-assisted surgery is one of the promising solutions that bring significant surgical enhancement in treatment outcome and reduce the physical limitations of human surgeons. In this paper, we demonstrate a novel method for 3D guidance of the instrument based on the projection of spotlight in the single microscope images. The spotlight projection mechanism is firstly analyzed and modeled with a projection on both a plane and a sphere surface. To test the feasibility of the proposed method, a light fiber is integrated into the instrument which is driven by the Steady-Hand Eye Robot (SHER). The spot of light is segmented and tracked on a phantom retina using the proposed algorithm. The static calibration and dynamic test results both show that the proposed method can easily archive 0.5 mm of tip-to-surface distance which is within the clinically acceptable accuracy for intraocular visual guidance. △ Less

Submitted 11 December, 2020; originally announced December 2020.

arXiv:2011.02910 [pdf, other]

Revisiting Stereo Depth Estimation From a Sequence-to-Sequence Perspective with Transformers

Authors: Zhaoshuo Li, Xingtong Liu, Nathan Drenkow, Andy Ding, Francis X. Creighton, Russell H. Taylor, Mathias Unberath

Abstract: Stereo depth estimation relies on optimal correspondence matching between pixels on epipolar lines in the left and right images to infer depth. In this work, we revisit the problem from a sequence-to-sequence correspondence perspective to replace cost volume construction with dense pixel matching using position information and attention. This approach, named STereo TRansformer (STTR), has several… ▽ More Stereo depth estimation relies on optimal correspondence matching between pixels on epipolar lines in the left and right images to infer depth. In this work, we revisit the problem from a sequence-to-sequence correspondence perspective to replace cost volume construction with dense pixel matching using position information and attention. This approach, named STereo TRansformer (STTR), has several advantages: It 1) relaxes the limitation of a fixed disparity range, 2) identifies occluded regions and provides confidence estimates, and 3) imposes uniqueness constraints during the matching process. We report promising results on both synthetic and real-world datasets and demonstrate that STTR generalizes across different domains, even without fine-tuning. △ Less

Submitted 25 August, 2021; v1 submitted 5 November, 2020; originally announced November 2020.

Comments: Our code is available at https://github.com/mli0603/stereo-transformer

Journal ref: ICCV 2021 Oral

Showing 1–50 of 102 results for author: Taylor, R