
MAIRA-2: Grounded Radiology Report Generation

Authors: Shruthi Bannur, Kenza Bouzid, Daniel C. Castro, Anton Schwaighofer, Sam Bond-Taylor, Maximilian Ilse, Fernando Pérez-García, Valentina Salvatelli, Harshita Sharma, Felix Meissen, Mercy Ranjit, Shaury Srivastav, Julia Gong, Fabian Falck, Ozan Oktay, Anja Thieme, Matthew P. Lungren, Maria Teodora Wetscherek, Javier Alvarez-Valle, Stephanie L. Hyland

Abstract: Radiology reporting is a complex task that requires detailed image understanding, integration of multiple inputs, including comparison with prior imaging, and precise language generation. This makes it ideal for the development and use of generative multimodal models. Here, we extend report generation to include the localisation of individual findings on the image, a task we call grounded report generation. Prior work indicates that grounding is important for clarifying image understanding and interpreting AI-generated text. Therefore, grounded reporting stands to improve the utility and transparency of automated report drafting. To enable evaluation of grounded reporting, we propose a novel evaluation framework, RadFact, that leverages the reasoning capabilities of large language models (LLMs). RadFact assesses the factuality of individual generated sentences, as well as the correctness of generated spatial localisations when present. We introduce MAIRA-2, a large multimodal model combining a radiology-specific image encoder with an LLM, and trained for the new task of grounded report generation on chest X-rays. MAIRA-2 uses more comprehensive inputs than explored previously: the current frontal image, the current lateral image, the prior frontal image and prior report, as well as the Indication, Technique and Comparison sections of the current report. We demonstrate that these additions significantly improve report quality and reduce hallucinations, establishing a new state of the art on findings generation (without grounding) on MIMIC-CXR while demonstrating the feasibility of grounded reporting as a novel and richer task.

Submitted 6 June, 2024; originally announced June 2024.

Comments: 44 pages, 20 figures
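RadFact is described above only at the level of scoring individual generated sentences for factuality. A minimal sketch of that sentence-level bookkeeping follows, assuming a hypothetical `judge(sentence, evidence)` callable standing in for the LLM entailment call. The toy `exact_match_judge`, the function names, and the two-direction precision/recall framing are illustrative assumptions, and the spatial-localisation check is omitted entirely.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical judge: returns True if `sentence` is supported by `evidence`.
# In RadFact an LLM plays this role; here it is only a placeholder.
Judge = Callable[[str, Sequence[str]], bool]


def exact_match_judge(sentence: str, evidence: Sequence[str]) -> bool:
    """Toy stand-in for an LLM entailment check: exact string membership."""
    return sentence in evidence


def sentence_factuality(
    generated: Sequence[str], reference: Sequence[str], judge: Judge
) -> Tuple[float, float]:
    """Return (precision, recall) of sentence-level support.

    precision: fraction of generated sentences supported by the reference.
    recall: fraction of reference sentences supported by the generated report.
    """
    supported_gen = sum(judge(s, reference) for s in generated)
    supported_ref = sum(judge(s, generated) for s in reference)
    precision = supported_gen / len(generated) if generated else 0.0
    recall = supported_ref / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    generated = ["No pleural effusion.", "Mild cardiomegaly."]
    reference = ["No pleural effusion.", "Lungs are clear."]
    print(sentence_factuality(generated, reference, exact_match_judge))
```

Keeping the judge as an injected callable means the counting logic stays the same whether the entailment decision comes from a toy rule or an LLM.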

arXiv:2406.02660

doi 10.1051/0004-6361/202450590

First VLBI detection of Fornax A

Authors: G. F. Paraschos, M. Wielgus, P. Benke, V. Mpisketzis, F. Rösch, K. Dasyra, E. Ros, M. Kadler, R. Ojha, P. G. Edwards, L. Hyland, J. F. H. Quick, S. Weston

Abstract: Radio galaxies harbouring jetted active galactic nuclei are a frequent target of very-long-baseline interferometry (VLBI) because they play an essential role in exploring how jets form and propagate. Hence, only a few have not yet been detected with VLBI; Fornax A is one of the most famous examples. Here we present the first detection of the compact core region of Fornax A with VLBI. At 8.4 GHz the faint core is consistent with an unresolved point source. We constrained its flux density to be $S_0 = 47.5\textrm{--}62.3\,\textrm{mJy}$ and its diameter to be $D^\textrm{min}_0 \leq 70\,\mu\textrm{as}$. The high values of the measured brightness temperature ($T_\textrm{B} \gtrsim 10^{11}\,\textrm{K}$) imply that the observed radiation is of non-thermal origin, likely associated with synchrotron emission from the active galactic nucleus. We also investigated the possibility of a second radio source being present within the field of view. Adding a second Gaussian component to the geometrical model-fit does not significantly improve the quality of the fit; we therefore conclude that our detection corresponds to the compact core of Fornax A. Analysis of the non-trivial closure phases provides evidence for the detection of more extended flux density, on an angular scale of $\sim4000\,\mu\textrm{as}$. Finally, the fractional circular polarisation of the core is consistent with zero, with a conservative upper limit of $m_\textrm{circ} \leq 4\%$.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 4 pages, 2 figures, accepted for publication in A&A

Journal ref: A&A 687, L6 (2024)
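The brightness-temperature bound follows from the flux-density and size constraints. A hedged sketch using the standard VLBI approximation for a circular Gaussian component, T_B = 1.22e12 (1 + z) S / (ν² θ²) K, with S in Jy, ν in GHz and θ (FWHM) in mas; the function name is hypothetical and the redshift term defaults to zero here. Because the size is only an upper limit, the result is a lower limit on T_B.

```python
def gaussian_brightness_temperature(
    flux_jy: float, freq_ghz: float, fwhm_mas: float, z: float = 0.0
) -> float:
    """Brightness temperature in K of a circular Gaussian component.

    Standard VLBI approximation: T_B = 1.22e12 * (1 + z) * S / (nu^2 * theta^2),
    with S in Jy, nu in GHz and theta (FWHM) in mas.
    """
    return 1.22e12 * (1.0 + z) * flux_jy / (freq_ghz ** 2 * fwhm_mas ** 2)


if __name__ == "__main__":
    # Lower flux bound and upper size bound -> lower limit on T_B.
    tb_min = gaussian_brightness_temperature(0.0475, 8.4, 0.070)
    print(f"T_B >~ {tb_min:.2e} K")
```

With the lower flux bound (47.5 mJy), 8.4 GHz and 70 μas this gives roughly 1.7 × 10^11 K, consistent with the quoted T_B ≳ 10^11 K.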

arXiv:2405.12370

Swift J1727.8-1613 has the Largest Resolved Continuous Jet Ever Seen in an X-ray Binary

Authors: Callan M. Wood, James C. A. Miller-Jones, Arash Bahramian, Steven J. Tingay, Steve Prabu, Thomas D. Russell, Pikky Atri, Francesco Carotenuto, Diego Altamirano, Sara E. Motta, Lucas Hyland, Cormac Reynolds, Stuart Weston, Rob Fender, Elmar Körding, Dipankar Maitra, Sera Markoff, Simone Migliari, David M. Russell, Craig L. Sarazin, Gregory R. Sivakoff, Roberto Soria, Alexandra J. Tetarenko, Valeriu Tudose

Abstract: Multi-wavelength polarimetry and radio observations of Swift J1727.8-1613 at the beginning of its recent 2023 outburst suggested the presence of a bright compact jet aligned in the north-south direction, which could not be confirmed without high angular resolution images. Using the Very Long Baseline Array and the Long Baseline Array, we imaged Swift J1727.8-1613 during the hard/hard-intermediate state, revealing a bright core and a large, two-sided, asymmetrical, resolved jet. The jet extends in the north-south direction, at a position angle of $-0.60\pm0.07^\circ$ East of North. At 8.4 GHz, the entire resolved jet structure is $\sim110\,(d/2.7\,\text{kpc})/\sin i$ AU long, with the southern approaching jet extending $\sim80\,(d/2.7\,\text{kpc})/\sin i$ AU from the core, where $d$ is the distance to the source and $i$ is the inclination of the jet axis to the line of sight. These images reveal the most resolved continuous X-ray binary jet, and possibly the most physically extended continuous X-ray binary jet ever observed. Based on the brightness ratio of the approaching and receding jets, we put a lower limit on the intrinsic jet speed of $\beta\geq0.27$ and an upper limit on the jet inclination of $i\leq74^\circ$. In our first observation we also detected a rapidly fading discrete jet knot $66.89\pm0.04$ mas south of the core, with a proper motion of $0.66\pm0.05$ mas hour$^{-1}$, which we interpret as the result of a downstream internal shock or a jet-ISM interaction, as opposed to a transient relativistic jet launched at the beginning of the outburst.

Submitted 20 May, 2024; originally announced May 2024.

Comments: Submitted to ApJL
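The speed and inclination limits come from the approaching-to-receding jet brightness ratio. A sketch of the standard Doppler-boosting relation R = [(1 + β cos i)/(1 − β cos i)]^(k − α), assuming k = 2 for a continuous jet and the convention S ∝ ν^α. The ratio R and spectral index α actually used by the authors are not given in the abstract, so the inputs below are illustrative, and the function names are hypothetical. Since cos i ≤ 1 and β ≤ 1, the product β cos i bounds β from below and i from above.

```python
import math


def beta_cos_i(ratio: float, k: float = 2.0, alpha: float = 0.0) -> float:
    """Invert R = ((1 + b)/(1 - b))**(k - alpha) for b = beta * cos(i)."""
    x = ratio ** (1.0 / (k - alpha))
    return (x - 1.0) / (x + 1.0)


def limits_from_ratio(ratio: float, k: float = 2.0, alpha: float = 0.0):
    """Return (lower limit on beta, upper limit on inclination in degrees)."""
    bc = beta_cos_i(ratio, k, alpha)
    beta_min = bc                          # because cos(i) <= 1
    i_max_deg = math.degrees(math.acos(bc))  # because beta <= 1
    return beta_min, i_max_deg


if __name__ == "__main__":
    # Illustrative ratio chosen so that beta * cos(i) = 0.27 with k=2, alpha=0.
    ratio = ((1 + 0.27) / (1 - 0.27)) ** 2
    print(limits_from_ratio(ratio))
```

For that illustrative ratio the sketch returns β ≥ 0.27 and i ≤ about 74.3°, in line with the quoted limits.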

arXiv:2405.05299

Challenges for Responsible AI Design and Workflow Integration in Healthcare: A Case Study of Automatic Feeding Tube Qualification in Radiology

Authors: Anja Thieme, Abhijith Rajamohan, Benjamin Cooper, Heather Groombridge, Robert Simister, Barney Wong, Nicholas Woznitza, Mark Ames Pinnock, Maria Teodora Wetscherek, Cecily Morrison, Hannah Richardson, Fernando Pérez-García, Stephanie L. Hyland, Shruthi Bannur, Daniel C. Castro, Kenza Bouzid, Anton Schwaighofer, Mercy Ranjit, Harshita Sharma, Matthew P. Lungren, Ozan Oktay, Javier Alvarez-Valle, Aditya Nori, Stephen Harris, Joseph Jacob

Abstract: Nasogastric tubes (NGTs) are feeding tubes that are inserted through the nose into the stomach to deliver nutrition or medication. If not placed correctly, they can cause patients serious harm, even death. Recent AI developments demonstrate the feasibility of robustly detecting NGT placement from chest X-ray images to reduce the risk of sub-optimally or critically placed NGTs being missed or detected late, but gaps remain in integrating such tools into clinical practice. In this study, we present a human-centered approach to the problem and describe insights derived from contextual inquiry and in-depth interviews with 15 clinical stakeholders. The interviews helped us understand challenges in existing workflows, and how best to align technical capabilities with user needs and expectations. We discovered the trade-offs and complexities that need consideration when choosing suitable workflow stages, target users, and design configurations for different AI proposals. We explored how to balance AI benefits and risks for healthcare staff and patients within broader organizational and medical-legal constraints. We also identified data issues related to edge cases and data biases that affect model training and evaluation; how data documentation practices influence data preparation and labelling; and how to measure relevant AI outcomes reliably in future evaluations. We discuss how our work informs the design and development of AI applications that are clinically useful, ethical, and acceptable in real-world healthcare services.

Submitted 8 May, 2024; originally announced May 2024.

ACM Class: H.5.m; I.2.m

arXiv:2402.14252

doi 10.1145/3613904.3642013

Multimodal Healthcare AI: Identifying and Designing Clinically Relevant Vision-Language Applications for Radiology

Authors: Nur Yildirim, Hannah Richardson, Maria T. Wetscherek, Junaid Bajwa, Joseph Jacob, Mark A. Pinnock, Stephen Harris, Daniel Coelho de Castro, Shruthi Bannur, Stephanie L. Hyland, Pratik Ghosh, Mercy Ranjit, Kenza Bouzid, Anton Schwaighofer, Fernando Pérez-García, Harshita Sharma, Ozan Oktay, Matthew Lungren, Javier Alvarez-Valle, Aditya Nori, Anja Thieme

Abstract: Recent advances in AI combine large language models (LLMs) with vision encoders, bringing unprecedented technical capabilities that can be leveraged for a wide range of healthcare applications. In radiology, vision-language models (VLMs) perform well on tasks such as generating radiology findings from a patient's medical image, or answering visual questions (e.g., 'Where are the nodules in this chest X-ray?'). However, the clinical utility of potential applications of these capabilities is currently underexplored. We engaged in an iterative, multidisciplinary design process to envision clinically relevant VLM interactions, and co-designed four VLM use concepts: Draft Report Generation, Augmented Report Review, Visual Search and Querying, and Patient Imaging History Highlights. We studied these concepts with 13 radiologists and clinicians, who assessed the VLM concepts as valuable yet articulated many design considerations. Reflecting on our findings, we discuss implications for integrating VLM capabilities in radiology, and for healthcare AI more generally.

Submitted 21 February, 2024; originally announced February 2024.

Comments: to appear at CHI 2024

arXiv:2401.10815

RAD-DINO: Exploring Scalable Medical Image Encoders Beyond Text Supervision

Authors: Fernando Pérez-García, Harshita Sharma, Sam Bond-Taylor, Kenza Bouzid, Valentina Salvatelli, Maximilian Ilse, Shruthi Bannur, Daniel C. Castro, Anton Schwaighofer, Matthew P. Lungren, Maria Wetscherek, Noel Codella, Stephanie L. Hyland, Javier Alvarez-Valle, Ozan Oktay

Abstract: Language-supervised pre-training has proven to be a valuable method for extracting semantically meaningful features from images, serving as a foundational element in multimodal systems within the computer vision and medical imaging domains. However, the resulting features are limited by the information contained within the text. This is particularly problematic in medical imaging, where radiologists' written findings focus on specific observations; a challenge compounded by the scarcity of paired imaging-text data due to concerns over leakage of personal health information. In this work, we fundamentally challenge the prevailing reliance on language supervision for learning general-purpose biomedical imaging encoders. We introduce RAD-DINO, a biomedical image encoder pre-trained solely on unimodal biomedical imaging data that obtains similar or greater performance than state-of-the-art biomedical language-supervised models on a diverse range of benchmarks. Specifically, the quality of learned representations is evaluated on standard imaging tasks (classification and semantic segmentation) and a vision-language alignment task (text report generation from images). To further demonstrate the drawback of language supervision, we show that features from RAD-DINO correlate better than those of language-supervised models with other medical-record information (e.g., sex or age) that is generally not mentioned in radiology reports. Finally, we conduct a series of ablations determining the factors in RAD-DINO's performance; notably, we observe that RAD-DINO's downstream performance scales well with the quantity and diversity of training data, demonstrating that image-only supervision is a scalable approach for training a foundational biomedical image encoder.

Submitted 19 January, 2024; originally announced January 2024.

arXiv:2311.13668

MAIRA-1: A specialised large multimodal model for radiology report generation

Authors: Stephanie L. Hyland, Shruthi Bannur, Kenza Bouzid, Daniel C. Castro, Mercy Ranjit, Anton Schwaighofer, Fernando Pérez-García, Valentina Salvatelli, Shaury Srivastav, Anja Thieme, Noel Codella, Matthew P. Lungren, Maria Teodora Wetscherek, Ozan Oktay, Javier Alvarez-Valle

Abstract: We present a radiology-specific multimodal model for the task of generating radiological reports from chest X-rays (CXRs). Our work builds on the idea that large language models can be equipped with multimodal capabilities through alignment with pre-trained vision encoders. On natural images, this has been shown to allow multimodal models to gain image understanding and description capabilities. Our proposed model (MAIRA-1) leverages a CXR-specific image encoder in conjunction with a fine-tuned large language model based on Vicuna-7B, and text-based data augmentation, to produce reports with state-of-the-art quality. In particular, MAIRA-1 significantly improves on the radiologist-aligned RadCliQ metric and across all lexical metrics considered. Manual review of model outputs demonstrates promising fluency and accuracy of generated reports while uncovering failure modes not captured by existing evaluation practices. More information and resources can be found on the project website: https://aka.ms/maira.

Submitted 26 April, 2024; v1 submitted 22 November, 2023; originally announced November 2023.

Comments: 18 pages, 9 tables, 5 figures. v2 adds test IDs and image encoder citation. v3 fixes error in NPV/specificity

arXiv:2310.10206

doi 10.1051/0004-6361/202347823

TANAMI: Tracking Active Galactic Nuclei with Austral Milliarcsecond Interferometry. III. First-epoch S band images

Authors: Petra Benke, Florian Rösch, Eduardo Ros, Matthias Kadler, Roopesh Ojha, Philip G. Edwards, Shinji Horiuchi, Lucas J. Hyland, Chris Phillips, Jonathan F. H. Quick, Jamie Stevens, Anastasios K. Tzioumis, Stuart Weston

Abstract: The emergence of very high energy astronomy (VHE; E>100 GeV) raised new open questions for astronomers studying the multi-wavelength emission from blazars. Answers to these questions, such as the Doppler crisis and the location of the high-energy activity, have eluded us thus far. Recently, quasi-simultaneous multi-wavelength monitoring programs have shown considerable success in investigating blazar activity. After the launch of the Fermi Gamma-ray Space Telescope in 2008, such quasi-simultaneous observations across the electromagnetic spectrum became possible. In addition, with very long baseline interferometry (VLBI) observations we can resolve the central parsec region of active galactic nuclei (AGN) and compare morphological changes to the gamma-ray activity to study high-energy emitting blazars. To achieve our goals, we need sensitive, long-term VLBI monitoring of a complete sample of VHE-detected AGN. We performed VLBI observations at 2.3 GHz of TeV-detected AGN and high-likelihood neutrino associations (as of December 2021) with the Long Baseline Array (LBA) and other southern hemisphere radio telescopes. In this paper we present first-light TANAMI S-band images, focusing on the TeV-detected sub-sample of the full TANAMI sample. Apart from these very high energy-detected sources, we also show images of the two flux density calibrators and two additional sources included in the observations. We study the redshift, 0.1-100 GeV photon flux and S-band core brightness temperature distributions of the TeV-detected objects, and find that flat spectrum radio quasars and low-synchrotron-peaked sources on average show higher brightness temperatures than high-synchrotron-peaked BL Lacs. Sources with bright GeV gamma-ray emission also show higher brightness temperature values than gamma-low sources.

Submitted 16 October, 2023; originally announced October 2023.

Journal ref: A&A 681, A69 (2024)

arXiv:2304.14740

doi 10.1038/s41550-023-01899-w

A Keplerian disk with a four-arm spiral birthing an episodically accreting high-mass protostar

Authors: R. A. Burns, Y. Uno, N. Sakai, J. Blanchard, Z. Rosli, G. Orosz, Y. Yonekura, Y. Tanabe, K. Sugiyama, T. Hirota, Kee-Tae Kim, A. Aberfelds, A. E. Volvach, A. Bartkiewicz, A. Caratti o Garatti, A. M. Sobolev, B. Stecklum, C. Brogan, C. Phillips, D. A. Ladeyschikov, D. Johnstone, G. Surcis, G. C. MacLeod, H. Linz, J. O. Chibueze, et al. (12 additional authors not shown)

Abstract: High-mass protostars ($M_{\star} > 8\,M_{\odot}$) are thought to gain the majority of their mass via short, intense bursts of growth. This episodic accretion is thought to be facilitated by gravitationally unstable and subsequently inhomogeneous accretion disks. Limited observational capabilities, paired with a lack of observed accretion burst events, have withheld affirmative confirmation of the association between disk accretion, instability and the accretion burst phenomenon in high-mass protostars. Following its 2019 accretion burst, a heat-wave driven by a burst of radiation propagated outward from the high-mass protostar G358.93-0.03-MM1. Six VLBI (very long baseline interferometry) observations of the radiatively pumped 6.7 GHz methanol maser were conducted during this period, tracing ever-increasing disk radii as the heat-wave propagated outward. Concatenating the VLBI maps provided a sparsely sampled, milliarcsecond view of the spatio-kinematics of the accretion disk covering a physical range of $\sim$50–900 AU. We term this observational approach 'heat-wave mapping'. We report the discovery of a Keplerian accretion disk with a spatially resolved four-arm spiral pattern around G358.93-0.03-MM1. This result positively implicates disk accretion and spiral arm instabilities in the episodic accretion paradigm of high-mass star formation.

Submitted 28 April, 2023; originally announced April 2023.

Comments: Published in Nature Astronomy in 2023
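A Keplerian disk is one whose rotation speed falls off with radius as v = sqrt(GM/r). A minimal sketch, assuming a point central mass and rounded SI constants; the protostellar mass is not stated in the abstract, so `mass_msun` is a free, hypothetical input and the function name is illustrative.

```python
import math

G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30   # solar mass, kg
AU = 1.496e11      # astronomical unit, m


def keplerian_speed_kms(radius_au: float, mass_msun: float) -> float:
    """Circular orbital speed v = sqrt(G M / r) in km/s around a point mass."""
    return math.sqrt(G * mass_msun * M_SUN / (radius_au * AU)) / 1e3


if __name__ == "__main__":
    # Hypothetical central mass; only the r**-0.5 scaling is mass-independent.
    for r in (50.0, 900.0):
        print(r, keplerian_speed_kms(r, mass_msun=10.0))
```

Whatever the mass, across the quoted 50–900 AU range the Keplerian speed drops by a factor of sqrt(900/50) ≈ 4.2.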

arXiv:2304.14739

doi 10.1038/s41550-019-0989-3

A heat-wave of accretion energy traced by masers in the G358-MM1 high-mass protostar

Authors: R. A. Burns, K. Sugiyama, T. Hirota, Kee-Tae Kim, A. M. Sobolev, B. Stecklum, G. C. MacLeod, Y. Yonekura, M. Olech, G. Orosz, S. P. Ellingsen, L. Hyland, A. Caratti o Garatti, C. Brogan, T. R. Hunter, C. Phillips, S. P. van den Heever, J. Eislöffel, H. Linz, G. Surcis, J. O. Chibueze, W. Baan, B. Kramer

Abstract: High-mass stars are thought to accumulate much of their mass via short, infrequent bursts of disk-aided accretion. Such accretion events are rare and difficult to observe directly but are known to drive enhanced maser emission. In this Letter we report high-resolution, multi-epoch methanol maser observations toward G358.93-0.03 which reveal an interesting phenomenon: the sub-luminal propagation of a thermal radiation "heat-wave" emanating from an accreting high-mass protostar. The extreme transformation of the maser emission implies a sudden intensification of thermal infrared radiation from within the inner (40 mas, 270 au) region. Subsequently, methanol masers trace the radial passage of thermal radiation through the environment at $\geq$4–8% of the speed of light. Such a high translocation rate contrasts with the $\leq$10 km s$^{-1}$ physical gas motions of methanol masers typically observed using very long baseline interferometry (VLBI). The observed scenario can readily be attributed to an accretion event in the high-mass protostar G358.93-0.03-MM1. While being the third case in its class, G358.93-0.03-MM1 exhibits unique attributes hinting at a possible 'zoo' of accretion burst types. These results promote the advantages of maser observations in understanding high-mass star formation, both through single-dish maser monitoring campaigns and via their international cooperation as VLBI arrays.

Submitted 28 April, 2023; originally announced April 2023.

Comments: Published in Nature Astronomy in 2020

arXiv:2303.13386

Compositional Zero-Shot Domain Transfer with Text-to-Text Models

Authors: Fangyu Liu, Qianchu Liu, Shruthi Bannur, Fernando Pérez-García, Naoto Usuyama, Sheng Zhang, Tristan Naumann, Aditya Nori, Hoifung Poon, Javier Alvarez-Valle, Ozan Oktay, Stephanie L. Hyland

Abstract: Label scarcity is a bottleneck for improving task performance in specialised domains. We propose a novel compositional transfer learning framework (DoT5: domain compositional zero-shot T5) for zero-shot domain transfer. Without access to in-domain labels, DoT5 jointly learns domain knowledge (from masked language modelling (MLM) of unlabelled in-domain free text) and task knowledge (from task training on more readily available general-domain data) in a multi-task manner. To improve the transferability of task training, we design a strategy named NLGU: we simultaneously train NLG for in-domain label-to-data generation, which enables data augmentation for self-finetuning, and NLU for label prediction. We evaluate DoT5 on the biomedical domain and the resource-lean subdomain of radiology, focusing on NLI, text summarisation and embedding learning. DoT5 demonstrates the effectiveness of compositional transfer learning through multi-task learning. In particular, DoT5 outperforms the current SOTA in zero-shot transfer by over 7 absolute points in accuracy on RadNLI. We validate DoT5 with ablations and a case study demonstrating its ability to solve challenging NLI examples requiring in-domain expertise.

Submitted 23 March, 2023; originally announced March 2023.

Comments: Accepted at TACL, pre-MIT Press publication version. 16 pages, 4 figures

arXiv:2301.04558

Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing

Authors: Shruthi Bannur, Stephanie Hyland, Qianchu Liu, Fernando Pérez-García, Maximilian Ilse, Daniel C. Castro, Benedikt Boecking, Harshita Sharma, Kenza Bouzid, Anja Thieme, Anton Schwaighofer, Maria Wetscherek, Matthew P. Lungren, Aditya Nori, Javier Alvarez-Valle, Ozan Oktay

Abstract: Self-supervised learning in vision-language processing (VLP) exploits semantic alignment between imaging and text modalities. Prior work in biomedical VLP has mostly relied on the alignment of single image and report pairs even though clinical notes commonly refer to prior images. This not only introduces poor alignment between the modalities but also misses an opportunity to exploit rich self-supervision through existing temporal content in the data. In this work, we explicitly account for prior images and reports when available during both training and fine-tuning. Our approach, named BioViL-T, uses a CNN-Transformer hybrid multi-image encoder trained jointly with a text model. It is designed to handle challenges that arise, such as pose variations and missing input images across time. The resulting model excels on downstream tasks in both single- and multi-image setups, achieving state-of-the-art performance on (I) progression classification, (II) phrase grounding, and (III) report generation, whilst offering consistent improvements on disease classification and sentence-similarity tasks. We release a novel multi-modal temporal benchmark dataset, MS-CXR-T, to quantify the quality of vision-language representations in terms of temporal semantics. Our experimental results show the advantages of incorporating prior images and reports to make the most of the data.

Submitted 16 March, 2023; v1 submitted 11 January, 2023; originally announced January 2023.

Comments: To appear in CVPR 2023

arXiv:2212.03555

doi 10.3847/1538-4357/acdbc5

Inverse MultiView II: Microarcsecond Trigonometric Parallaxes for Southern Hemisphere 6.7 GHz Methanol Masers G232.62+00.99 and G323.74$-$00.26

Authors: Lucas J. Hyland, Mark J. Reid, Gabor Orosz, Simon P. Ellingsen, Stuart D. Weston, Jayendar Kumar, Richard Dodson, Maria J. Rioja, Warren J. Hankey, Patrick M. Yates-Jones, Tim Natusch, Sergei Gulyaev, Karl M. Menten, Andreas Brunthaler

Abstract: We present the first results from the Southern Hemisphere Parallax Interferometric Radio Astrometry Legacy Survey (S$\pi$RALS): $10\,\mu$as-accurate parallaxes and proper motions for two southern hemisphere 6.7 GHz methanol masers obtained using the inverse MultiView calibration method. Using an array of radio telescopes in Australia and New Zealand, we measured the trigonometric parallax and proper motions for the masers associated with the star formation region G232.62+00.99 of $\pi = 0.610\pm0.011$ mas, $\mu_x=-2.266\pm0.021$ mas yr$^{-1}$ and $\mu_y=2.249\pm0.049$ mas yr$^{-1}$, which implies a distance of $d=1.637\pm0.029$ kpc. These measurements represent an improvement in accuracy by more than a factor of 3 over the previous measurements obtained through Very Long Baseline Array observations of the 12 GHz methanol masers associated with this region. We also measure the trigonometric parallax and proper motion for G323.74$-$00.26 as $\pi = 0.364\pm0.009$ mas, $\mu_x=-3.239\pm0.025$ mas yr$^{-1}$ and $\mu_y=-3.976\pm0.039$ mas yr$^{-1}$, which implies a distance of $d=2.747\pm0.068$ kpc. These are the most accurate measurements of trigonometric parallax obtained for 6.7 GHz class II methanol masers to date. We confirm that G232.62+00.99 is in the Local arm and find that G323.74$-$00.26 is in the Scutum-Centaurus arm. We also investigate the structure and internal dynamics of both masers.

Submitted 16 May, 2023; v1 submitted 7 December, 2022; originally announced December 2022.

Comments: 13 pages, 9 figures, 3 tables. Accepted for publication in ApJ
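For a parallax measured in mas, the simplest distance estimate is d = 1/π in kpc, with first-order error propagation σ_d = d σ_π / π. A sketch under that assumption (the authors may use a more careful estimator; the function name is hypothetical): it reproduces the quoted 2.747 ± 0.068 kpc for G323.74−00.26 and gives about 1.64 ± 0.03 kpc for G232.62+00.99, close to the quoted 1.637 ± 0.029 kpc.

```python
def parallax_distance_kpc(parallax_mas: float, parallax_err_mas: float):
    """Distance (kpc) and 1-sigma error from a parallax in mas.

    Uses d = 1 / parallax with first-order error propagation
    sigma_d = d * sigma_parallax / parallax (valid for small fractional errors).
    """
    distance = 1.0 / parallax_mas
    distance_err = distance * parallax_err_mas / parallax_mas
    return distance, distance_err


if __name__ == "__main__":
    for name, plx, err in (("G232.62+00.99", 0.610, 0.011),
                           ("G323.74-00.26", 0.364, 0.009)):
        d, sd = parallax_distance_kpc(plx, err)
        print(f"{name}: d = {d:.3f} +/- {sd:.3f} kpc")
```

Simple inversion is adequate here because both fractional parallax errors are only a few per cent; for noisier parallaxes it becomes biased.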

arXiv:2205.13398

Looking for Out-of-Distribution Environments in Multi-center Critical Care Data

Authors: Dimitris Spathis, Stephanie L. Hyland

Abstract: Clinical machine learning models show a significant performance drop when tested in settings not seen during training. Domain generalisation models promise to alleviate this problem; however, there is still scepticism about whether they improve over traditional training. In this work, we take a principled approach to identifying Out-of-Distribution (OoD) environments, motivated by the problem of cross-hospital generalisation in critical care. We propose model-based and heuristic approaches to identify OoD environments and systematically compare models with different levels of held-out information. We find that access to OoD data does not translate to increased performance, pointing to inherent limitations in defining potential OoD environments, possibly due to data harmonisation and sampling. Echoing similar results with other popular clinical benchmarks in the literature, our findings suggest that new approaches are required to evaluate robust models on health records.

Submitted 11 November, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2022, November 28th, 2022, New Orleans, United States & Virtual, http://www.ml4h.cc, 17 pages

arXiv:2205.00092 [pdf, other]

doi 10.3847/1538-4357/ac6d5b

Inverse MultiView I: Multi-Calibrator inverse phase referencing for Microarcsecond VLBI Astrometry

Authors: Lucas J. Hyland, Mark J. Reid, Simon P. Ellingsen, Maria J. Rioja, Richard Dodson, Gabor Orosz, Colin R. Masson, Jamie M. McCallum

Abstract: Very Long Baseline Interferometry (VLBI) astrometry is a well-established technique for achieving $\pm10~μ$as parallax accuracies at frequencies well above 10~GHz. At lower frequencies, uncompensated interferometer delays associated with the ionosphere play the dominant role in limiting the astrometric accuracy. Multiview is a novel VLBI calibration method, which uses observations of multiple quasars to accurately model and remove time-variable, direction-dependent changes to the interferometer delay. Here we extend the Multiview technique by phase referencing data to the target source ("inverse Multiview") and test its performance. Multiple observations with a four-antenna VLBI array operating at 8.3~GHz show single-epoch astrometric accuracies near $20~μ$as for target-reference quasar separations up to about 7 degrees. This represents an improvement in astrometric accuracy by up to an order of magnitude compared to standard phase referencing.

Submitted 13 February, 2023; v1 submitted 29 April, 2022; originally announced May 2022.

Comments: 11 pages, 5 figures

Journal ref: 2022 ApJ 932 52
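The Multiview abstract above describes using several quasars to model direction-dependent delay changes. A common simplification in the MultiView literature treats the residual phase as a plane across the sky; the sketch below, assuming that planar model, fits the plane to calibrator phases by least squares and evaluates it at the target position. The function name, offsets and phase values are hypothetical, and the sketch ignores phase wrapping, time variation, per-antenna solutions and the role reversal specific to the inverse variant.

```python
import numpy as np


def fit_phase_plane(offsets_deg, phases_rad):
    """Least-squares fit of phi(dx, dy) = a + b*dx + c*dy.

    offsets_deg: (N, 2) sky offsets of each calibrator from the target.
    phases_rad:  (N,) calibrator phases for one baseline and one time,
                 assumed already unwrapped.
    Returns array [a, b, c]; `a` is the plane evaluated at the target.
    """
    offsets = np.asarray(offsets_deg, dtype=float)
    phases = np.asarray(phases_rad, dtype=float)
    # Design matrix: constant term plus linear gradients in dx and dy.
    design = np.column_stack([np.ones(len(phases)), offsets[:, 0], offsets[:, 1]])
    coeffs, *_ = np.linalg.lstsq(design, phases, rcond=None)
    return coeffs


# Hypothetical geometry: four calibrators around the target, phases drawn
# from a known plane so that the fit can be checked.
offsets = np.array([[2.0, 0.0], [-1.0, 1.5], [0.5, -3.0], [-2.0, -1.0]])
true_plane = np.array([0.3, 0.05, -0.02])
phases = true_plane[0] + offsets @ true_plane[1:]
fitted = fit_phase_plane(offsets, phases)
target_phase_correction = fitted[0]
```

With at least three non-collinear calibrators the plane is determined; extra calibrators turn the fit into a least-squares average, which is one reason to observe more than the minimum.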

arXiv:2204.09817 [pdf, other]

doi 10.1007/978-3-031-20059-5_1

Making the Most of Text Semantics to Improve Biomedical Vision--Language Processing

Authors: Benedikt Boecking, Naoto Usuyama, Shruthi Bannur, Daniel C. Castro, Anton Schwaighofer, Stephanie Hyland, Maria Wetscherek, Tristan Naumann, Aditya Nori, Javier Alvarez-Valle, Hoifung Poon, Ozan Oktay

Abstract: Multi-modal data abounds in biomedicine, such as radiology images and reports. Interpreting this data at scale is essential for improving clinical care and accelerating clinical research. Biomedical text with its complex semantics poses additional challenges in vision--language modelling compared to the general domain, and previous work has used insufficiently adapted models that lack domain-specific language understanding. In this paper, we show that principled textual semantic modelling can substantially improve contrastive learning in self-supervised vision--language processing. We release a language model that achieves state-of-the-art results in radiology natural language inference through its improved vocabulary and novel language pretraining objective leveraging semantics and discourse characteristics in radiology reports. Further, we propose a self-supervised joint vision--language approach with a focus on better text modelling. It establishes new state-of-the-art results on a wide range of publicly available benchmarks, in part by leveraging our new domain-specific language model. We release a new dataset with locally-aligned phrase grounding annotations by radiologists to facilitate the study of complex semantic modelling in biomedical vision--language processing. A broad evaluation, including on this new dataset, shows that our contrastive learning approach, aided by textual-semantic modelling, outperforms prior methods in segmentation tasks, despite only using a global-alignment objective.

Submitted 21 July, 2022; v1 submitted 20 April, 2022; originally announced April 2022.

Comments: To appear in ECCV 2022. Code: https://aka.ms/biovil-code Dataset: https://aka.ms/ms-cxr Demo Notebook: https://aka.ms/biovil-demo-notebook

Journal ref: Computer Vision - ECCV 2022, LNCS vol 13696, pp 1-21

arXiv:2110.09669 [pdf, ps, other]

doi 10.1093/mnras/stab3040

Molecular line search toward the flaring 6.7-GHz methanol masers of G24.33+0.13 and G359.6-0.243: rare maser transitions detected

Authors: Tiege McCarthy, Gabor Orosz, Simon Ellingsen, Shari Breen, Maxim Voronkov, Ross Burns, Mateusz Olech, Yoshinori Yonekura, Tomoya Hirota, Lucas Hyland, Pawel Wolak

Abstract: We have performed a molecular line search toward the flaring 6.7-GHz masers G24.33+0.13 and G359.62-0.24 using the Australia Telescope Compact Array. We present spectra of the 6.7-GHz class~II methanol and 22.2-GHz water masers toward these sources and provide comparison with other recent flaring events these sources have experienced. We also detect the fourth example of a 23.4-GHz class~I methanol maser, and the eleventh example of a 4.8-GHz formaldehyde maser toward G24.33+0.13. Alongside these results, we observe the previously detected ammonia (3,3) emission and report upper limits on the presence of various other cm-wavelength methanol, ammonia and OH transitions. Our results are consistent with the flaring of G24.33+0.13 being driven by a variable accretion rate in the host high-mass young stellar object.

Submitted 18 October, 2021; originally announced October 2021.

Comments: Accepted into MNRAS 2021 October 17. 10 pages, 4 figures and 3 tables

arXiv:2105.05728 [pdf, other]

Early prediction of respiratory failure in the intensive care unit

Authors: Matthias Hüser, Martin Faltys, Xinrui Lyu, Chris Barber, Stephanie L. Hyland, Tobias M. Merz, Gunnar Rätsch

Abstract: The development of respiratory failure is common among patients in intensive care units (ICU). Large data quantities from ICU patient monitoring systems make timely and comprehensive analysis by clinicians difficult but are ideal for automatic processing by machine learning algorithms. Early prediction of respiratory system failure could alert clinicians to patients at risk of respiratory failure and allow for early patient reassessment and treatment adjustment. We propose an early warning system that predicts moderate/severe respiratory failure up to 8 hours in advance. Our system was trained on HiRID-II, a data-set containing more than 60,000 admissions to a tertiary care ICU. An alarm is typically triggered several hours before the beginning of respiratory failure. Our system outperforms a clinical baseline mimicking traditional clinical decision-making based on pulse-oximetric oxygen saturation and the fraction of inspired oxygen. To provide model introspection and diagnostics, we developed an easy-to-use web browser-based system to explore model input data and predictions visually.

Submitted 12 May, 2021; originally announced May 2021.

Comments: 14 pages, 5 figures

arXiv:2011.11554

ML4H Abstract Track 2020

Authors: Emily Alsentzer, Matthew B. A. McDermott, Fabian Falck, Suproteem K. Sarkar, Subhrajit Roy, Stephanie L. Hyland

Abstract: A collection of the accepted abstracts for the Machine Learning for Health (ML4H) workshop at NeurIPS 2020. This index is not complete, as some accepted abstracts chose to opt out of inclusion.

Submitted 19 November, 2020; originally announced November 2020.

arXiv:1912.02919 [pdf, other]

An Empirical Study on the Intrinsic Privacy of SGD

Authors: Stephanie L. Hyland, Shruti Tople

Abstract: Introducing noise in the training of machine learning systems is a powerful way to protect individual privacy via differential privacy guarantees, but comes at a cost to utility. This work looks at whether the inherent randomness of stochastic gradient descent (SGD) could contribute to privacy, effectively reducing the amount of \emph{additional} noise required to achieve a given privacy guarantee. We conduct a large-scale empirical study to examine this question. Training a grid of over 120,000 models across four datasets (tabular and images) on convex and non-convex objectives, we demonstrate that the random seed has a larger impact on model weights than any individual training example. We test the distribution over weights induced by the seed, finding that the simple convex case can be modelled with a multivariate Gaussian posterior, while neural networks exhibit multi-modal and non-Gaussian weight distributions. By casting convex SGD as a Gaussian mechanism, we then estimate an `intrinsic' data-dependent $ε_i(\mathcal{D})$, finding values as low as 6.3, dropping to 1.9 using empirical estimates. We use a membership inference attack to estimate $ε$ for non-convex SGD and demonstrate that hiding the random seed from the adversary results in a statistically significant reduction in attack performance, corresponding to a reduction in the effective $ε$.
These results provide empirical evidence that SGD exhibits appreciable variability relative to its dataset sensitivity, and this `intrinsic noise' has the potential to be leveraged to improve the utility of privacy-preserving machine learning.

Submitted 28 February, 2022; v1 submitted 5 December, 2019; originally announced December 2019.

Comments: 21 pages, 11 figures, 8 tables

arXiv:1904.12973 [pdf]

Unsupervised Extraction of Phenotypes from Cancer Clinical Notes for Association Studies

Authors: Stefan G. Stark, Stephanie L. Hyland, Melanie F. Pradier, Kjong Lehmann, Andreas Wicki, Fernando Perez Cruz, Julia E. Vogt, Gunnar Rätsch

Abstract: The recent adoption of Electronic Health Records (EHRs) by health care providers has introduced an important source of data that provides detailed and highly specific insights into patient phenotypes over large cohorts. These datasets, in combination with machine learning and statistical approaches, generate new opportunities for research and clinical care. However, many methods require the patient representations to be in structured formats, while the information in the EHR is often locked in unstructured texts designed for human readability. In this work, we develop the methodology to automatically extract clinical features from clinical narratives from large EHR corpora without the need for prior knowledge. We consider medical terms and sentences appearing in clinical narratives as atomic information units. We propose an efficient clustering strategy suitable for the analysis of large text corpora, and use the clusters to represent information about the patient compactly. To demonstrate the utility of our approach, we perform an association study of clinical features with somatic mutation profiles from 4,007 cancer patients and their tumors. We apply the proposed algorithm to a dataset consisting of about 65 thousand documents with a total of about 3.2 million sentences. We identify 341 significant statistical associations between the presence of somatic mutations and clinical features. We annotated these associations according to their novelty, and report several known associations.
We also propose 32 testable hypotheses where the underlying biological mechanism does not appear to be known but plausible. These results illustrate that the automated discovery of clinical features is possible and the joint analysis of clinical and genetic datasets can generate appealing new hypotheses.

Submitted 3 May, 2019; v1 submitted 29 April, 2019; originally announced April 2019.

arXiv:1904.07990 [pdf]

Machine learning for early prediction of circulatory failure in the intensive care unit

Authors: Stephanie L. Hyland, Martin Faltys, Matthias Hüser, Xinrui Lyu, Thomas Gumbsch, Cristóbal Esteban, Christian Bock, Max Horn, Michael Moor, Bastian Rieck, Marc Zimmermann, Dean Bodenham, Karsten Borgwardt, Gunnar Rätsch, Tobias M. Merz

Abstract: Intensive care clinicians are presented with large quantities of patient information and measurements from a multitude of monitoring systems. The limited ability of humans to process such complex information hinders physicians from readily recognizing and acting on early signs of patient deterioration. We used machine learning to develop an early warning system for circulatory failure based on a high-resolution ICU database with 240 patient years of data. This automatic system predicts 90.0% of circulatory failure events (prevalence 3.1%), with 81.8% identified more than two hours in advance, resulting in an area under the receiver operating characteristic curve of 94.0% and area under the precision-recall curve of 63.0%. The model was externally validated in a large independent patient cohort.

Submitted 19 April, 2019; v1 submitted 16 April, 2019; originally announced April 2019.

Comments: 5 main figures, 1 main table, 13 supplementary figures, 5 supplementary tables; 250ppi images

arXiv:1812.00490 [pdf, other]

Improving Clinical Predictions through Unsupervised Time Series Representation Learning

Authors: Xinrui Lyu, Matthias Hueser, Stephanie L. Hyland, George Zerveas, Gunnar Raetsch

Abstract: In this work, we investigate unsupervised representation learning on medical time series, which bears the promise of leveraging copious amounts of existing unlabeled data in order to eventually assist clinical decision making. By evaluating on the prediction of clinically relevant outcomes, we show that in a practical setting, unsupervised representation learning can offer clear performance benefits over end-to-end supervised architectures. We experiment with using sequence-to-sequence (Seq2Seq) models in two different ways, as an autoencoder and as a forecaster, and show that the best performance is achieved by a forecasting Seq2Seq model with an integrated attention mechanism, proposed here for the first time in the setting of unsupervised learning for medical time series.

Submitted 2 December, 2018; originally announced December 2018.

Comments: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.07216

Report number: ML4H/2018/171

arXiv:1707.03963 [pdf, other]

doi 10.1093/mnras/stx1776

MALT-45: A 7mm survey of the southern Galaxy - II. ATCA follow-up observations of 44GHz class I methanol masers

Authors: Christopher H. Jordan, Andrew J. Walsh, Shari L. Breen, Simon P. Ellingsen, Maxim A. Voronkov, Lucas J. Hyland

Abstract: We detail interferometric observations of 44GHz class I methanol masers detected by MALT-45 (a 7mm unbiased auto-correlated spectral-line Galactic-plane survey) using the Australia Telescope Compact Array. We detect 238 maser spots across 77 maser sites. Using high-resolution positions, we compare the class I CH$_3$OH masers to other star formation maser species, including CS (1-0), SiO $v=0$ and the H53$α$ radio-recombination line. Comparison between the cross- and auto-correlated data has allowed us to also identify quasi-thermal emission in the 44GHz class I methanol maser line. We find that the majority of class I methanol masers have small spatial and velocity ranges ($<$0.5pc and $<$5 km s$^{-1}$), and closely trace the systemic velocities of associated clouds. Using 870$μ$m dust continuum emission from the ATLASGAL survey, we determine clump masses associated with class I masers, and find they are generally associated with clumps between 1000 and 3000 $M_\odot$. For each class I methanol maser site, we use the presence of OH masers and radio recombination lines to identify relatively evolved regions of high-mass star formation; we find that maser sites without these associations have lower luminosities and preferentially appear toward dark infrared regions.

Submitted 12 July, 2017; originally announced July 2017.

Comments: Accepted by MNRAS on 12 July 2017

arXiv:1706.02633 [pdf, other]

Real-valued (Medical) Time Series Generation with Recurrent Conditional GANs

Authors: Cristóbal Esteban, Stephanie L. Hyland, Gunnar Rätsch

Abstract: Generative Adversarial Networks (GANs) have shown remarkable success as a framework for training models to produce realistic-looking data. In this work, we propose a Recurrent GAN (RGAN) and Recurrent Conditional GAN (RCGAN) to produce realistic real-valued multi-dimensional time series, with an emphasis on their application to medical data. RGANs make use of recurrent neural networks in the generator and the discriminator. In the case of RCGANs, both of these RNNs are conditioned on auxiliary information. We demonstrate our models on a set of toy datasets, where we show visually and quantitatively (using sample likelihood and maximum mean discrepancy) that they can successfully generate realistic time-series. We also describe novel evaluation methods for GANs, where we generate a synthetic labelled training dataset, and evaluate on a real test set the performance of a model trained on the synthetic data, and vice-versa. We illustrate with these metrics that RCGANs can generate time-series data useful for supervised training, with only minor degradation in performance on real test data. This is demonstrated on digit classification from 'serialised' MNIST and by training an early warning system on a medical dataset of 17,000 patients from an intensive care unit. We further discuss and analyse the privacy concerns that may arise when using RCGANs to generate realistic synthetic medical time series data.

Submitted 3 December, 2017; v1 submitted 8 June, 2017; originally announced June 2017.

Comments: 13 pages, 4 figures, 3 tables (update with differential privacy)
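The RCGAN abstract above uses maximum mean discrepancy (MMD) as one quantitative check on generated series. A minimal sketch of the standard biased squared-MMD estimator with a Gaussian (RBF) kernel, assuming each series is flattened to a vector and the bandwidth is fixed by hand; the function names, bandwidth choice and random data are illustrative assumptions, not the paper's evaluation code.

```python
import numpy as np


def rbf_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel matrix between rows of x (n, d) and y (m, d)."""
    sq_dists = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth ** 2))


def mmd2_biased(x, y, bandwidth):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    k_xx = rbf_kernel(x, x, bandwidth).mean()
    k_yy = rbf_kernel(y, y, bandwidth).mean()
    k_xy = rbf_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy


# Hypothetical data: 200 series of length 30, flattened to vectors.
rng = np.random.default_rng(0)
real = rng.normal(size=(200, 30))
close_fake = rng.normal(size=(200, 30))             # same distribution
shifted_fake = rng.normal(loc=1.0, size=(200, 30))  # mean shifted by 1
bw = np.sqrt(30.0)
mmd_close = mmd2_biased(real, close_fake, bw)
mmd_shifted = mmd2_biased(real, shifted_fake, bw)
```

A sample drawn from the same distribution as the real data should score near zero, while the shifted sample scores clearly higher; in practice the bandwidth is often set by a heuristic such as the median pairwise distance.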

arXiv:1612.00467 [pdf, ps, other]

Neural Document Embeddings for Intensive Care Patient Mortality Prediction

Authors: Paulina Grnarova, Florian Schmidt, Stephanie L. Hyland, Carsten Eickhoff

Abstract: We present an automatic mortality prediction scheme based on the unstructured textual content of clinical notes. Proposing a convolutional document embedding approach, our empirical investigation using the MIMIC-III intensive care database shows significant performance gains compared to previously employed methods such as latent topic distributions or generic doc2vec embeddings. These improvements are especially pronounced for the difficult problem of post-discharge mortality prediction.

Submitted 1 December, 2016; originally announced December 2016.

arXiv:1607.04903 [pdf, other]

Learning Unitary Operators with Help From u(n)

Authors: Stephanie L. Hyland, Gunnar Rätsch

Abstract: A major challenge in the training of recurrent neural networks is the so-called vanishing or exploding gradient problem. The use of a norm-preserving transition operator can address this issue, but parametrization is challenging. In this work we focus on unitary operators and describe a parametrization using the Lie algebra $\mathfrak{u}(n)$ associated with the Lie group $U(n)$ of $n \times n$ unitary matrices. The exponential map provides a correspondence between these spaces, and allows us to define a unitary matrix using $n^2$ real coefficients relative to a basis of the Lie algebra. The parametrization is closed under additive updates of these coefficients, and thus provides a simple space in which to do gradient descent. We demonstrate the effectiveness of this parametrization on the problem of learning arbitrary unitary operators, comparing to several baselines and outperforming a recently-proposed lower-dimensional parametrization. We additionally use our parametrization to generalize a recently-proposed unitary recurrent neural network to arbitrary unitary matrices, using it to solve standard long-memory tasks.

Submitted 10 January, 2017; v1 submitted 17 July, 2016; originally announced July 2016.

Comments: 9 pages, 3 figures, 5 figures inc. subfigures, to appear at AAAI-17
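The $\mathfrak{u}(n)$ abstract above describes mapping $n^2$ real coefficients to a skew-Hermitian matrix and exponentiating it to a unitary matrix. A minimal sketch of that idea, assuming one particular basis of $\mathfrak{u}(n)$ (the paper's basis may differ) and computing the exponential through a Hermitian eigendecomposition; the function names are hypothetical.

```python
import numpy as np


def lie_algebra_element(coeffs, n):
    """Map n*n real coefficients to a skew-Hermitian matrix in u(n).

    Basis used here (an assumption, one valid choice among many):
      i*E_jj for each diagonal entry               -> n coefficients
      E_jk - E_kj and i*(E_jk + E_kj) for j < k    -> n*(n-1) coefficients
    """
    coeffs = np.asarray(coeffs, dtype=float)
    if coeffs.shape != (n * n,):
        raise ValueError("expected n*n real coefficients")
    L = np.zeros((n, n), dtype=complex)
    idx = 0
    for j in range(n):
        L[j, j] = 1j * coeffs[idx]
        idx += 1
    for j in range(n):
        for k in range(j + 1, n):
            a, b = coeffs[idx], coeffs[idx + 1]
            idx += 2
            L[j, k] = a + 1j * b
            L[k, j] = -a + 1j * b  # keeps L^H = -L
    return L


def unitary_from_coeffs(coeffs, n):
    """Exponential map u(n) -> U(n), computed via a Hermitian eigendecomposition."""
    L = lie_algebra_element(coeffs, n)
    H = -1j * L                               # Hermitian, since L is skew-Hermitian
    w, V = np.linalg.eigh(H)                  # H = V diag(w) V^H with real w
    return (V * np.exp(1j * w)) @ V.conj().T  # exp(L) = exp(iH)


rng = np.random.default_rng(1)
n = 4
theta = rng.normal(size=n * n)
U = unitary_from_coeffs(theta, n)
```

Because any real coefficient vector maps to a valid unitary matrix, a gradient step on `theta` never leaves the group, which is the closure property the abstract highlights.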

arXiv:1602.03551 [pdf, other]

Knowledge Transfer with Medical Language Embeddings

Authors: Stephanie L. Hyland, Theofanis Karaletsos, Gunnar Rätsch

Abstract: Identifying relationships between concepts is a key aspect of scientific knowledge synthesis. Finding these links often requires a researcher to laboriously search through scientific papers and databases, as the size of these resources grows ever larger. In this paper we describe how distributional semantics can be used to unify structured knowledge graphs with unstructured text to predict new relationships between medical concepts, using a probabilistic generative model. Our approach is also designed to ameliorate data sparsity and scarcity issues in the medical domain, which make language modelling more challenging. Specifically, we integrate the medical relational database (SemMedDB) with text from electronic health records (EHRs) to perform knowledge graph completion. We further demonstrate the ability of our model to predict relationships between tokens not appearing in the relational database.

Submitted 10 February, 2016; originally announced February 2016.

Comments: 6 pages, 2 figures, to appear at SDM-DMMH 2016

arXiv:1510.00259 [pdf, other]

A Generative Model of Words and Relationships from Multiple Sources

Authors: Stephanie L. Hyland, Theofanis Karaletsos, Gunnar Rätsch

Abstract: Neural language models are a powerful tool to embed words into semantic vector spaces. However, learning such models generally relies on the availability of abundant and diverse training examples. In highly specialised domains this requirement may not be met due to difficulties in obtaining a large corpus, or the limited range of expression in average use. Such domains may encode prior knowledge about entities in a knowledge base or ontology. We propose a generative model which integrates evidence from diverse data sources, enabling the sharing of semantic information. We achieve this by generalising the concept of co-occurrence from distributional semantics to include other relationships between entities or words, which we model as affine transformations on the embedding space. We demonstrate the effectiveness of this approach by outperforming recent models on a link prediction task and demonstrating its ability to profit from partially or fully unobserved data training labels. We further demonstrate the usefulness of learning from different data sources with overlapping vocabularies.

Submitted 3 December, 2015; v1 submitted 1 October, 2015; originally announced October 2015.

Comments: 8 pages, 5 figures; incorporated feedback from reviewers; to appear in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence 2016
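The abstract above models relationships other than co-occurrence as affine transformations on the embedding space. A minimal sketch of how such a relation could score candidate objects, assuming a softmax over dot products between context embeddings and an affinely transformed subject embedding, with plain co-occurrence as the special case of identity matrix and zero offset; the scoring form, the names `relation_probs`, `G_r`, `t_r`, and the random embeddings are illustrative assumptions, not the paper's exact generative model.

```python
import numpy as np


def relation_probs(subject_vec, context_matrix, G_r, t_r):
    """Softmax over candidate objects of c_o . (G_r v_s + t_r).

    subject_vec:    (d,) embedding of the subject word or entity.
    context_matrix: (V, d) one context embedding per candidate object.
    G_r, t_r:       (d, d) matrix and (d,) offset defining relation r.
    """
    transformed = G_r @ subject_vec + t_r  # affine map for relation r
    scores = context_matrix @ transformed  # one score per candidate
    scores = scores - scores.max()         # stabilise the exponentials
    weights = np.exp(scores)
    return weights / weights.sum()


rng = np.random.default_rng(2)
d, vocab = 3, 5
v_s = rng.normal(size=d)
C = rng.normal(size=(vocab, d))
# Plain co-occurrence as the special case G = I, t = 0.
p_cooc = relation_probs(v_s, C, np.eye(d), np.zeros(d))
# A hypothetical relation with its own affine transformation.
p_rel = relation_probs(v_s, C, rng.normal(size=(d, d)), rng.normal(size=d))
```

Sharing the word embeddings across all relations, while giving each relation its own `G_r` and `t_r`, is what lets evidence from one source (for example a knowledge base) inform embeddings learned from another (for example text).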

Showing 1–29 of 29 results for author: Hyland, L