-
GestaltMML: Enhancing Rare Genetic Disease Diagnosis through Multimodal Machine Learning Combining Facial Images and Clinical Texts
Authors:
Da Wu,
**gye Yang,
Cong Liu,
Tzung-Chien Hsieh,
Elaine Marchi,
Justin Blair,
Peter Krawitz,
Chunhua Weng,
Wendy Chung,
Gholson J. Lyon,
Ian D. Krantz,
Jennifer M. Kalish,
Kai Wang
Abstract:
Individuals with suspected rare genetic disorders often undergo multiple clinical evaluations, imaging studies, laboratory tests and genetic tests, to find a possible answer over a prolonged period of time. Addressing this "diagnostic odyssey" thus has substantial clinical, psychosocial, and economic benefits. Many rare genetic diseases have distinctive facial features, which can be used by artifi…
▽ More
Individuals with suspected rare genetic disorders often undergo multiple clinical evaluations, imaging studies, laboratory tests and genetic tests, to find a possible answer over a prolonged period of time. Addressing this "diagnostic odyssey" thus has substantial clinical, psychosocial, and economic benefits. Many rare genetic diseases have distinctive facial features, which can be used by artificial intelligence algorithms to facilitate clinical diagnosis, in prioritizing candidate diseases to be further examined by lab tests or genetic assays, or in hel** the phenotype-driven reinterpretation of genome/exome sequencing data. Existing methods using frontal facial photos were built on conventional Convolutional Neural Networks (CNNs), rely exclusively on facial images, and cannot capture non-facial phenotypic traits and demographic information essential for guiding accurate diagnoses. Here we introduce GestaltMML, a multimodal machine learning (MML) approach solely based on the Transformer architecture. It integrates facial images, demographic information (age, sex, ethnicity), and clinical notes (optionally, a list of Human Phenotype Ontology terms) to improve prediction accuracy. Furthermore, we also evaluated GestaltMML on a diverse range of datasets, including 528 diseases from the GestaltMatcher Database, several in-house datasets of Beckwith-Wiedemann syndrome (BWS, over-growth syndrome with distinct facial features), Sotos syndrome (overgrowth syndrome with overlap** features with BWS), NAA10-related neurodevelopmental syndrome, Cornelia de Lange syndrome (multiple malformation syndrome), and KBG syndrome (multiple malformation syndrome). Our results suggest that GestaltMML effectively incorporates multiple modalities of data, greatly narrowing candidate genetic diagnoses of rare diseases and may facilitate the reinterpretation of genome/exome sequencing data.
△ Less
Submitted 21 April, 2024; v1 submitted 23 December, 2023;
originally announced December 2023.
-
Deep Calibration of Market Simulations using Neural Density Estimators and Embedding Networks
Authors:
Namid R. Stillman,
Rory Baggott,
Justin Lyon,
Jianfei Zhang,
Dingqiu Zhu,
Tao Chen,
Perukrishnen Vytelingum
Abstract:
The ability to construct a realistic simulator of financial exchanges, including reproducing the dynamics of the limit order book, can give insight into many counterfactual scenarios, such as a flash crash, a margin call, or changes in macroeconomic outlook. In recent years, agent-based models have been developed that reproduce many features of an exchange, as summarised by a set of stylised facts…
▽ More
The ability to construct a realistic simulator of financial exchanges, including reproducing the dynamics of the limit order book, can give insight into many counterfactual scenarios, such as a flash crash, a margin call, or changes in macroeconomic outlook. In recent years, agent-based models have been developed that reproduce many features of an exchange, as summarised by a set of stylised facts and statistics. However, the ability to calibrate simulators to a specific period of trading remains an open challenge. In this work, we develop a novel approach to the calibration of market simulators by leveraging recent advances in deep learning, specifically using neural density estimators and embedding networks. We demonstrate that our approach is able to correctly identify high probability parameter sets, both when applied to synthetic and historical data, and without reliance on manually selected or weighted ensembles of stylised facts.
△ Less
Submitted 27 November, 2023; v1 submitted 20 November, 2023;
originally announced November 2023.
-
Paving the Way towards 800 Gbps Quantum-Secured Optical Channel Deployment in Mission-Critical Environments
Authors:
Marco Pistoia,
Omar Amer,
Monik R. Behera,
Joseph A. Dolphin,
James F. Dynes,
Benny John,
Paul A. Haigh,
Yasushi Kawakura,
David H. Kramer,
Jeffrey Lyon,
Navid Moazzami,
Tulasi D. Movva,
Antigoni Polychroniadou,
Suresh Shetty,
Greg Sysak,
Farzam Toudeh-Fallah,
Sudhir Upadhyay,
Robert I. Woodward,
Andrew J. Shields
Abstract:
This article describes experimental research studies conducted towards understanding the implementation aspects of high-capacity quantum-secured optical channels in mission-critical metro-scale operational environments using Quantum Key Distribution (QKD) technology. To the best of our knowledge, this is the first time that an 800 Gbps quantum-secured optical channel -- along with several other De…
▽ More
This article describes experimental research studies conducted towards understanding the implementation aspects of high-capacity quantum-secured optical channels in mission-critical metro-scale operational environments using Quantum Key Distribution (QKD) technology. To the best of our knowledge, this is the first time that an 800 Gbps quantum-secured optical channel -- along with several other Dense Wavelength Division Multiplexed (DWDM) channels on the C-band and multiplexed with the QKD channel on the O-band -- was established at distances up to 100 km, with secret key-rates relevant for practical industry use cases. In addition, during the course of these trials, transporting a blockchain application over this established channel was utilized as a demonstration of securing a financial transaction in transit over a quantum-secured optical channel. The findings of this research pave the way towards the deployment of QKD-secured optical channels in high-capacity, metro-scale, mission-critical operational environments, such as Inter-Data Center Interconnects.
△ Less
Submitted 2 March, 2023; v1 submitted 15 February, 2022;
originally announced February 2022.
-
Now Playing: Continuous low-power music recognition
Authors:
Blaise Agüera y Arcas,
Beat Gfeller,
Ruiqi Guo,
Kevin Kilgour,
Sanjiv Kumar,
James Lyon,
Julian Odell,
Marvin Ritter,
Dominik Roblek,
Matthew Sharifi,
Mihajlo Velimirović
Abstract:
Existing music recognition applications require a connection to a server that performs the actual recognition. In this paper we present a low-power music recognizer that runs entirely on a mobile device and automatically recognizes music without user interaction. To reduce battery consumption, a small music detector runs continuously on the mobile device's DSP chip and wakes up the main applicatio…
▽ More
Existing music recognition applications require a connection to a server that performs the actual recognition. In this paper we present a low-power music recognizer that runs entirely on a mobile device and automatically recognizes music without user interaction. To reduce battery consumption, a small music detector runs continuously on the mobile device's DSP chip and wakes up the main application processor only when it is confident that music is present. Once woken, the recognizer on the application processor is provided with a few seconds of audio which is fingerprinted and compared to the stored fingerprints in the on-device fingerprint database of tens of thousands of songs. Our presented system, Now Playing, has a daily battery usage of less than 1% on average, respects user privacy by running entirely on-device and can passively recognize a wide range of music.
△ Less
Submitted 29 November, 2017;
originally announced November 2017.
-
Hellinger Distance Trees for Imbalanced Streams
Authors:
R. J. Lyon,
J. M. Brooke,
J. D. Knowles,
B. W. Stappers
Abstract:
Classifiers trained on data sets possessing an imbalanced class distribution are known to exhibit poor generalisation performance. This is known as the imbalanced learning problem. The problem becomes particularly acute when we consider incremental classifiers operating on imbalanced data streams, especially when the learning objective is rare class identification. As accuracy may provide a mislea…
▽ More
Classifiers trained on data sets possessing an imbalanced class distribution are known to exhibit poor generalisation performance. This is known as the imbalanced learning problem. The problem becomes particularly acute when we consider incremental classifiers operating on imbalanced data streams, especially when the learning objective is rare class identification. As accuracy may provide a misleading impression of performance on imbalanced data, existing stream classifiers based on accuracy can suffer poor minority class performance on imbalanced streams, with the result being low minority class recall rates. In this paper we address this deficiency by proposing the use of the Hellinger distance measure, as a very fast decision tree split criterion. We demonstrate that by using Hellinger a statistically significant improvement in recall rates on imbalanced data streams can be achieved, with an acceptable increase in the false positive rate.
△ Less
Submitted 9 May, 2014;
originally announced May 2014.
-
A Study on Classification in Imbalanced and Partially-Labelled Data Streams
Authors:
R. J. Lyon,
J. M. Brooke,
J. D. Knowles,
B. W. Stappers
Abstract:
The domain of radio astronomy is currently facing significant computational challenges, foremost amongst which are those posed by the development of the world's largest radio telescope, the Square Kilometre Array (SKA). Preliminary specifications for this instrument suggest that the final design will incorporate between 2000 and 3000 individual 15 metre receiving dishes, which together can be expe…
▽ More
The domain of radio astronomy is currently facing significant computational challenges, foremost amongst which are those posed by the development of the world's largest radio telescope, the Square Kilometre Array (SKA). Preliminary specifications for this instrument suggest that the final design will incorporate between 2000 and 3000 individual 15 metre receiving dishes, which together can be expected to produce a data rate of many TB/s. Given such a high data rate, it becomes crucial to consider how this information will be processed and stored to maximise its scientific utility. In this paper, we consider one possible data processing scenario for the SKA, for the purposes of an all-sky pulsar survey. In particular we treat the selection of promising signals from the SKA processing pipeline as a data stream classification problem. We consider the feasibility of classifying signals that arrive via an unlabelled and heavily class imbalanced data stream, using currently available algorithms and frameworks. Our results indicate that existing stream learners exhibit unacceptably low recall on real astronomical data when used in standard configuration; however, good false positive performance and comparable accuracy to static learners, suggests they have definite potential as an on-line solution to this particular big data challenge.
△ Less
Submitted 30 July, 2013;
originally announced July 2013.