-
Hidden Flaws Behind Expert-Level Accuracy of GPT-4 Vision in Medicine
Authors:
Qiao **,
Fangyuan Chen,
Yiliang Zhou,
Ziyang Xu,
Justin M. Cheung,
Robert Chen,
Ronald M. Summers,
Justin F. Rousseau,
Peiyun Ni,
Marc J Landsman,
Sally L. Baxter,
Subhi J. Al'Aref,
Yijia Li,
Alex Chen,
Josef A. Brejt,
Michael F. Chiang,
Yifan Peng,
Zhiyong Lu
Abstract:
Recent studies indicate that Generative Pre-trained Transformer 4 with Vision (GPT-4V) outperforms human physicians in medical challenge tasks. However, these evaluations primarily focused on the accuracy of multi-choice questions alone. Our study extends the current scope by conducting a comprehensive analysis of GPT-4V's rationales of image comprehension, recall of medical knowledge, and step-by…
▽ More
Recent studies indicate that Generative Pre-trained Transformer 4 with Vision (GPT-4V) outperforms human physicians in medical challenge tasks. However, these evaluations primarily focused on the accuracy of multi-choice questions alone. Our study extends the current scope by conducting a comprehensive analysis of GPT-4V's rationales of image comprehension, recall of medical knowledge, and step-by-step multimodal reasoning when solving New England Journal of Medicine (NEJM) Image Challenges - an imaging quiz designed to test the knowledge and diagnostic capabilities of medical professionals. Evaluation results confirmed that GPT-4V performs comparatively to human physicians regarding multi-choice accuracy (81.6% vs. 77.8%). GPT-4V also performs well in cases where physicians incorrectly answer, with over 78% accuracy. However, we discovered that GPT-4V frequently presents flawed rationales in cases where it makes the correct final choices (35.5%), most prominent in image comprehension (27.2%). Regardless of GPT-4V's high accuracy in multi-choice questions, our findings emphasize the necessity for further in-depth evaluations of its rationales before integrating such multimodal AI models into clinical workflows.
△ Less
Submitted 22 April, 2024; v1 submitted 16 January, 2024;
originally announced January 2024.
-
Posterior Contraction Rates for Matérn Gaussian Processes on Riemannian Manifolds
Authors:
Paul Rosa,
Viacheslav Borovitskiy,
Alexander Terenin,
Judith Rousseau
Abstract:
Gaussian processes are used in many machine learning applications that rely on uncertainty quantification. Recently, computational tools for working with these models in geometric settings, such as when inputs lie on a Riemannian manifold, have been developed. This raises the question: can these intrinsic models be shown theoretically to lead to better performance, compared to simply embedding all…
▽ More
Gaussian processes are used in many machine learning applications that rely on uncertainty quantification. Recently, computational tools for working with these models in geometric settings, such as when inputs lie on a Riemannian manifold, have been developed. This raises the question: can these intrinsic models be shown theoretically to lead to better performance, compared to simply embedding all relevant quantities into $\mathbb{R}^d$ and using the restriction of an ordinary Euclidean Gaussian process? To study this, we prove optimal contraction rates for intrinsic Matérn Gaussian processes defined on compact Riemannian manifolds. We also prove analogous rates for extrinsic processes using trace and extension theorems between manifold and ambient Sobolev spaces: somewhat surprisingly, the rates obtained turn out to coincide with those of the intrinsic processes, provided that their smoothness parameters are matched appropriately. We illustrate these rates empirically on a number of examples, which, mirroring prior work, show that intrinsic processes can achieve better performance in practice. Therefore, our work shows that finer-grained analyses are needed to distinguish between different levels of data-efficiency of geometric Gaussian processes, particularly in settings which involve small data set sizes and non-asymptotic behavior.
△ Less
Submitted 29 October, 2023; v1 submitted 19 September, 2023;
originally announced September 2023.
-
Pre-Training with Diffusion models for Dental Radiography segmentation
Authors:
Jérémy Rousseau,
Christian Alaka,
Emma Covili,
Hippolyte Mayard,
Laura Misrachi,
Willy Au
Abstract:
Medical radiography segmentation, and specifically dental radiography, is highly limited by the cost of labeling which requires specific expertise and labor-intensive annotations. In this work, we propose a straightforward pre-training method for semantic segmentation leveraging Denoising Diffusion Probabilistic Models (DDPM), which have shown impressive results for generative modeling. Our straig…
▽ More
Medical radiography segmentation, and specifically dental radiography, is highly limited by the cost of labeling which requires specific expertise and labor-intensive annotations. In this work, we propose a straightforward pre-training method for semantic segmentation leveraging Denoising Diffusion Probabilistic Models (DDPM), which have shown impressive results for generative modeling. Our straightforward approach achieves remarkable performance in terms of label efficiency and does not require architectural modifications between pre-training and downstream tasks. We propose to first pre-train a Unet by exploiting the DDPM training objective, and then fine-tune the resulting model on a segmentation task. Our experimental results on the segmentation of dental radiographs demonstrate that the proposed method is competitive with state-of-the-art pre-training methods.
△ Less
Submitted 27 July, 2023; v1 submitted 26 July, 2023;
originally announced July 2023.
-
Less Likely Brainstorming: Using Language Models to Generate Alternative Hypotheses
Authors:
Liyan Tang,
Yifan Peng,
Yanshan Wang,
Ying Ding,
Greg Durrett,
Justin F. Rousseau
Abstract:
A human decision-maker benefits the most from an AI assistant that corrects for their biases. For problems such as generating interpretation of a radiology report given findings, a system predicting only highly likely outcomes may be less useful, where such outcomes are already obvious to the user. To alleviate biases in human decision-making, it is worth considering a broad differential diagnosis…
▽ More
A human decision-maker benefits the most from an AI assistant that corrects for their biases. For problems such as generating interpretation of a radiology report given findings, a system predicting only highly likely outcomes may be less useful, where such outcomes are already obvious to the user. To alleviate biases in human decision-making, it is worth considering a broad differential diagnosis, going beyond the most likely options. We introduce a new task, "less likely brainstorming," that asks a model to generate outputs that humans think are relevant but less likely to happen. We explore the task in two settings: a brain MRI interpretation generation setting and an everyday commonsense reasoning setting. We found that a baseline approach of training with less likely hypotheses as targets generates outputs that humans evaluate as either likely or irrelevant nearly half of the time; standard MLE training is not effective. To tackle this problem, we propose a controlled text generation method that uses a novel contrastive learning strategy to encourage models to differentiate between generating likely and less likely outputs according to humans. We compare our method with several state-of-the-art controlled text generation models via automatic and human evaluations and show that our models' capability of generating less likely outputs is improved.
△ Less
Submitted 30 May, 2023;
originally announced May 2023.
-
Analyzing Impact of Socio-Economic Factors on COVID-19 Mortality Prediction Using SHAP Value
Authors:
Redoan Rahman,
Jooyeong Kang,
Justin F Rousseau,
Ying Ding
Abstract:
This paper applies multiple machine learning (ML) algorithms to a dataset of de-identified COVID-19 patients provided by the COVID-19 Research Database. The dataset consists of 20,878 COVID-positive patients, among which 9,177 patients died in the year 2020. This paper aims to understand and interpret the association of socio-economic characteristics of patients with their mortality instead of max…
▽ More
This paper applies multiple machine learning (ML) algorithms to a dataset of de-identified COVID-19 patients provided by the COVID-19 Research Database. The dataset consists of 20,878 COVID-positive patients, among which 9,177 patients died in the year 2020. This paper aims to understand and interpret the association of socio-economic characteristics of patients with their mortality instead of maximizing prediction accuracy. According to our analysis, a patients households annual and disposable income, age, education, and employment status significantly impacts a machine learning models prediction. We also observe several individual patient data, which gives us insight into how the feature values impact the prediction for that data point. This paper analyzes the global and local interpretation of machine learning models on socio-economic data of COVID patients.
△ Less
Submitted 27 February, 2023;
originally announced March 2023.
-
Using Explainable AI to Cross-Validate Socio-economic Disparities Among Covid-19 Patient Mortality
Authors:
Li Shi,
Redoan Rahman,
Esther Melamed,
Jacek Gwizdka,
Justin F. Rousseau,
Ying Ding
Abstract:
This paper applies eXplainable Artificial Intelligence (XAI) methods to investigate the socioeconomic disparities in COVID patient mortality. An Extreme Gradient Boosting (XGBoost) prediction model is built based on a de-identified Austin area hospital dataset to predict the mortality of COVID-19 patients. We apply two XAI methods, Shapley Additive exPlanations (SHAP) and Locally Interpretable Mod…
▽ More
This paper applies eXplainable Artificial Intelligence (XAI) methods to investigate the socioeconomic disparities in COVID patient mortality. An Extreme Gradient Boosting (XGBoost) prediction model is built based on a de-identified Austin area hospital dataset to predict the mortality of COVID-19 patients. We apply two XAI methods, Shapley Additive exPlanations (SHAP) and Locally Interpretable Model Agnostic Explanations (LIME), to compare the global and local interpretation of feature importance. This paper demonstrates the advantages of using XAI which shows the feature importance and decisive capability. Furthermore, we use the XAI methods to cross-validate their interpretations for individual patients. The XAI models reveal that Medicare financial class, older age, and gender have high impact on the mortality prediction. We find that LIME local interpretation does not show significant differences in feature importance comparing to SHAP, which suggests pattern confirmation. This paper demonstrates the importance of XAI methods in cross-validation of feature attributions.
△ Less
Submitted 16 February, 2023;
originally announced February 2023.
-
Attend Who is Weak: Pruning-assisted Medical Image Localization under Sophisticated and Implicit Imbalances
Authors:
Ajay Jaiswal,
Tianlong Chen,
Justin F. Rousseau,
Yifan Peng,
Ying Ding,
Zhangyang Wang
Abstract:
Deep neural networks (DNNs) have rapidly become a \textit{de facto} choice for medical image understanding tasks. However, DNNs are notoriously fragile to the class imbalance in image classification. We further point out that such imbalance fragility can be amplified when it comes to more sophisticated tasks such as pathology localization, as imbalances in such problems can have highly complex and…
▽ More
Deep neural networks (DNNs) have rapidly become a \textit{de facto} choice for medical image understanding tasks. However, DNNs are notoriously fragile to the class imbalance in image classification. We further point out that such imbalance fragility can be amplified when it comes to more sophisticated tasks such as pathology localization, as imbalances in such problems can have highly complex and often implicit forms of presence. For example, different pathology can have different sizes or colors (w.r.t.the background), different underlying demographic distributions, and in general different difficulty levels to recognize, even in a meticulously curated balanced distribution of training data. In this paper, we propose to use pruning to automatically and adaptively identify \textit{hard-to-learn} (HTL) training samples, and improve pathology localization by attending them explicitly, during training in \textit{supervised, semi-supervised, and weakly-supervised} settings. Our main inspiration is drawn from the recent finding that deep classification models have difficult-to-memorize samples and those may be effectively exposed through network pruning \cite{hooker2019compressed} - and we extend such observation beyond classification for the first time. We also present an interesting demographic analysis which illustrates HTLs ability to capture complex demographic imbalances. Our extensive experiments on the Skin Lesion Localization task in multiple training settings by paying additional attention to HTLs show significant improvement of localization performance by $\sim$2-3\%.
△ Less
Submitted 5 December, 2022;
originally announced December 2022.
-
RoS-KD: A Robust Stochastic Knowledge Distillation Approach for Noisy Medical Imaging
Authors:
Ajay Jaiswal,
Kumar Ashutosh,
Justin F Rousseau,
Yifan Peng,
Zhangyang Wang,
Ying Ding
Abstract:
AI-powered Medical Imaging has recently achieved enormous attention due to its ability to provide fast-paced healthcare diagnoses. However, it usually suffers from a lack of high-quality datasets due to high annotation cost, inter-observer variability, human annotator error, and errors in computer-generated labels. Deep learning models trained on noisy labelled datasets are sensitive to the noise…
▽ More
AI-powered Medical Imaging has recently achieved enormous attention due to its ability to provide fast-paced healthcare diagnoses. However, it usually suffers from a lack of high-quality datasets due to high annotation cost, inter-observer variability, human annotator error, and errors in computer-generated labels. Deep learning models trained on noisy labelled datasets are sensitive to the noise type and lead to less generalization on the unseen samples. To address this challenge, we propose a Robust Stochastic Knowledge Distillation (RoS-KD) framework which mimics the notion of learning a topic from multiple sources to ensure deterrence in learning noisy information. More specifically, RoS-KD learns a smooth, well-informed, and robust student manifold by distilling knowledge from multiple teachers trained on overlap** subsets of training data. Our extensive experiments on popular medical imaging classification tasks (cardiopulmonary disease and lesion classification) using real-world datasets, show the performance benefit of RoS-KD, its ability to distill knowledge from many popular large networks (ResNet-50, DenseNet-121, MobileNet-V2) in a comparatively small network, and its robustness to adversarial attacks (PGD, FSGM). More specifically, RoS-KD achieves >2% and >4% improvement on F1-score for lesion classification and cardiopulmonary disease classification tasks, respectively, when the underlying student is ResNet-18 against recent competitive knowledge distillation baseline. Additionally, on cardiopulmonary disease classification task, RoS-KD outperforms most of the SOTA baselines by ~1% gain in AUC score.
△ Less
Submitted 2 December, 2022; v1 submitted 15 October, 2022;
originally announced October 2022.
-
Old can be Gold: Better Gradient Flow can Make Vanilla-GCNs Great Again
Authors:
Ajay Jaiswal,
Peihao Wang,
Tianlong Chen,
Justin F. Rousseau,
Ying Ding,
Zhangyang Wang
Abstract:
Despite the enormous success of Graph Convolutional Networks (GCNs) in modeling graph-structured data, most of the current GCNs are shallow due to the notoriously challenging problems of over-smoothening and information squashing along with conventional difficulty caused by vanishing gradients and over-fitting. Previous works have been primarily focused on the study of over-smoothening and over-sq…
▽ More
Despite the enormous success of Graph Convolutional Networks (GCNs) in modeling graph-structured data, most of the current GCNs are shallow due to the notoriously challenging problems of over-smoothening and information squashing along with conventional difficulty caused by vanishing gradients and over-fitting. Previous works have been primarily focused on the study of over-smoothening and over-squashing phenomena in training deep GCNs. Surprisingly, in comparison with CNNs/RNNs, very limited attention has been given to understanding how healthy gradient flow can benefit the trainability of deep GCNs. In this paper, firstly, we provide a new perspective of gradient flow to understand the substandard performance of deep GCNs and hypothesize that by facilitating healthy gradient flow, we can significantly improve their trainability, as well as achieve state-of-the-art (SOTA) level performance from vanilla-GCNs. Next, we argue that blindly adopting the Glorot initialization for GCNs is not optimal, and derive a topology-aware isometric initialization scheme for vanilla-GCNs based on the principles of isometry. Additionally, contrary to ad-hoc addition of skip-connections, we propose to use gradient-guided dynamic rewiring of vanilla-GCNs} with skip connections. Our dynamic rewiring method uses the gradient flow within each layer during training to introduce on-demand skip-connections adaptively. We provide extensive empirical evidence across multiple datasets that our methods improve gradient flow in deep vanilla-GCNs and significantly boost their performance to comfortably compete and outperform many fancy state-of-the-art methods. Codes are available at: https://github.com/VITA-Group/GradientGCN.
△ Less
Submitted 14 October, 2022;
originally announced October 2022.
-
Understanding Factual Errors in Summarization: Errors, Summarizers, Datasets, Error Detectors
Authors:
Liyan Tang,
Tanya Goyal,
Alexander R. Fabbri,
Philippe Laban,
Jiacheng Xu,
Semih Yavuz,
Wojciech Kryściński,
Justin F. Rousseau,
Greg Durrett
Abstract:
The propensity of abstractive summarization models to make factual errors has been studied extensively, including design of metrics to detect factual errors and annotation of errors in current systems' outputs. However, the ever-evolving nature of summarization systems, metrics, and annotated benchmarks makes factuality evaluation a moving target, and drawing clear comparisons among metrics has be…
▽ More
The propensity of abstractive summarization models to make factual errors has been studied extensively, including design of metrics to detect factual errors and annotation of errors in current systems' outputs. However, the ever-evolving nature of summarization systems, metrics, and annotated benchmarks makes factuality evaluation a moving target, and drawing clear comparisons among metrics has become increasingly difficult. In this work, we aggregate factuality error annotations from nine existing datasets and stratify them according to the underlying summarization model. We compare performance of state-of-the-art factuality metrics, including recent ChatGPT-based metrics, on this stratified benchmark and show that their performance varies significantly across different types of summarization models. Critically, our analysis shows that much of the recent improvement in the factuality detection space has been on summaries from older (pre-Transformer) models instead of more relevant recent summarization models. We further perform a finer-grained analysis per error-type and find similar performance variance across error types for different factuality metrics. Our results show that no one metric is superior in all settings or for all error types, and we provide recommendations for best practices given these insights.
△ Less
Submitted 25 May, 2023; v1 submitted 25 May, 2022;
originally announced May 2022.
-
Fast Bayesian Coresets via Subsampling and Quasi-Newton Refinement
Authors:
Cian Naik,
Judith Rousseau,
Trevor Campbell
Abstract:
Bayesian coresets approximate a posterior distribution by building a small weighted subset of the data points. Any inference procedure that is too computationally expensive to be run on the full posterior can instead be run inexpensively on the coreset, with results that approximate those on the full data. However, current approaches are limited by either a significant run-time or the need for the…
▽ More
Bayesian coresets approximate a posterior distribution by building a small weighted subset of the data points. Any inference procedure that is too computationally expensive to be run on the full posterior can instead be run inexpensively on the coreset, with results that approximate those on the full data. However, current approaches are limited by either a significant run-time or the need for the user to specify a low-cost approximation to the full posterior. We propose a Bayesian coreset construction algorithm that first selects a uniformly random subset of data, and then optimizes the weights using a novel quasi-Newton method. Our algorithm is a simple to implement, black-box method, that does not require the user to specify a low-cost posterior approximation. It is the first to come with a general high-probability bound on the KL divergence of the output coreset posterior. Experiments demonstrate that our method provides significant improvements in coreset quality against alternatives with comparable construction times, with far less storage cost and user input required.
△ Less
Submitted 15 January, 2023; v1 submitted 17 March, 2022;
originally announced March 2022.
-
RadBERT-CL: Factually-Aware Contrastive Learning For Radiology Report Classification
Authors:
Ajay Jaiswal,
Liyan Tang,
Meheli Ghosh,
Justin Rousseau,
Yifan Peng,
Ying Ding
Abstract:
Radiology reports are unstructured and contain the imaging findings and corresponding diagnoses transcribed by radiologists which include clinical facts and negated and/or uncertain statements. Extracting pathologic findings and diagnoses from radiology reports is important for quality control, population health, and monitoring of disease progress. Existing works, primarily rely either on rule-bas…
▽ More
Radiology reports are unstructured and contain the imaging findings and corresponding diagnoses transcribed by radiologists which include clinical facts and negated and/or uncertain statements. Extracting pathologic findings and diagnoses from radiology reports is important for quality control, population health, and monitoring of disease progress. Existing works, primarily rely either on rule-based systems or transformer-based pre-trained model fine-tuning, but could not take the factual and uncertain information into consideration, and therefore generate false-positive outputs. In this work, we introduce three sedulous augmentation techniques which retain factual and critical information while generating augmentations for contrastive learning. We introduce RadBERT-CL, which fuses these information into BlueBert via a self-supervised contrastive loss. Our experiments on MIMIC-CXR show superior performance of RadBERT-CL on fine-tuning for multi-class, multi-label report classification. We illustrate that when few labeled data are available, RadBERT-CL outperforms conventional SOTA transformers (BERT/BlueBert) by significantly larger margins (6-11%). We also show that the representations learned by RadBERT-CL can capture critical medical information in the latent space.
△ Less
Submitted 19 November, 2021; v1 submitted 28 October, 2021;
originally announced October 2021.
-
SCALP -- Supervised Contrastive Learning for Cardiopulmonary Disease Classification and Localization in Chest X-rays using Patient Metadata
Authors:
Ajay Jaiswal,
Tianhao Li,
Cyprian Zander,
Yan Han,
Justin F. Rousseau,
Yifan Peng,
Ying Ding
Abstract:
Computer-aided diagnosis plays a salient role in more accessible and accurate cardiopulmonary diseases classification and localization on chest radiography. Millions of people get affected and die due to these diseases without an accurate and timely diagnosis. Recently proposed contrastive learning heavily relies on data augmentation, especially positive data augmentation. However, generating clin…
▽ More
Computer-aided diagnosis plays a salient role in more accessible and accurate cardiopulmonary diseases classification and localization on chest radiography. Millions of people get affected and die due to these diseases without an accurate and timely diagnosis. Recently proposed contrastive learning heavily relies on data augmentation, especially positive data augmentation. However, generating clinically-accurate data augmentations for medical images is extremely difficult because the common data augmentation methods in computer vision, such as sharp, blur, and crop operations, can severely alter the clinical settings of medical images. In this paper, we proposed a novel and simple data augmentation method based on patient metadata and supervised knowledge to create clinically accurate positive and negative augmentations for chest X-rays. We introduce an end-to-end framework, SCALP, which extends the self-supervised contrastive approach to a supervised setting. Specifically, SCALP pulls together chest X-rays from the same patient (positive keys) and pushes apart chest X-rays from different patients (negative keys). In addition, it uses ResNet-50 along with the triplet-attention mechanism to identify cardiopulmonary diseases, and Grad-CAM++ to highlight the abnormal regions. Our extensive experiments demonstrate that SCALP outperforms existing baselines with significant margins in both classification and localization tasks. Specifically, the average classification AUCs improve from 82.8% (SOTA using DenseNet-121) to 83.9% (SCALP using ResNet-50), while the localization results improve on average by 3.7% over different IoU thresholds.
△ Less
Submitted 27 October, 2021;
originally announced October 2021.
-
Stable ResNet
Authors:
Soufiane Hayou,
Eugenio Clerico,
Bobby He,
George Deligiannidis,
Arnaud Doucet,
Judith Rousseau
Abstract:
Deep ResNet architectures have achieved state of the art performance on many tasks. While they solve the problem of gradient vanishing, they might suffer from gradient exploding as the depth becomes large (Yang et al. 2017). Moreover, recent results have shown that ResNet might lose expressivity as the depth goes to infinity (Yang et al. 2017, Hayou et al. 2019). To resolve these issues, we introd…
▽ More
Deep ResNet architectures have achieved state of the art performance on many tasks. While they solve the problem of gradient vanishing, they might suffer from gradient exploding as the depth becomes large (Yang et al. 2017). Moreover, recent results have shown that ResNet might lose expressivity as the depth goes to infinity (Yang et al. 2017, Hayou et al. 2019). To resolve these issues, we introduce a new class of ResNet architectures, called Stable ResNet, that have the property of stabilizing the gradient while ensuring expressivity in the infinite depth limit.
△ Less
Submitted 18 March, 2021; v1 submitted 24 October, 2020;
originally announced October 2020.
-
Information access representations and social capital in networks
Authors:
Ashkan Bashardoust,
Hannah C. Beilinson,
Sorelle A. Friedler,
Jiajie Ma,
Jade Rousseau,
Carlos E. Scheidegger,
Blair D. Sullivan,
Nasanbayar Ulzii-Orshikh,
Suresh Venkatasubramanian
Abstract:
Social network position confers power and social capital. In the setting of online social networks that have massive reach, creating mathematical representations of social capital is an important step towards understanding how network position can differentially confer advantage to different groups and how network position can itself be a source of advantage. In this paper, we use well established…
▽ More
Social network position confers power and social capital. In the setting of online social networks that have massive reach, creating mathematical representations of social capital is an important step towards understanding how network position can differentially confer advantage to different groups and how network position can itself be a source of advantage. In this paper, we use well established models for information flow on networks as a base to propose a formal descriptor of the network position of a node as represented by its information access. Combining these descriptors allows a full representation of social capital across the network. Using real-world networks, we demonstrate that this representation allows the identification of differences between groups based on network specific measures of inequality of access.
△ Less
Submitted 16 October, 2023; v1 submitted 23 October, 2020;
originally announced October 2020.
-
COVI White Paper
Authors:
Hannah Alsdurf,
Edmond Belliveau,
Yoshua Bengio,
Tristan Deleu,
Prateek Gupta,
Daphne Ippolito,
Richard Janda,
Max Jarvie,
Tyler Kolody,
Sekoul Krastev,
Tegan Maharaj,
Robert Obryk,
Dan Pilat,
Valerie Pisano,
Benjamin Prud'homme,
Meng Qu,
Nasim Rahaman,
Irina Rish,
Jean-Francois Rousseau,
Abhinav Sharma,
Brooke Struck,
Jian Tang,
Martin Weiss,
Yun William Yu
Abstract:
The SARS-CoV-2 (Covid-19) pandemic has caused significant strain on public health institutions around the world. Contact tracing is an essential tool to change the course of the Covid-19 pandemic. Manual contact tracing of Covid-19 cases has significant challenges that limit the ability of public health authorities to minimize community infections. Personalized peer-to-peer contact tracing through…
▽ More
The SARS-CoV-2 (Covid-19) pandemic has caused significant strain on public health institutions around the world. Contact tracing is an essential tool to change the course of the Covid-19 pandemic. Manual contact tracing of Covid-19 cases has significant challenges that limit the ability of public health authorities to minimize community infections. Personalized peer-to-peer contact tracing through the use of mobile apps has the potential to shift the paradigm. Some countries have deployed centralized tracking systems, but more privacy-protecting decentralized systems offer much of the same benefit without concentrating data in the hands of a state authority or for-profit corporations. Machine learning methods can circumvent some of the limitations of standard digital tracing by incorporating many clues and their uncertainty into a more graded and precise estimation of infection risk. The estimated risk can provide early risk awareness, personalized recommendations and relevant information to the user. Finally, non-identifying risk data can inform epidemiological models trained jointly with the machine learning predictor. These models can provide statistical evidence for the importance of factors involved in disease transmission. They can also be used to monitor, evaluate and optimize health policy and (de)confinement scenarios according to medical and economic productivity indicators. However, such a strategy based on mobile apps and machine learning should proactively mitigate potential ethical and privacy risks, which could have substantial impacts on society (not only impacts on health but also impacts such as stigmatization and abuse of personal data). Here, we present an overview of the rationale, design, ethical considerations and privacy strategy of `COVI,' a Covid-19 public peer-to-peer contact tracing and risk awareness mobile application developed in Canada.
△ Less
Submitted 27 July, 2020; v1 submitted 18 May, 2020;
originally announced May 2020.
-
Building a PubMed knowledge graph
Authors:
Jian Xu,
Sunkyu Kim,
Min Song,
Minbyul Jeong,
Donghyeon Kim,
Jaewoo Kang,
Justin F. Rousseau,
Xin Li,
Weijia Xu,
Vetle I. Torvik,
Yi Bu,
Chongyan Chen,
Islam Akef Ebeid,
Daifeng Li,
Ying Ding
Abstract:
PubMed is an essential resource for the medical domain, but useful concepts are either difficult to extract or are ambiguated, which has significantly hindered knowledge discovery. To address this issue, we constructed a PubMed knowledge graph (PKG) by extracting bio-entities from 29 million PubMed abstracts, disambiguating author names, integrating funding data through the National Institutes of…
▽ More
PubMed is an essential resource for the medical domain, but useful concepts are either difficult to extract or are ambiguated, which has significantly hindered knowledge discovery. To address this issue, we constructed a PubMed knowledge graph (PKG) by extracting bio-entities from 29 million PubMed abstracts, disambiguating author names, integrating funding data through the National Institutes of Health (NIH) ExPORTER, collecting affiliation history and educational background of authors from ORCID, and identifying fine-grained affiliation data from MapAffil. Through the integration of the credible multi-source data, we could create connections among the bio-entities, authors, articles, affiliations, and funding. Data validation revealed that the BioBERT deep learning method of bio-entity extraction significantly outperformed the state-of-the-art models based on the F1 score (by 0.51%), with the author name disambiguation (AND) achieving a F1 score of 98.09%. PKG can trigger broader innovations, not only enabling us to measure scholarly impact, knowledge usage, and knowledge transfer, but also assisting us in profiling authors and organizations based on their connections with bio-entities. The PKG is freely available on Figshare (https://figshare.com/s/6327a55355fc2c99f3a2, simplified version that exclude PubMed raw data) and TACC website (http://er.tacc.utexas.edu/datasets/ped, full version).
△ Less
Submitted 15 May, 2020; v1 submitted 8 May, 2020;
originally announced May 2020.
-
Rényi entropy and pattern matching for run-length encoded sequences
Authors:
Jerome Rousseau
Abstract:
In this note, we studied the asymptotic behaviour of the length of the longest common substring for run-length encoded sequences. When the original sequences are generated by an $α$-mixing process with exponential decay (or $ψ$-mixing with polynomial decay), we proved that this length grows logarithmically with a coefficient depending on the Rényi entropy of the pushforward measure. For Bernoulli…
▽ More
In this note, we studied the asymptotic behaviour of the length of the longest common substring for run-length encoded sequences. When the original sequences are generated by an $α$-mixing process with exponential decay (or $ψ$-mixing with polynomial decay), we proved that this length grows logarithmically with a coefficient depending on the Rényi entropy of the pushforward measure. For Bernoulli processes and Markov chains, this coefficient is computed explicitly.
△ Less
Submitted 11 December, 2020; v1 submitted 11 March, 2020;
originally announced March 2020.
-
Shortest distance between multiple orbits and generalized fractal dimensions
Authors:
Vanessa Barros,
Jerome Rousseau
Abstract:
We consider rapidly mixing dynamical systems and link the decay of the shortest distance between multiple orbits with the generalized fractal dimension. We apply this result to multidimensional expanding maps and extend it to the realm of random dynamical systems. For random sequences, we obtain a relation between the longest common substring between multiple sequences and the generalized Rényi en…
▽ More
We consider rapidly mixing dynamical systems and link the decay of the shortest distance between multiple orbits with the generalized fractal dimension. We apply this result to multidimensional expanding maps and extend it to the realm of random dynamical systems. For random sequences, we obtain a relation between the longest common substring between multiple sequences and the generalized Rényi entropy. Applications to Markov chains, Gibbs states and the stochastic scrabble are given.
△ Less
Submitted 13 December, 2019;
originally announced December 2019.
-
Sparse Networks with Core-Periphery Structure
Authors:
Cian Naik,
François Caron,
Judith Rousseau
Abstract:
We propose a statistical model for graphs with a core-periphery structure. To do this we define a precise notion of what it means for a graph to have this structure, based on the sparsity properties of the subgraphs of core and periphery nodes. We present a class of sparse graphs with such properties, and provide methods to simulate from this class, and to perform posterior inference. We demonstra…
▽ More
We propose a statistical model for graphs with a core-periphery structure. To do this we define a precise notion of what it means for a graph to have this structure, based on the sparsity properties of the subgraphs of core and periphery nodes. We present a class of sparse graphs with such properties, and provide methods to simulate from this class, and to perform posterior inference. We demonstrate that our model can detect core-periphery structure in simulated and real-world networks.
△ Less
Submitted 21 October, 2019;
originally announced October 2019.
-
Exact Convergence Rates of the Neural Tangent Kernel in the Large Depth Limit
Authors:
Soufiane Hayou,
Arnaud Doucet,
Judith Rousseau
Abstract:
Recent work by Jacot et al. (2018) has shown that training a neural network using gradient descent in parameter space is related to kernel gradient descent in function space with respect to the Neural Tangent Kernel (NTK). Lee et al. (2019) built on this result by establishing that the output of a neural network trained using gradient descent can be approximated by a linear model when the network…
▽ More
Recent work by Jacot et al. (2018) has shown that training a neural network using gradient descent in parameter space is related to kernel gradient descent in function space with respect to the Neural Tangent Kernel (NTK). Lee et al. (2019) built on this result by establishing that the output of a neural network trained using gradient descent can be approximated by a linear model when the network width is large. Indeed, under regularity conditions, the NTK converges to a time-independent kernel in the infinite-width limit. This regime is often called the NTK regime. In parallel, recent works on signal propagation (Poole et al., 2016; Schoenholz et al., 2017; Hayou et al., 2019a) studied the impact of the initialization and the activation function on signal propagation in deep neural networks. In this paper, we connect these two theories by quantifying the impact of the initialization and the activation function on the NTK when the network depth becomes large. In particular, we provide a comprehensive analysis of the convergence rates of the NTK regime to the infinite depth regime.
△ Less
Submitted 25 May, 2022; v1 submitted 31 May, 2019;
originally announced May 2019.
-
Matching strings in encoded sequences
Authors:
Adriana Coutinho,
Rodrigo Lambert,
Jérôme Rousseau
Abstract:
We investigate the longest common substring problem for encoded sequences and its asymptotic behaviour. The main result is a strong law of large numbers for a re-scaled version of this quantity, which presents an explicit relation with the Rényi entropy of the source. We apply this result to the zero-inflated contamination model and the stochastic scrabble. In the case of dynamical systems, this p…
▽ More
We investigate the longest common substring problem for encoded sequences and its asymptotic behaviour. The main result is a strong law of large numbers for a re-scaled version of this quantity, which presents an explicit relation with the Rényi entropy of the source. We apply this result to the zero-inflated contamination model and the stochastic scrabble. In the case of dynamical systems, this problem is equivalent to the shortest distance between two observed orbits and its limiting relationship with the correlation dimension of the pushforward measure. An extension to the shortest distance between orbits for random dynamical systems is also provided.
△ Less
Submitted 10 December, 2019; v1 submitted 22 March, 2019;
originally announced March 2019.
-
On the Impact of the Activation Function on Deep Neural Networks Training
Authors:
Soufiane Hayou,
Arnaud Doucet,
Judith Rousseau
Abstract:
The weight initialization and the activation function of deep neural networks have a crucial impact on the performance of the training procedure. An inappropriate selection can lead to the loss of information of the input during forward propagation and the exponential vanishing/exploding of gradients during back-propagation. Understanding the theoretical properties of untrained random networks is…
▽ More
The weight initialization and the activation function of deep neural networks have a crucial impact on the performance of the training procedure. An inappropriate selection can lead to the loss of information of the input during forward propagation and the exponential vanishing/exploding of gradients during back-propagation. Understanding the theoretical properties of untrained random networks is key to identifying which deep networks may be trained successfully as recently demonstrated by Samuel et al (2017) who showed that for deep feedforward neural networks only a specific choice of hyperparameters known as the `Edge of Chaos' can lead to good performance. While the work by Samuel et al (2017) discuss trainability issues, we focus here on training acceleration and overall performance. We give a comprehensive theoretical analysis of the Edge of Chaos and show that we can indeed tune the initialization parameters and the activation function in order to accelerate the training and improve the performance.
△ Less
Submitted 26 May, 2019; v1 submitted 18 February, 2019;
originally announced February 2019.
-
On the Selection of Initialization and Activation Function for Deep Neural Networks
Authors:
Soufiane Hayou,
Arnaud Doucet,
Judith Rousseau
Abstract:
The weight initialization and the activation function of deep neural networks have a crucial impact on the performance of the training procedure. An inappropriate selection can lead to the loss of information of the input during forward propagation and the exponential vanishing/exploding of gradients during back-propagation. Understanding the theoretical properties of untrained random networks is…
▽ More
The weight initialization and the activation function of deep neural networks have a crucial impact on the performance of the training procedure. An inappropriate selection can lead to the loss of information of the input during forward propagation and the exponential vanishing/exploding of gradients during back-propagation. Understanding the theoretical properties of untrained random networks is key to identifying which deep networks may be trained successfully as recently demonstrated by Schoenholz et al. (2017) who showed that for deep feedforward neural networks only a specific choice of hyperparameters known as the `edge of chaos' can lead to good performance. We complete this analysis by providing quantitative results showing that, for a class of ReLU-like activation functions, the information propagates indeed deeper for an initialization at the edge of chaos. By further extending this analysis, we identify a class of activation functions that improve the information propagation over ReLU-like functions. This class includes the Swish activation, $φ_{swish}(x) = x \cdot \text{sigmoid}(x)$, used in Hendrycks & Gimpel (2016), Elfwing et al. (2017) and Ramachandran et al. (2017). This provides a theoretical grounding for the excellent empirical performance of $φ_{swish}$ observed in these contributions. We complement those previous results by illustrating the benefit of using a random initialization on the edge of chaos in this context.
△ Less
Submitted 7 October, 2018; v1 submitted 21 May, 2018;
originally announced May 2018.