Search | arXiv e-print repository

arXiv:2404.12478 [pdf]

A New Reliable & Parsimonious Learning Strategy Comprising Two Layers of Gaussian Processes, to Address Inhomogeneous Empirical Correlation Structures

Authors: Gargi Roy, Dalia Chakrabarty

Abstract: We present a new strategy for learning the functional relation between a pair of variables, while addressing inhomogeneities in the correlation structure of the available data, by modelling the sought function as a sample function of a non-stationary Gaussian Process (GP), that nests within itself multiple other GPs, each of which we prove can be stationary, thereby establishing sufficiency of two GP layers. In fact, a non-stationary kernel is envisaged, with each hyperparameter set as dependent on the sample function drawn from the outer non-stationary GP, such that a new sample function is drawn at every pair of input values at which the kernel is computed. However, such a model cannot be implemented, and we substitute this by recalling that the average effect of drawing different sample functions from a given GP is equivalent to that of drawing a sample function from each of a set of GPs that are rendered different, as updated during the equilibrium stage of the undertaken inference (via MCMC). The kernel is fully non-parametric, and it suffices to learn one hyperparameter per layer of GP, for each dimension of the input variable. We illustrate this new learning strategy on a real dataset.

Submitted 18 April, 2024; originally announced April 2024.

MSC Class: Probability theory and stochastic processes: 60-XX; Stochastic processes: 60Gxx; Gaussian processes: 60G15; Generalised stochastic processes: 60G20

arXiv:2403.10885 [pdf, other]

Could We Generate Cytology Images from Histopathology Images? An Empirical Study

Authors: Soumyajyoti Dey, Sukanta Chakraborty, Utso Guha Roy, Nibaran Das

Abstract: Automation in medical imaging is quite challenging due to the unavailability of annotated datasets and the scarcity of domain experts. In recent years, deep learning techniques have solved some complex medical imaging tasks like disease classification, important object localization, segmentation, etc. However, most of the task requires a large amount of annotated data for their successful implementation. To mitigate the shortage of data, different generative models are proposed for data augmentation purposes which can boost the classification performances. For this, different synthetic medical image data generation models are developed to increase the dataset. Unpaired image-to-image translation models here shift the source domain to the target domain. In the breast malignancy identification domain, FNAC is one of the low-cost low-invasive modalities normally used by medical practitioners. But availability of public datasets in this domain is very poor. Whereas, for automation of cytology images, we need a large amount of annotated data. Therefore synthetic cytology images are generated by translating breast histopathology samples which are publicly available. In this study, we have explored traditional image-to-image transfer models like CycleGAN, and Neural Style Transfer. Further, it is observed that the generated cytology images are quite similar to real breast cytology samples by measuring FID and KID scores.

Submitted 16 March, 2024; originally announced March 2024.

Comments: Accept at International Conference on Advanced Computing and Applications(ICACA-2024)

arXiv:2403.10884 [pdf, other]

Fuzzy Rank-based Late Fusion Technique for Cytology image Segmentation

Authors: Soumyajyoti Dey, Sukanta Chakraborty, Utso Guha Roy, Nibaran Das

Abstract: Cytology image segmentation is quite challenging due to its complex cellular structure and multiple overlapping regions. On the other hand, for supervised machine learning techniques, we need a large amount of annotated data, which is costly. In recent years, late fusion techniques have given some promising performances in the field of image classification. In this paper, we have explored a fuzzy-based late fusion technique for cytology image segmentation. This fusion rule integrates three traditional semantic segmentation models: UNet, SegNet, and PSPNet. The technique is applied on two cytology image datasets, i.e., cervical cytology (HErlev) and breast cytology (JUCYT-v1) image datasets. We have achieved maximum MeanIoU scores of 84.27% and 83.79% on the HErlev dataset and JUCYT-v1 dataset, respectively, after the proposed late fusion technique, which are better than those of the traditional fusion rules such as average probability, geometric mean, Borda Count, etc. The codes of the proposed model are available on GitHub.

Submitted 16 March, 2024; originally announced March 2024.

Comments: Accept at International Conference on Data, Electronics and Computing (ICDEC-2023)

arXiv:2403.08737 [pdf, other]

ILCiteR: Evidence-grounded Interpretable Local Citation Recommendation

Authors: Sayar Ghosh Roy, Jiawei Han

Abstract: Existing Machine Learning approaches for local citation recommendation directly map or translate a query, which is typically a claim or an entity mention, to citation-worthy research papers. Within such a formulation, it is challenging to pinpoint why one should cite a specific research paper for a particular query, leading to limited recommendation interpretability. To alleviate this, we introduce the evidence-grounded local citation recommendation task, where the target latent space comprises evidence spans for recommending specific papers. Using a distantly-supervised evidence retrieval and multi-step re-ranking framework, our proposed system, ILCiteR, recommends papers to cite for a query grounded on similar evidence spans extracted from the existing research literature. Unlike past formulations that simply output recommendations, ILCiteR retrieves ranked lists of evidence span and recommended paper pairs. Secondly, previously proposed neural models for citation recommendation require expensive training on massive labeled data, ideally after every significant update to the pool of candidate papers. In contrast, ILCiteR relies solely on distant supervision from a dynamic evidence database and pre-trained Transformer-based Language Models without any model training. We contribute a novel dataset for the evidence-grounded local citation recommendation task and demonstrate the efficacy of our proposed conditional neural rank-ensembling approach for re-ranking evidence spans.

Submitted 13 March, 2024; originally announced March 2024.

Comments: LREC-COLING 2024

arXiv:2401.12032 [pdf, other]

MINT: A wrapper to make multi-modal and multi-image AI models interactive

Authors: Jan Freyberg, Abhijit Guha Roy, Terry Spitz, Beverly Freeman, Mike Schaekermann, Patricia Strachan, Eva Schnider, Renee Wong, Dale R Webster, Alan Karthikesalingam, Yun Liu, Krishnamurthy Dvijotham, Umesh Telang

Abstract: During the diagnostic process, doctors incorporate multimodal information including imaging and the medical history - and similarly medical AI development has increasingly become multimodal. In this paper we tackle a more subtle challenge: doctors take a targeted medical history to obtain only the most pertinent pieces of information; how do we enable AI to do the same? We develop a wrapper method named MINT (Make your model INTeractive) that automatically determines what pieces of information are most valuable at each step, and ask for only the most useful information. We demonstrate the efficacy of MINT wrapping a skin disease prediction model, where multiple images and a set of optional answers to $25$ standard metadata questions (i.e., structured medical history) are used by a multi-modal deep network to provide a differential diagnosis. We show that MINT can identify whether metadata inputs are needed and if so, which question to ask next. We also demonstrate that when collecting multiple images, MINT can identify if an additional image would be beneficial, and if so, which type of image to capture. We showed that MINT reduces the number of metadata and image inputs needed by 82% and 36.2% respectively, while maintaining predictive performance. Using real-world AI dermatology system data, we show that needing fewer inputs can retain users that may otherwise fail to complete the system submission and drop off without a diagnosis. Qualitative examples show MINT can closely mimic the step-by-step decision making process of a clinical workflow and how this is different for straight forward cases versus more difficult, ambiguous cases. Finally we demonstrate how MINT is robust to different underlying multi-model classifiers and can be easily adapted to user requirements without significant model re-training.

Submitted 22 January, 2024; originally announced January 2024.

Comments: 15 pages, 7 figures

arXiv:2401.10653 [pdf, other]

Attentive Fusion: A Transformer-based Approach to Multimodal Hate Speech Detection

Authors: Atanu Mandal, Gargi Roy, Amit Barman, Indranil Dutta, Sudip Kumar Naskar

Abstract: With the recent surge and exponential growth of social media usage, scrutinizing social media content for the presence of any hateful content is of utmost importance. Researchers have been diligently working since the past decade on distinguishing between content that promotes hatred and content that does not. Traditionally, the main focus has been on analyzing textual content. However, recent research attempts have also commenced into the identification of audio-based content. Nevertheless, studies have shown that relying solely on audio or text-based content may be ineffective, as recent upsurge indicates that individuals often employ sarcasm in their speech and writing. To overcome these challenges, we present an approach to identify whether a speech promotes hate or not utilizing both audio and textual representations. Our methodology is based on the Transformer framework that incorporates both audio and text sampling, accompanied by our very own layer called "Attentive Fusion". The results of our study surpassed previous state-of-the-art techniques, achieving an impressive macro F1 score of 0.927 on the Test Set.

Submitted 19 January, 2024; originally announced January 2024.

Comments: Accepted in 20th International Conference on Natural Language Processing (ICON)

arXiv:2312.02061 [pdf, other]

Analysis of Neutron Star $f-$mode Oscillations in General Relativity with Spectral Representation of Nuclear Equations of State

Authors: Debanjan Guha Roy, Tuhin Malik, Swastik Bhattacharya, Sarmistha Banik

Abstract: We study quasinormal $f-$mode oscillations in neutron star (NS) interiors within the linearized General Relativistic formalism. We utilize approximately 9000 nuclear Equations of State (EOS) using spectral representation techniques, incorporating constraints on nuclear saturation properties, chiral Effective Field Theory ($χ$EFT) for pure neutron matter, and perturbative Quantum Chromodynamics (pQCD) for densities pertinent to NS cores. The median values of f-mode frequency, $ν_f$ (damping time, $τ_f$) for NS with masses ranging from 1.4 - 2.0 $M_\odot$ lie between 1.80 - 2.20 kHz (0.13 - 0.22 s) for our entire EOS set. Our study reveals a weak correlation between $f-$mode frequencies and individual nuclear saturation properties, prompting the necessity for more intricate methodologies to unveil multi-parameter relationships. We observe a robust linear relationship between the radii and $f-$mode frequencies for different NS masses. Leveraging this correlation alongside NICER observations of PSR J0740+6620 and PSR J0030+0451, we establish constraints that exhibit partial and minimal overlap for observational data from Riley et al. and Miller et al. respectively with our nucleonic EOS dataset. Moreover, NICER data aligns closely with radius and frequency values for a few hadron-quark hybrid EOS models. This indicates the need to consider additional exotic particles such as deconfined quarks at suprasaturation densities. We conclude that future observations of the radius or $f-$mode frequency for more than one NS mass, particularly at the extremes of viable NS mass scale, would either rule out nucleon-only EOS or provide definitive evidence in its favour.

Submitted 27 April, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

Comments: 21 pages, 10 figures, 4 tables, Accepted in Astrophysical Journal

arXiv:2309.13465 [pdf]

doi 10.1103/PhysRevB.109.224418

Origin of spin-driven ferroelectricity and effect of external pressure on the complex magnetism of 6H-perovskite Ba3HoRu2O9

Authors: E. Kushwaha, G. Roy, M. Kumar, A. M. dos Santos, S. Ghosh, D. T. Adroja, V. Caignaert, O. Perez, A. Pautrat, T. Basu

Abstract: The compound Ba3HoRu2O9 magnetically orders at 50 K (TN1) followed by another complex magnetic ordering at 10.2 K (TN2). The 2nd magnetic phase transition was characterized by the co-existence of two competing magnetic ground states associated with two different magnetic wave vectors (K1=1/2 0 0 and K2=1/4 1/4 0). Here, we have discussed the origin of spin-driven ferroelectricity, which is not known yet. We demonstrate through time-of-flight Neutron diffraction and theoretical calculation that the non-collinear structure involving two different magnetic ions, Ru(4d) and Ho(4f), break the spatial inversion symmetry via inverse Dzyaloshinskii-Moriya (D-M) interaction through strong 4d-4f magnetic correlation, which shifts the oxygen atoms and results in non-zero polarization. Such an observation of inverse D-M interaction from two different magnetic ions which caused ferroelectricity is rarely observed. We have systematically studied the spin and dipolar dynamics, which exhibit intriguing behavior with shorter coherence lengths of 2nd magnetic phase associated with the k2-wave vector. The results manifest the development of finite-size magnetoelectric domains instead of true long-range ordering which justifies the experimentally obtained low value of ferroelectric polarization. The synchrotron XRD analysis predicts a non-centrosymmetric space group P-62c. Furthermore, we have investigated the effect of external pressure on this complex magnetism. The result reveals an enhancement of ordering temperature by the application of external pressure (1.6 K/GPa). The external pressure might favor stabilizing the magnetic ground state associated with 2nd magnetic phase. Our study shows an unconventional mechanism of spin-driven ferroelectricity.

Submitted 31 May, 2024; v1 submitted 23 September, 2023; originally announced September 2023.

Comments: Accepted in PRB

Journal ref: Phys. Rev. B 109, 224418 (2024)

arXiv:2309.08284 [pdf]

doi 10.1007/s00502-023-01144-2

Towards an Interoperability Roadmap for the Energy Transition

Authors: Valerie Reif, Thomas I. Strasser, Joseba Jimeno, Marjolaine Farre, Oliver Genest, Amélie Gyrard, Mark McGranaghan, Gianluca Lipari, Johann Schütz, Mathias Uslar, Sebastian Vogel, Arsim Bytyqi, Rita Dornmair, Andreas Corusa, Gaurav Roy, Ferdinanda Ponci, Alberto Dognini, Antonello Monti

Abstract: Smart grid interoperability is the means to achieve the twin green and digital transition but remains heterogeneous and fragmented to date. This work presents the first ideas and cornerstones of an Interoperability Roadmap for the Energy Transition that is being developed by the Horizon Europe int:net project. This roadmap builds on four cornerstones that address open interoperability issues. These are a knowledge base to address the lack of convergence among existing initiatives, a maturity model and a network of testing and certification facilities to address the lack of practical tools for the industry, and a governance process to address the gap between standards-related approaches of Standards Development Organisations and Research and Innovation projects. A community of practice will be set up to ensure the continuity of the ongoing activities related to smart grid interoperability. To outlive the duration of the int:net project, the aim is to formalise the community of practice as a legal entity.

Submitted 15 September, 2023; originally announced September 2023.

Comments: 12. (Hybrid) Symposium Communications for Energy Systems (ComForEn 2023)

arXiv:2308.11318 [pdf, other]

How to identify and characterize strongly correlated topological semimetals

Authors: Diana M. Kirschbaum, Monika Lužnik, Gwenvredig Le Roy, Silke Paschen

Abstract: How strong correlations and topology interplay is a topic of great current interest. In this perspective paper, we focus on correlation-driven gapless phases. We take the time-reversal symmetric Weyl semimetal as an example because it is expected to have clear (albeit nonquantized) topological signatures in the Hall response and because the first strongly correlated representative, the noncentrosymmetric Weyl-Kondo semimetal Ce$_3$Bi$_4$Pd$_3$, has recently been discovered. We summarize its key characteristics and use them to construct a prototype Weyl-Kondo semimetal temperature-magnetic field phase diagram. This allows for a substantiated assessment of other Weyl-Kondo semimetal candidate materials. We also put forward scaling plots of the intrinsic Berry-curvature-induced Hall response vs the inverse Weyl velocity -- a measure of correlation strength, and vs the inverse charge carrier concentration -- a measure of the proximity of Weyl nodes to the Fermi level. They suggest that the topological Hall response is maximized by strong correlations and small carrier concentrations. We hope that our work will guide the search for new Weyl-Kondo semimetals and correlated topological semimetals in general, and also trigger new theoretical work.

Submitted 22 August, 2023; originally announced August 2023.

Comments: 22 pages, 5 figures, 2 tables

arXiv:2307.09302 [pdf, other]

Conformal prediction under ambiguous ground truth

Authors: David Stutz, Abhijit Guha Roy, Tatiana Matejovicova, Patricia Strachan, Ali Taylan Cemgil, Arnaud Doucet

Abstract: Conformal Prediction (CP) allows to perform rigorous uncertainty quantification by constructing a prediction set $C(X)$ satisfying $\mathbb{P}(Y \in C(X))\geq 1-α$ for a user-chosen $α\in [0,1]$ by relying on calibration data $(X_1,Y_1),...,(X_n,Y_n)$ from $\mathbb{P}=\mathbb{P}^{X} \otimes \mathbb{P}^{Y|X}$. It is typically implicitly assumed that $\mathbb{P}^{Y|X}$ is the "true" posterior label distribution. However, in many real-world scenarios, the labels $Y_1,...,Y_n$ are obtained by aggregating expert opinions using a voting procedure, resulting in a one-hot distribution $\mathbb{P}_{vote}^{Y|X}$. For such ``voted'' labels, CP guarantees are thus w.r.t. $\mathbb{P}_{vote}=\mathbb{P}^X \otimes \mathbb{P}_{vote}^{Y|X}$ rather than the true distribution $\mathbb{P}$. In cases with unambiguous ground truth labels, the distinction between $\mathbb{P}_{vote}$ and $\mathbb{P}$ is irrelevant. However, when experts do not agree because of ambiguous labels, approximating $\mathbb{P}^{Y|X}$ with a one-hot distribution $\mathbb{P}_{vote}^{Y|X}$ ignores this uncertainty. In this paper, we propose to leverage expert opinions to approximate $\mathbb{P}^{Y|X}$ using a non-degenerate distribution $\mathbb{P}_{agg}^{Y|X}$. We develop Monte Carlo CP procedures which provide guarantees w.r.t. $\mathbb{P}_{agg}=\mathbb{P}^X \otimes \mathbb{P}_{agg}^{Y|X}$ by sampling multiple synthetic pseudo-labels from $\mathbb{P}_{agg}^{Y|X}$ for each calibration example $X_1,...,X_n$. In a case study of skin condition classification with significant disagreement among expert annotators, we show that applying CP w.r.t. $\mathbb{P}_{vote}$ under-covers expert annotations: calibrated for $72\%$ coverage, it falls short by on average $10\%$; our Monte Carlo CP closes this gap both empirically and theoretically.

Submitted 24 October, 2023; v1 submitted 18 July, 2023; originally announced July 2023.
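
The coverage statement $\mathbb{P}(Y \in C(X))\geq 1-α$ in the abstract above can be illustrated with ordinary split conformal prediction for a classifier. The following is a minimal sketch and not the paper's Monte Carlo procedure: the function names `conformal_threshold` and `prediction_set`, and the choice of one minus the true-class probability as the conformity score, are assumptions made for illustration.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Split-conformal threshold from held-out calibration data.

    cal_probs: (n, K) predicted class probabilities (hypothetical model output).
    cal_labels: (n,) integer class labels.
    alpha: target miscoverage level in [0, 1].
    """
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels, dtype=int)
    n = cal_labels.shape[0]
    # Conformity score (an assumed, common choice): 1 - probability of the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample rank: the ceil((n + 1) * (1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1.0 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: every class must be included.
        return np.inf
    return np.sort(scores)[k - 1]

def prediction_set(probs, threshold):
    """Indices of all classes whose score does not exceed the threshold."""
    probs = np.asarray(probs, dtype=float)
    return np.flatnonzero(1.0 - probs <= threshold)
```

With exchangeable calibration and test data, sets built this way satisfy the marginal guarantee with respect to whatever distribution the calibration labels come from, which is why the choice between voted and aggregated labels discussed in the abstract matters.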

arXiv:2307.02369 [pdf, other]

Interpolating Between the Gauge and Schrödinger Pictures of Quantum Dynamics

Authors: Sayak Guha Roy, Kevin Slagle

Abstract: Although spatial locality is explicit in the Heisenberg picture of quantum dynamics, spatial locality is not explicit in the Schrödinger picture equations of motion. The gauge picture is a modification of Schrödinger's picture such that locality is explicit in the equations of motion. In order to achieve this explicit locality, the gauge picture utilizes (1) a distinct wavefunction associated with each patch of space, and (2) time-dependent unitary connections to relate the Hilbert spaces associated with nearby patches. In this work, we show that by adding an additional spatially-local term to the gauge picture equations of motion, we can effectively interpolate between the gauge and Schrödinger pictures, such that when this additional term has a large coefficient, all of the gauge picture wavefunctions approach the Schrödinger picture wavefunction (and the connections approach the identity).

Submitted 5 July, 2023; originally announced July 2023.

Comments: 12+3 pages, 9+1 figures

arXiv:2307.02191 [pdf, other]

Evaluating AI systems under uncertain ground truth: a case study in dermatology

Authors: David Stutz, Ali Taylan Cemgil, Abhijit Guha Roy, Tatiana Matejovicova, Melih Barsbey, Patricia Strachan, Mike Schaekermann, Jan Freyberg, Rajeev Rikhye, Beverly Freeman, Javier Perez Matos, Umesh Telang, Dale R. Webster, Yuan Liu, Greg S. Corrado, Yossi Matias, Pushmeet Kohli, Yun Liu, Arnaud Doucet, Alan Karthikesalingam

Abstract: For safety, AI systems in health undergo thorough evaluations before deployment, validating their predictions against a ground truth that is assumed certain. However, this is actually not the case and the ground truth may be uncertain. Unfortunately, this is largely ignored in standard evaluation of AI models but can have severe consequences such as overestimating the future performance. To avoid this, we measure the effects of ground truth uncertainty, which we assume decomposes into two main components: annotation uncertainty which stems from the lack of reliable annotations, and inherent uncertainty due to limited observational information. This ground truth uncertainty is ignored when estimating the ground truth by deterministically aggregating annotations, e.g., by majority voting or averaging. In contrast, we propose a framework where aggregation is done using a statistical model. Specifically, we frame aggregation of annotations as posterior inference of so-called plausibilities, representing distributions over classes in a classification setting, subject to a hyper-parameter encoding annotator reliability. Based on this model, we propose a metric for measuring annotation uncertainty and provide uncertainty-adjusted metrics for performance evaluation. We present a case study applying our framework to skin condition classification from images where annotations are provided in the form of differential diagnoses. The deterministic adjudication process called inverse rank normalization (IRN) from previous work ignores ground truth uncertainty in evaluation. Instead, we present two alternative statistical models: a probabilistic version of IRN and a Plackett-Luce-based model. We find that a large portion of the dataset exhibits significant ground truth uncertainty and standard IRN-based evaluation severely over-estimates performance without providing uncertainty estimates.

Submitted 5 July, 2023; originally announced July 2023.

arXiv:2305.05687 [pdf, other]

doi 10.3847/1538-4357/accc89

Coronal Heating as Determined by the Solar Flare Frequency Distribution Obtained by Aggregating Case Studies

Authors: James Paul Mason, Alexandra Werth, Colin G. West, Allison A. Youngblood, Donald L. Woodraska, Courtney Peck, Kevin Lacjak, Florian G. Frick, Moutamen Gabir, Reema A. Alsinan, Thomas Jacobsen, Mohammad Alrubaie, Kayla M. Chizmar, Benjamin P. Lau, Lizbeth Montoya Dominguez, David Price, Dylan R. Butler, Connor J. Biron, Nikita Feoktistov, Kai Dewey, N. E. Loomis, Michal Bodzianowski, Connor Kuybus, Henry Dietrick, Aubrey M. Wolfe, et al. (977 additional authors not shown)

Abstract: Flare frequency distributions represent a key approach to addressing one of the largest problems in solar and stellar physics: determining the mechanism that counter-intuitively heats coronae to temperatures that are orders of magnitude hotter than the corresponding photospheres. It is widely accepted that the magnetic field is responsible for the heating, but there are two competing mechanisms that could explain it: nanoflares or Alfvén waves. To date, neither can be directly observed. Nanoflares are, by definition, extremely small, but their aggregate energy release could represent a substantial heating mechanism, presuming they are sufficiently abundant. One way to test this presumption is via the flare frequency distribution, which describes how often flares of various energies occur. If the slope of the power law fitting the flare frequency distribution is above a critical threshold, $α=2$ as established in prior literature, then there should be a sufficient abundance of nanoflares to explain coronal heating. We performed $>$600 case studies of solar flares, made possible by an unprecedented number of data analysts via three semesters of an undergraduate physics laboratory course. This allowed us to include two crucial, but nontrivial, analysis methods: pre-flare baseline subtraction and computation of the flare energy, which requires determining flare start and stop times. We aggregated the results of these analyses into a statistical study to determine that $α= 1.63 \pm 0.03$. This is below the critical threshold, suggesting that Alfvén waves are an important driver of coronal heating.

Submitted 9 May, 2023; originally announced May 2023.

Comments: 1,002 authors, 14 pages, 4 figures, 3 tables, published by The Astrophysical Journal on 2023-05-09, volume 948, page 71
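
The role of the critical slope $α=2$ mentioned in the abstract above can be seen from a short calculation. The following is a sketch under the assumption that the flare frequency distribution is a pure power law in flare energy $E$ between two cutoffs; the normalisation $A$ and the cutoffs $E_{\min}$, $E_{\max}$ are symbols introduced here for illustration and do not appear in the listing.

```latex
\frac{dN}{dE} = A\,E^{-\alpha},
\qquad
E_{\mathrm{tot}}
  = \int_{E_{\min}}^{E_{\max}} E\,\frac{dN}{dE}\,dE
  = \frac{A}{2-\alpha}\left(E_{\max}^{\,2-\alpha} - E_{\min}^{\,2-\alpha}\right),
\qquad \alpha \neq 2 .
```

For $α > 2$ the exponent $2-α$ is negative, so the total is dominated by the $E_{\min}$ term, that is, by the smallest flares. For $α < 2$, as with the reported $α = 1.63$ where $2-α = 0.37$, the $E_{\max}$ term dominates and the largest flares carry most of the energy.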

arXiv:2304.09218 [pdf, other]

Generative models improve fairness of medical classifiers under distribution shifts

Authors: Ira Ktena, Olivia Wiles, Isabela Albuquerque, Sylvestre-Alvise Rebuffi, Ryutaro Tanno, Abhijit Guha Roy, Shekoofeh Azizi, Danielle Belgrave, Pushmeet Kohli, Alan Karthikesalingam, Taylan Cemgil, Sven Gowal

Abstract: A ubiquitous challenge in machine learning is the problem of domain generalisation. This can exacerbate bias against groups or labels that are underrepresented in the datasets used for model development. Model bias can lead to unintended harms, especially in safety-critical applications like healthcare. Furthermore, the challenge is compounded by the difficulty of obtaining labelled data due to high cost or lack of readily available domain expertise. In our work, we show that learning realistic augmentations automatically from data is possible in a label-efficient manner using generative models. In particular, we leverage the higher abundance of unlabelled data to capture the underlying data distribution of different conditions and subgroups for an imaging modality. By conditioning generative models on appropriate labels, we can steer the distribution of synthetic examples according to specific requirements. We demonstrate that these learned augmentations can surpass heuristic ones by making models more robust and statistically fair in- and out-of-distribution. To evaluate the generality of our approach, we study 3 distinct medical imaging contexts of varying difficulty: (i) histopathology images from a publicly available generalisation benchmark, (ii) chest X-rays from publicly available clinical datasets, and (iii) dermatology images characterised by complex shifts and imaging conditions. Complementing real training samples with synthetic ones improves the robustness of models in all three medical tasks and increases fairness by improving the accuracy of diagnosis within underrepresented groups. This approach leads to stark improvements OOD across modalities: 7.7% prediction accuracy improvement in histopathology, 5.2% in chest radiology with 44.6% lower fairness gap and a striking 63.5% improvement in high-risk sensitivity for dermatology with a 7.5x reduction in fairness gap.

Submitted 18 April, 2023; originally announced April 2023.

arXiv:2302.09004 [pdf]

doi 10.1016/j.imu.2022.101156

CovidExpert: A Triplet Siamese Neural Network framework for the detection of COVID-19

Authors: Tareque Rahman Ornob, Gourab Roy, Enamul Hassan

Abstract: Patients with the COVID-19 infection may have pneumonia-like symptoms as well as respiratory problems which may harm the lungs. From medical images, coronavirus illness may be accurately identified and predicted using a variety of machine learning methods. Most of the published machine learning methods may need extensive hyperparameter adjustment and are unsuitable for small datasets. By leveraging the data in a comparatively small dataset, few-shot learning algorithms aim to reduce the requirement of large datasets. This inspired us to develop a few-shot learning model for early detection of COVID-19 to reduce the post-effect of this dangerous disease. The proposed architecture combines few-shot learning with an ensemble of pre-trained convolutional neural networks to extract feature vectors from CT scan images for similarity learning. The proposed Triplet Siamese Network as the few-shot learning model classified CT scan images into Normal, COVID-19, and Community-Acquired Pneumonia. The suggested model achieved an overall accuracy of 98.719%, a specificity of 99.36%, a sensitivity of 98.72%, and a ROC score of 99.9% with only 200 CT scans per category for training data.

Submitted 17 February, 2023; originally announced February 2023.

ACM Class: I.2.1; I.4.9

Journal ref: Informatics in Medicine Unlocked, 37, 2023, 101156

arXiv:2301.00152 [pdf, other]

doi 10.1145/3511095.3531268

Towards Proactively Forecasting Sentence-Specific Information Popularity within Online News Documents

Authors: Sayar Ghosh Roy, Anshul Padhi, Risubh Jain, Manish Gupta, Vasudeva Varma

Abstract: Multiple studies have focused on predicting the prospective popularity of an online document as a whole, without paying attention to the contributions of its individual parts. We introduce the task of proactively forecasting popularities of sentences within online news documents solely utilizing their natural language content. We model sentence-specific popularity forecasting as a sequence regression task. For training our models, we curate InfoPop, the first dataset containing popularity labels for over 1.7 million sentences from over 50,000 online news documents. To the best of our knowledge, this is the first dataset automatically created using streams of incoming search engine queries to generate sentence-level popularity annotations. We propose a novel transfer learning approach involving sentence salience prediction as an auxiliary task. Our proposed technique coupled with a BERT-based neural model exceeds nDCG values of 0.8 for proactive sentence-specific popularity forecasting. Notably, our study presents a non-trivial takeaway: though popularity and salience are different concepts, transfer learning from salience prediction enhances popularity forecasting. We release InfoPop and make our code publicly available: https://github.com/sayarghoshroy/InfoPopularity

Submitted 31 December, 2022; originally announced January 2023.

Comments: In 33rd ACM Conference on Hypertext and Social Media [HT '22] (Main Track), Link: https://dl.acm.org/doi/10.1145/3511095.3531268

Journal ref: In HT '22. Association for Computing Machinery, New York, NY, USA, 11-20 (2022)

arXiv:2205.10829 [pdf]

Ferrocene as an iconic redox marker: from solution chemistry to molecular electronic devices

Authors: Gargee Roy, Ritu Gupta, Satya Ranjan Sahoo, Sumit Saha, Deepak Asthana, Prakash Chandra Mondal

Abstract: Ferrocene, since its discovery in 1951, has been extensively exploited as a redox probe in a variety of processes ranging from solution chemistry, medicinal chemistry, supramolecular chemistry, and surface chemistry to solid-state molecular electronic and spintronic circuit elements to unravel electrochemical charge-transfer dynamics. Ferrocene represents an extremely chemically and thermally stable, and highly reproducible redox probe that undergoes reversible one-electron oxidation and reduction occurring at the interfaces of electrode/ferrocene solution in response to applied anodic and cathodic potentials, respectively. In the almost 70 years since its discovery, it has become one of the most widely studied model organometallic compounds, not only for probing electrochemical charge-transfer processes but also as a molecular building block for the synthesis of chiral organometallic catalysts, potential drug candidates, polymeric compounds, and electrochemical sensors, to name a few. Ferrocene and its derivatives have been a breakthrough in many aspects due to their versatile reactivity, fascinating chemical structures, unconventional metal-ligand coordination, and the magic number of electrons (18 e-). The present review discusses the recent progress made towards ferrocene-containing molecular systems exploited for redox reactions, surface attachment, spin-dependent electrochemical processes to probe spin polarization, photo-electrochemistry, and integration into prototype molecular electronic devices. Overall, the present review collects information on the recent advances made with ferrocene and its derivatives that have been utilized as iconic redox markers.

Submitted 22 May, 2022; originally announced May 2022.

arXiv:2205.09723 [pdf, other]

Robust and Efficient Medical Imaging with Self-Supervision

Authors: Shekoofeh Azizi, Laura Culp, Jan Freyberg, Basil Mustafa, Sebastien Baur, Simon Kornblith, Ting Chen, Patricia MacWilliams, S. Sara Mahdavi, Ellery Wulczyn, Boris Babenko, Megan Wilson, Aaron Loh, Po-Hsuan Cameron Chen, Yuan Liu, Pinal Bavishi, Scott Mayer McKinney, Jim Winkens, Abhijit Guha Roy, Zach Beaver, Fiona Ryan, Justin Krogue, Mozziyar Etemadi, Umesh Telang, Yun Liu, et al. (9 additional authors not shown)

Abstract: Recent progress in Medical Artificial Intelligence (AI) has delivered systems that can reach clinical expert level performance. However, such systems tend to demonstrate sub-optimal "out-of-distribution" performance when evaluated in clinical settings different from the training environment. A common mitigation strategy is to develop separate systems for each clinical setting using site-specific data [1]. However, this quickly becomes impractical as medical data is time-consuming to acquire and expensive to annotate [2]. Thus, the problem of "data-efficient generalization" presents an ongoing difficulty for Medical AI development. Although progress in representation learning shows promise, its benefits have not been rigorously studied, specifically for out-of-distribution settings. To meet these challenges, we present REMEDIS, a unified representation learning strategy to improve robustness and data-efficiency of medical imaging AI. REMEDIS uses a generic combination of large-scale supervised transfer learning with self-supervised learning and requires little task-specific customization. We study a diverse range of medical imaging tasks and simulate three realistic application scenarios using retrospective data. REMEDIS exhibits significantly improved in-distribution performance with up to 11.5% relative improvement in diagnostic accuracy over a strong supervised baseline. More importantly, our strategy leads to strong data-efficient generalization of medical imaging AI, matching strong supervised baselines using between 1% and 33% of retraining data across tasks. These results suggest that REMEDIS can significantly accelerate the life-cycle of medical imaging AI development thereby presenting an important step forward for medical imaging AI to deliver broad impact.

Submitted 3 July, 2022; v1 submitted 19 May, 2022; originally announced May 2022.

arXiv:2202.00882 [pdf, ps, other]

doi 10.1093/bioinformatics/btac636

MPVNN: Mutated Pathway Visible Neural Network Architecture for Interpretable Prediction of Cancer-specific Survival Risk

Authors: Gourab Ghosh Roy, Nicholas Geard, Karin Verspoor, Shan He

Abstract: Survival risk prediction using gene expression data is important in making treatment decisions in cancer. Standard neural network (NN) survival analysis models are black boxes with lack of interpretability. More interpretable visible neural network (VNN) architectures are designed using biological pathway knowledge. But they do not model how pathway structures can change for particular cancer types. We propose a novel Mutated Pathway VNN or MPVNN architecture, designed using prior signaling pathway knowledge and gene mutation data-based edge randomization simulating signal flow disruption. As a case study, we use the PI3K-Akt pathway and demonstrate overall improved cancer-specific survival risk prediction results of MPVNN over standard non-NN and other similar sized NN survival analysis methods. We show that trained MPVNN architecture interpretation, which points to smaller sets of genes connected by signal flow within the PI3K-Akt pathway that are important in risk prediction for particular cancer types, is reliable.

Submitted 2 February, 2022; originally announced February 2022.

Comments: 11 pages, 2 figures

Journal ref: Bioinformatics 38.22 (2022) 5026-5032

arXiv:2109.01878 [pdf, other]

Robust Mitosis Detection Using a Cascade Mask-RCNN Approach With Domain-Specific Residual Cycle-GAN Data Augmentation

Authors: Gauthier Roy, Jules Dedieu, Capucine Bertrand, Alireza Moshayedi, Ali Mammadov, Stéphanie Petit, Saima Ben Hadj, Rutger H. J. Fick

Abstract: For the MIDOG mitosis detection challenge, we created a cascade algorithm consisting of a Mask-RCNN detector, followed by a classification ensemble consisting of ResNet50 and DenseNet201 to refine detected mitotic candidates. The MIDOG training data consists of 200 frames originating from four scanners, three of which are annotated for mitotic instances with centroid annotations. Our main algorithmic choices are as follows: first, to enhance the generalizability of our detector and classification networks, we use a state-of-the-art residual Cycle-GAN to transform each scanner domain to every other scanner domain. During training, we then randomly load, for each image, one of the four domains. In this way, our networks can learn from the fourth non-annotated scanner domain even if we don't have annotations for it. Second, for training the detector network, rather than using centroid-based fixed-size bounding boxes, we create mitosis-specific bounding boxes. We do this by manually annotating a small selection of mitoses, training a Mask-RCNN on this small dataset, and applying it to the rest of the data to obtain full annotations. We trained the follow-up classification ensemble using only the challenge-provided positive and hard-negative examples. On the preliminary test set, the algorithm scores an F1 score of 0.7578, putting us as the second-place team on the leaderboard.

Submitted 28 September, 2021; v1 submitted 4 September, 2021; originally announced September 2021.

Comments: Gauthier Roy and Jules Dedieu contributed equally to this work

arXiv:2106.09022 [pdf, other]

A Simple Fix to Mahalanobis Distance for Improving Near-OOD Detection

Authors: Jie Ren, Stanislav Fort, Jeremiah Liu, Abhijit Guha Roy, Shreyas Padhy, Balaji Lakshminarayanan

Abstract: Mahalanobis distance (MD) is a simple and popular post-processing method for detecting out-of-distribution (OOD) inputs in neural networks. We analyze its failure modes for near-OOD detection and propose a simple fix called relative Mahalanobis distance (RMD) which improves performance and is more robust to hyperparameter choice. On a wide selection of challenging vision, language, and biology OOD benchmarks (CIFAR-100 vs CIFAR-10, CLINC OOD intent detection, Genomics OOD), we show that RMD meaningfully improves upon MD performance (by up to 15% AUROC on genomics OOD).

Submitted 16 June, 2021; originally announced June 2021.

arXiv:2104.03829 [pdf, other]

doi 10.1016/j.media.2021.102274

Does Your Dermatology Classifier Know What It Doesn't Know? Detecting the Long-Tail of Unseen Conditions

Authors: Abhijit Guha Roy, Jie Ren, Shekoofeh Azizi, Aaron Loh, Vivek Natarajan, Basil Mustafa, Nick Pawlowski, Jan Freyberg, Yuan Liu, Zach Beaver, Nam Vo, Peggy Bui, Samantha Winter, Patricia MacWilliams, Greg S. Corrado, Umesh Telang, Yun Liu, Taylan Cemgil, Alan Karthikesalingam, Balaji Lakshminarayanan, Jim Winkens

Abstract: We develop and rigorously evaluate a deep learning based system that can accurately classify skin conditions while detecting rare conditions for which there is not enough data available for training a confident classifier. We frame this task as an out-of-distribution (OOD) detection problem. Our novel approach, hierarchical outlier detection (HOD) assigns multiple abstention classes for each training outlier class and jointly performs a coarse classification of inliers vs. outliers, along with fine-grained classification of the individual classes. We demonstrate the effectiveness of the HOD loss in conjunction with modern representation learning approaches (BiT, SimCLR, MICLe) and explore different ensembling strategies for further improving the results. We perform an extensive subgroup analysis over conditions of varying risk levels and different skin types to investigate how the OOD detection performance changes over each subgroup and demonstrate the gains of our framework in comparison to baselines. Finally, we introduce a cost metric to approximate downstream clinical impact. We use this cost metric to compare the proposed method against a baseline system, thereby making a stronger case for the overall system effectiveness in a real-world deployment scenario.

Submitted 8 April, 2021; originally announced April 2021.

Comments: Under Review, 19 Pages

Journal ref: Medical Image Analysis (2022)

arXiv:2101.03553 [pdf, other]

doi 10.18653/v1/2020.sdp-1.39

Summaformers @ LaySumm 20, LongSumm 20

Authors: Sayar Ghosh Roy, Nikhil Pinnaparaju, Risubh Jain, Manish Gupta, Vasudeva Varma

Abstract: Automatic text summarization has been widely studied as an important task in natural language processing. Traditionally, various feature engineering and machine learning based systems have been proposed for extractive as well as abstractive text summarization. Recently, deep learning based, specifically Transformer-based systems have been immensely popular. Summarization is a cognitively challenging task - extracting summary worthy sentences is laborious, and expressing semantics in brief when doing abstractive summarization is complicated. In this paper, we specifically look at the problem of summarizing scientific research papers from multiple domains. We differentiate between two types of summaries, namely, (a) LaySumm: A very short summary that captures the essence of the research paper in layman terms restricting overtly specific technical jargon and (b) LongSumm: A much longer detailed summary aimed at providing specific insights into various ideas touched upon in the paper. While leveraging latest Transformer-based models, our systems are simple, intuitive and based on how specific paper sections contribute to human summaries of the two types described above. Evaluations against gold standard summaries using ROUGE metrics prove the effectiveness of our approach. On blind test corpora, our system ranks first and third for the LongSumm and LaySumm tasks respectively.

Submitted 10 January, 2021; originally announced January 2021.

Comments: Proceedings of the First Workshop on Scholarly Document Processing (SDP) at EMNLP 2020

Report number: IIIT/TR/2020/75

Journal ref: In Proceedings of the First Workshop on Scholarly Document Processing, pages 336 - 343, 2020, Online. Association for Computational Linguistics

arXiv:2101.03382 [pdf, other]

Task Adaptive Pretraining of Transformers for Hostility Detection

Authors: Tathagata Raha, Sayar Ghosh Roy, Ujwal Narayan, Zubair Abid, Vasudeva Varma

Abstract: Identifying adverse and hostile content on the web and more particularly, on social media, has become a problem of paramount interest in recent years. With its ever increasing popularity, fine-tuning of pretrained Transformer-based encoder models with a classifier head is gradually becoming the new baseline for natural language classification tasks. In our work, we explore the gains attributed to Task Adaptive Pretraining (TAPT) prior to fine-tuning of Transformer-based architectures. We specifically study two problems, namely, (a) Coarse binary classification of Hindi Tweets into Hostile or Not, and (b) Fine-grained multi-label classification of Tweets into four categories: hate, fake, offensive, and defamation. Building upon an architecture which takes emojis and segmented hashtags into consideration for classification, we are able to experimentally showcase the performance upgrades due to TAPT. Our system (with team name 'iREL IIIT') ranked first in the 'Hostile Post Detection in Hindi' shared task with an F1 score of 97.16% for coarse-grained detection and a weighted F1 score of 62.96% for fine-grained multi-label classification on the provided blind test corpora.

Submitted 9 January, 2021; originally announced January 2021.

Comments: To be published in: Proceedings of the First Workshop on Combating Online Hostile Posts in Regional Languages during Emergency Situation (CONSTRAINT) at AAAI 2021

arXiv:2101.03207 [pdf, other]

Leveraging Multilingual Transformers for Hate Speech Detection

Authors: Sayar Ghosh Roy, Ujwal Narayan, Tathagata Raha, Zubair Abid, Vasudeva Varma

Abstract: Detecting and classifying instances of hate in social media text has been a problem of interest in Natural Language Processing in recent years. Our work leverages state-of-the-art Transformer language models to identify hate speech in a multilingual setting. Capturing the intent of a post or a comment on social media involves careful evaluation of the language style, semantic content and additional pointers such as hashtags and emojis. In this paper, we look at the problem of identifying whether a Twitter post is hateful and offensive or not. We further discriminate the detected toxic content into one of the following three classes: (a) Hate Speech (HATE), (b) Offensive (OFFN) and (c) Profane (PRFN). With a pre-trained multilingual Transformer-based text encoder at the base, we are able to successfully identify and classify hate speech from multiple languages. On the provided testing corpora, we achieve Macro F1 scores of 90.29, 81.87 and 75.40 for English, German and Hindi respectively while performing hate speech detection and of 60.70, 53.28 and 49.74 during fine-grained classification. In our experiments, we show the efficacy of Perspective API features for hate speech classification and the effects of exploiting a multilingual training scheme. A feature selection study is provided to illustrate impacts of specific features upon the architecture's classification head.

Submitted 8 January, 2021; originally announced January 2021.

Comments: To be published in: FIRE (Working Notes) 2020, Hate Speech and Offensive Content Identification in Indo-European Languages, HASOC 2020

arXiv:2009.08127 [pdf]

Addressing Cognitive Biases in Augmented Business Decision Systems

Authors: Thomas Baudel, Manon Verbockhaven, Guillaume Roy, Victoire Cousergue, Rida Laarach

Abstract: How do algorithmic decision aids introduced in business decision processes affect task performance? In a first experiment, we study effective collaboration. Faced with a decision, subjects alone have a success rate of 72%; aided by a recommender that has a 75% success rate, their success rate reaches 76%. The human-system collaboration thus had a greater success rate than each taken alone. However, we noted a complacency/authority bias that degraded the quality of decisions by 5% when the recommender was wrong. This suggests that any lingering algorithmic bias may be amplified by decision aids. In a second experiment, we evaluated the effectiveness of 5 presentation variants in reducing complacency bias. We found that optional presentation increases subjects' resistance to wrong recommendations. We conclude by arguing that our metrics, in real usage scenarios, where decision aids are embedded as system-wide features in Business Process Management software, can lead to enhanced benefits.

Submitted 17 September, 2020; originally announced September 2020.

Comments: 22 pages, 8 figures, submitted to ACM CHI 2021 conference

ACM Class: H.4.1; H.5.1

arXiv:2008.12680 [pdf, other]

doi 10.1007/978-3-030-59861-7_28

Bayesian Neural Networks for Uncertainty Estimation of Imaging Biomarkers

Authors: J. Senapati, A. Guha Roy, S. Pölsterl, D. Gutmann, S. Gatidis, C. Schlett, A. Peters, F. Bamberg, C. Wachinger

Abstract: Image segmentation enables the extraction of quantitative measures from scans that can serve as imaging biomarkers for diseases. However, segmentation quality can vary substantially across scans, and therefore yield unfaithful estimates in the follow-up statistical analysis of biomarkers. The core problem is that segmentation and biomarker analysis are performed independently. We propose to propagate segmentation uncertainty to the statistical analysis to account for variations in segmentation confidence. To this end, we evaluate four Bayesian neural networks to sample from the posterior distribution and estimate the uncertainty. We then assign confidence measures to the biomarker and propose statistical models for its integration in group analysis and disease classification. Our results for segmenting the liver in patients with diabetes mellitus clearly demonstrate the improvement of integrating biomarker uncertainty in the statistical inference.

Submitted 1 September, 2020; v1 submitted 28 August, 2020; originally announced August 2020.

Comments: MICCAI-MLMI 2020 Workshop Paper (Accepted)

arXiv:2007.05566 [pdf, other]

Contrastive Training for Improved Out-of-Distribution Detection

Authors: Jim Winkens, Rudy Bunel, Abhijit Guha Roy, Robert Stanforth, Vivek Natarajan, Joseph R. Ledsam, Patricia MacWilliams, Pushmeet Kohli, Alan Karthikesalingam, Simon Kohl, Taylan Cemgil, S. M. Ali Eslami, Olaf Ronneberger

Abstract: Reliable detection of out-of-distribution (OOD) inputs is increasingly understood to be a precondition for deployment of machine learning systems. This paper proposes and investigates the use of contrastive training to boost OOD detection performance. Unlike leading methods for OOD detection, our approach does not require access to examples labeled explicitly as OOD, which can be difficult to collect in practice. We show in extensive experiments that contrastive training significantly helps OOD detection performance on a number of common benchmarks. By introducing and employing the Confusion Log Probability (CLP) score, which quantifies the difficulty of the OOD detection task by capturing the similarity of inlier and outlier datasets, we show that our method especially improves performance in the 'near OOD' classes, a particularly challenging setting for previous methods.

Submitted 10 July, 2020; originally announced July 2020.

arXiv:2005.00079 [pdf, other]

Importance Driven Continual Learning for Segmentation Across Domains

Authors: Sinan Özgür Özgün, Anne-Marie Rickmann, Abhijit Guha Roy, Christian Wachinger

Abstract: The ability of neural networks to continuously learn and adapt to new tasks while retaining prior knowledge is crucial for many applications. However, current neural networks tend to forget previously learned tasks when trained on new ones, i.e., they suffer from Catastrophic Forgetting (CF). The objective of Continual Learning (CL) is to alleviate this problem, which is particularly relevant for medical applications, where it may not be feasible to store and access previously used sensitive patient data. In this work, we propose a Continual Learning approach for brain segmentation, where a single network is consecutively trained on samples from different domains. We build upon an importance driven approach and adapt it for medical image segmentation. Particularly, we introduce learning rate regularization to prevent the loss of the network's knowledge. Our results demonstrate that directly restricting the adaptation of important network parameters clearly reduces Catastrophic Forgetting for segmentation across domains.

Submitted 30 April, 2020; originally announced May 2020.

arXiv:2002.10994 [pdf, other]

doi 10.1109/TMI.2020.2972059

Recalibrating 3D ConvNets with Project & Excite

Authors: Anne-Marie Rickmann, Abhijit Guha Roy, Ignacio Sarasua, Christian Wachinger

Abstract: Fully Convolutional Neural Networks (F-CNNs) achieve state-of-the-art performance for segmentation tasks in computer vision and medical imaging. Recently, computational blocks termed squeeze and excitation (SE) have been introduced to recalibrate F-CNN feature maps both channel- and spatial-wise, boosting segmentation performance while only minimally increasing the model complexity. So far, the development of SE blocks has focused on 2D architectures. For volumetric medical images, however, 3D F-CNNs are a natural choice. In this article, we extend existing 2D recalibration methods to 3D and propose a generic compress-process-recalibrate pipeline for easy comparison of such blocks. We further introduce Project & Excite (PE) modules, customized for 3D networks. In contrast to existing modules, Project & Excite does not perform global average pooling but compresses feature maps along different spatial dimensions of the tensor separately to retain more spatial information that is subsequently used in the excitation step. We evaluate the modules on two challenging tasks, whole-brain segmentation of MRI scans and whole-body segmentation of CT scans. We demonstrate that PE modules can be easily integrated into 3D F-CNNs, boosting performance up to 0.3 in Dice Score and outperforming 3D extensions of other recalibration blocks, while only marginally increasing the model complexity. Our code is publicly available on https://github.com/ai-med/squeeze_and_excitation.

Submitted 25 February, 2020; originally announced February 2020.

Comments: Accepted for publication at IEEE Transactions on Medical Imaging

arXiv:1911.01562 [pdf, other]

DeepRacer: Educational Autonomous Racing Platform for Experimentation with Sim2Real Reinforcement Learning

Authors: Bharathan Balaji, Sunil Mallya, Sahika Genc, Saurabh Gupta, Leo Dirac, Vineet Khare, Gourav Roy, Tao Sun, Yunzhe Tao, Brian Townsend, Eddie Calleja, Sunil Muralidhara, Dhanasekar Karuppasamy

Abstract: DeepRacer is a platform for end-to-end experimentation with RL and can be used to systematically investigate the key challenges in developing intelligent control systems. Using the platform, we demonstrate how a 1/18th scale car can learn to drive autonomously using RL with a monocular camera. It is trained in simulation with no additional tuning in the physical world and demonstrates: 1) formulation and solution of a robust reinforcement learning algorithm, 2) narrowing the reality gap through joint perception and dynamics, 3) distributed on-demand compute architecture for training optimal policies, and 4) a robust evaluation method to identify when to stop training. It is the first successful large-scale deployment of deep reinforcement learning on a robotic control agent that uses only raw camera images as observations and a model-free learning method to perform robust path planning. We open source our code and video demo on GitHub: https://git.io/fjxoJ.

Submitted 4 November, 2019; originally announced November 2019.

arXiv:1906.04649 [pdf, other]

`Project & Excite' Modules for Segmentation of Volumetric Medical Scans

Authors: Anne-Marie Rickmann, Abhijit Guha Roy, Ignacio Sarasua, Nassir Navab, Christian Wachinger

Abstract: Fully Convolutional Neural Networks (F-CNNs) achieve state-of-the-art performance for image segmentation in medical imaging. Recently, squeeze and excitation (SE) modules and variations thereof have been introduced to recalibrate feature maps channel- and spatial-wise, which can boost performance while only minimally increasing model complexity. So far, the development of SE has focused on 2D images. In this paper, we propose `Project & Excite' (PE) modules that build upon the ideas of SE and extend them to operate on 3D volumetric images. `Project & Excite' does not perform global average pooling, but squeezes feature maps along different slices of a tensor separately to retain more spatial information that is subsequently used in the excitation step. We demonstrate that PE modules can be easily integrated in 3D U-Net, boosting performance by 5% Dice points, while only increasing the model complexity by 2%. We evaluate the PE module on two challenging tasks, whole-brain segmentation of MRI scans and whole-body segmentation of CT scans. Code: https://github.com/ai-med/squeeze_and_excitation

Submitted 12 June, 2019; v1 submitted 11 June, 2019; originally announced June 2019.

Comments: Accepted for International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) 2019

arXiv:1905.06731 [pdf, other]

BrainTorrent: A Peer-to-Peer Environment for Decentralized Federated Learning

Authors: Abhijit Guha Roy, Shayan Siddiqui, Sebastian Pölsterl, Nassir Navab, Christian Wachinger

Abstract: Access to sufficient annotated data is a common challenge in training deep neural networks on medical images. As annotating data is expensive and time-consuming, it is difficult for an individual medical center to reach large enough sample sizes to build their own, personalized models. As an alternative, data from all centers could be pooled to train a centralized model that everyone can use. However, such a strategy is often infeasible due to the privacy-sensitive nature of medical data. Recently, federated learning (FL) has been introduced to collaboratively learn a shared prediction model across centers without the need for sharing data. In FL, clients are locally training models on site-specific datasets for a few epochs and then sharing their model weights with a central server, which orchestrates the overall training process. Importantly, the sharing of models does not compromise patient privacy. A disadvantage of FL is the dependence on a central server, which requires all clients to agree on one trusted central body, and whose failure would disrupt the training process of all clients. In this paper, we introduce BrainTorrent, a new FL framework without a central server, particularly targeted towards medical applications. BrainTorrent presents a highly dynamic peer-to-peer environment, where all centers directly interact with each other without depending on a central body.
We demonstrate the overall effectiveness of FL for the challenging task of whole brain segmentation and observe that the proposed server-less BrainTorrent approach does not only outperform the traditional server-based one but reaches a similar performance to a model trained on pooled data.

Submitted 16 May, 2019; originally announced May 2019.

arXiv:1904.06406 [pdf, ps, other]

doi 10.1016/j.nima.2019.162411

ChAKRA : The high resolution charged particle detector array at VECC

Authors: Samir Kundu, T. K. Rana, C. Bhattacharya, K. Banerjee, R. Pandey, Santu Manna, J. K. Meena, A. K. Saha, J. K. Sahoo, P. Dhara, A. Dey, D. Gupta, T. K. Ghosh, Pratap Roy, G. Mukherjee, R Mandal Saha, S. Roy, S. R. Bajirao, A. Sen, S. Bhattacharya

Abstract: A large 4$π$ array of charged particle detectors has been developed at Variable Energy Cyclotron Centre to facilitate high resolution charged particle reaction and spectroscopy studies by detecting event-by-event the charged reaction products emitted in heavy ion reactions at energy $\sim$ 10-60 MeV/A. The forward part ($θ \sim \pm 7^{\circ}$ - $\pm 45^{\circ}$) of the array consists of 24 highly granular, high resolution charged particle telescopes, each of which is made of three layers of detectors [single sided silicon strip ($Δ$E) + double sided silicon strip (E/$Δ$E) + CsI(Tl) (E)]. The backward part of the array consists of 112 CsI(Tl) detectors which are capable of detecting primarily the light charged particles (Z $\le$ 2) emitted in the angular range of $θ \sim \pm 45^{\circ}$ - $\pm 175^{\circ}$. The extreme forward part of the array ($θ \sim \pm 3^{\circ}$ - $\pm 7^{\circ}$) is made up of 32 slow-fast plastic phoswich detectors that are capable of detecting light (Z $\le$ 2) and heavy charged particles (3 $\le$ Z $\lesssim$ 20) as well as handling high count rates. The design, construction and characterization of the array are described.

Submitted 11 April, 2019; originally announced April 2019.

Comments: Paper has been submitted in NIMA

arXiv:1904.03110 [pdf, other]

doi 10.1007/978-3-030-32248-9_49

3DQ: Compact Quantized Neural Networks for Volumetric Whole Brain Segmentation

Authors: Magdalini Paschali, Stefano Gasperini, Abhijit Guha Roy, Michael Y. -S. Fang, Nassir Navab

Abstract: Model architectures have been dramatically increasing in size, improving performance at the cost of resource requirements. In this paper we propose 3DQ, a ternary quantization method, applied for the first time to 3D Fully Convolutional Neural Networks (F-CNNs), enabling 16x model compression while maintaining performance on par with full precision models. We extensively evaluate 3DQ on two datasets for the challenging task of whole brain segmentation. Additionally, we showcase our method's ability to generalize on two common 3D architectures, namely 3D U-Net and V-Net. Outperforming a variety of baselines, the proposed method is capable of compressing large 3D models to a few MBytes, alleviating the storage needs in space critical applications.

Submitted 1 July, 2019; v1 submitted 5 April, 2019; originally announced April 2019.

Comments: Accepted to MICCAI 2019

arXiv:1902.01314 [pdf, other]

'Squeeze & Excite' Guided Few-Shot Segmentation of Volumetric Images

Authors: Abhijit Guha Roy, Shayan Siddiqui, Sebastian Pölsterl, Nassir Navab, Christian Wachinger

Abstract: Deep neural networks enable highly accurate image segmentation, but require large amounts of manually annotated data for supervised training. Few-shot learning aims to address this shortcoming by learning a new class from a few annotated support examples. We introduce a novel few-shot framework for the segmentation of volumetric medical images with only a few annotated slices. Compared to other related works in computer vision, the major challenges are the absence of pre-trained networks and the volumetric nature of medical scans. We address these challenges by proposing a new architecture for few-shot segmentation that incorporates 'squeeze & excite' blocks. Our two-armed architecture consists of a conditioner arm, which processes the annotated support input and generates a task-specific representation. This representation is passed on to the segmenter arm that uses this information to segment the new query image. To facilitate efficient interaction between the conditioner and the segmenter arm, we propose to use 'channel squeeze & spatial excitation' blocks - a light-weight computational module - that enables heavy interaction between both the arms with negligible increase in model complexity. This contribution allows us to perform image segmentation without relying on a pre-trained model, which generally is unavailable for medical scans. Furthermore, we propose an efficient strategy for volumetric segmentation by optimally pairing a few slices of the support volume to all the slices of the query volume.
We perform experiments for organ segmentation on whole-body contrast-enhanced CT scans from the Visceral Dataset. Our proposed model outperforms multiple baselines and existing approaches with respect to the segmentation accuracy by a significant margin. The source code is available at https://github.com/abhi4ssj/few-shot-segmentation.

Submitted 11 October, 2019; v1 submitted 4 February, 2019; originally announced February 2019.

Comments: Accepted for publication at Medical Image Analysis

arXiv:1901.04420 [pdf, other]

Data Augmentation with Manifold Exploring Geometric Transformations for Increased Performance and Robustness

Authors: Magdalini Paschali, Walter Simson, Abhijit Guha Roy, Muhammad Ferjad Naeem, Rüdiger Göbl, Christian Wachinger, Nassir Navab

Abstract: In this paper we propose a novel augmentation technique that improves not only the performance of deep neural networks on clean test data, but also significantly increases their robustness to random transformations, both affine and projective. Inspired by ManiFool, the augmentation is performed by a line-search manifold-exploration method that learns affine geometric transformations that lead to the misclassification of an image, while ensuring that it remains on the same manifold as the training data. This augmentation method populates any training dataset with images that lie on the border of the manifolds between two classes and maximizes the variance the network is exposed to during training. Our method was thoroughly evaluated on the challenging tasks of fine-grained skin lesion classification from limited data, and breast tumor classification of mammograms. Compared with traditional augmentation methods, and with images synthesized by Generative Adversarial Networks, our method not only achieves state-of-the-art performance but also significantly improves the network's robustness.

Submitted 14 January, 2019; originally announced January 2019.

Comments: Under Review for the 26th International Conference on Information Processing in Medical Imaging (IPMI) 2019

arXiv:1811.09800 [pdf, other]

Bayesian QuickNAT: Model Uncertainty in Deep Whole-Brain Segmentation for Structure-wise Quality Control

Authors: Abhijit Guha Roy, Sailesh Conjeti, Nassir Navab, Christian Wachinger

Abstract: We introduce Bayesian QuickNAT for the automated quality control of whole-brain segmentation on MRI T1 scans. Next to the Bayesian fully convolutional neural network, we also present inherent measures of segmentation uncertainty that allow for quality control per brain structure. For estimating model uncertainty, we follow a Bayesian approach, wherein Monte Carlo (MC) samples from the posterior distribution are generated by keeping the dropout layers active at test time. Entropy over the MC samples provides a voxel-wise model uncertainty map, whereas expectation over the MC predictions provides the final segmentation. Next to voxel-wise uncertainty, we introduce four metrics to quantify structure-wise uncertainty in segmentation for quality control. We report experiments on four out-of-sample datasets comprising a diverse age range, pathology and imaging artifacts. The proposed structure-wise uncertainty metrics are highly correlated with the Dice score estimated with manual annotation and therefore present an inherent measure of segmentation quality. In particular, the intersection over union over all the MC samples is a suitable proxy for the Dice score. In addition to quality control at scan-level, we propose to incorporate the structure-wise uncertainty as a measure of confidence to do reliable group analysis on large data repositories.
We envisage that the introduced uncertainty metrics would help assess the fidelity of automated deep learning based segmentation methods for large-scale population studies, as they enable automated quality control and group analyses in processing large data repositories.

Submitted 24 November, 2018; originally announced November 2018.

Comments: Under Review in NeuroImage

arXiv:1810.05735 [pdf, other]

doi 10.1109/ISBI.2018.8363542

InfiNet: Fully Convolutional Networks for Infant Brain MRI Segmentation

Authors: Shubham Kumar, Sailesh Conjeti, Abhijit Guha Roy, Christian Wachinger, Nassir Navab

Abstract: We present a novel, parameter-efficient and practical fully convolutional neural network architecture, termed InfiNet, aimed at voxel-wise semantic segmentation of infant brain MRI images at iso-intense stage, which can be easily extended for other segmentation tasks involving multi-modalities. InfiNet consists of double encoder arms for T1 and T2 input scans that feed into a joint-decoder arm that terminates in the classification layer. The novelty of InfiNet lies in the manner in which the decoder upsamples lower resolution input feature map(s) from multiple encoder arms. Specifically, the pooled indices computed in the max-pooling layers of each of the encoder blocks are related to the corresponding decoder block to perform non-linear learning-free upsampling. The sparse maps are concatenated with intermediate encoder representations (skip connections) and convolved with trainable filters to produce dense feature maps. InfiNet is trained end-to-end to optimize for the Generalized Dice Loss, which is well-suited for high class imbalance. InfiNet achieves the whole-volume segmentation in under 50 seconds and we demonstrate competitive performance against multiple state-of-the-art deep architectures and their multi-modal variants.

Submitted 11 October, 2018; originally announced October 2018.

Comments: 4 pages, 3 figures, conference, IEEE ISBI, 2018

ACM Class: I.2.10; I.2.4; I.4.10; I.2.1; I.4.6

Journal ref: Kumar, Shubham, et al. ISBI, IEEE (2018)(pp. 145-148)

arXiv:1810.05733 [pdf, other]

doi 10.1007/978-3-030-00889-5_26

Learning Optimal Deep Projection of $^{18}$F-FDG PET Imaging for Early Differential Diagnosis of Parkinsonian Syndromes

Authors: Shubham Kumar, Abhijit Guha Roy, Ping Wu, Sailesh Conjeti, R. S. Anand, Jian Wang, Igor Yakushev, Stefan Förster, Markus Schwaiger, Sung-Cheng Huang, Axel Rominger, Chuantao Zuo, Kuangyu Shi

Abstract: Several diseases of parkinsonian syndromes present similar symptoms at an early stage and no objective widely used diagnostic methods have been approved until now. Positron emission tomography (PET) with $^{18}$F-FDG was shown to be able to assess early neuronal dysfunction of synucleinopathies and tauopathies. Tensor factorization (TF) based approaches have been applied to identify characteristic metabolic patterns for differential diagnosis. However, these conventional dimension-reduction strategies assume linear or multi-linear relationships inside data, and are therefore insufficient to distinguish nonlinear metabolic differences between various parkinsonian syndromes. In this paper, we propose a Deep Projection Neural Network (DPNN) to identify characteristic metabolic patterns for early differential diagnosis of parkinsonian syndromes. We draw our inspiration from the existing TF methods. The network consists of (i) a compression part, which uses a deep network to learn optimal 2D projections of 3D scans, and (ii) a classification part, which maps the 2D projections to labels. The compression part can be pre-trained using surplus unlabelled datasets. Also, as the classification part operates on these 2D projections, it can be trained end-to-end effectively with limited labelled data, in contrast to 3D approaches. We show that DPNN is more effective in comparison to existing state-of-the-art and plausible baselines.

Submitted 11 October, 2018; originally announced October 2018.

Comments: 8 pages, 3 figures, conference, MICCAI DLMIA, 2018

ACM Class: I.2.10; I.2.4; I.4.10; I.2.1

Journal ref: Kumar, Shubham, et al. DLMIA, Springer, Cham, 2018. 227-235

arXiv:1808.08127 [pdf, other]

Recalibrating Fully Convolutional Networks with Spatial and Channel 'Squeeze & Excitation' Blocks

Authors: Abhijit Guha Roy, Nassir Navab, Christian Wachinger

Abstract: In a wide range of semantic segmentation tasks, fully convolutional neural networks (F-CNNs) have been successfully leveraged to achieve state-of-the-art performance. Architectural innovations of F-CNNs have mainly been on improving spatial encoding or network connectivity to aid gradient flow. In this article, we aim towards an alternate direction of recalibrating the learned feature maps adaptively; boosting meaningful features while suppressing weak ones. The recalibration is achieved by simple computational blocks that can be easily integrated in F-CNN architectures. We draw our inspiration from the recently proposed 'squeeze & excitation' (SE) modules for channel recalibration for image classification. Towards this end, we introduce three variants of SE modules for segmentation, (i) squeezing spatially and exciting channel-wise, (ii) squeezing channel-wise and exciting spatially and (iii) joint spatial and channel 'squeeze & excitation'. We effectively incorporate the proposed SE blocks in three state-of-the-art F-CNNs and demonstrate a consistent improvement of segmentation accuracy on three challenging benchmark datasets. Importantly, SE blocks only lead to a minimal increase in model complexity of about 1.5%, while the Dice score increases by 4-9% in the case of U-Net. Hence, we believe that SE blocks can be an integral part of future F-CNN architectures.

Submitted 23 August, 2018; originally announced August 2018.

Comments: Accepted for publication in IEEE Transactions on Medical Imaging. arXiv admin note: text overlap with arXiv:1803.02579

arXiv:1806.11475 [pdf, other]

SynNet: Structure-Preserving Fully Convolutional Networks for Medical Image Synthesis

Authors: Deepa Gunashekar, Sailesh Conjeti, Abhijit Guha Roy, Nassir Navab, Kuangyu Shi

Abstract: Cross-modal image synthesis is gaining significant interest for its ability to estimate target images of a different modality from a given set of source images, like estimating MR to MR, MR to CT, CT to PET etc., without the need for an actual acquisition. Though it shows potential for applications in radiation therapy planning, image super resolution, atlas construction, image segmentation etc., the synthesis results are not as accurate as the actual acquisition. In this paper, we address the problem of multi-modal image synthesis by proposing a fully convolutional deep learning architecture called the SynNet. We extend the proposed architecture for various input output configurations. Finally, we propose a structure preserving custom loss function for cross-modal image synthesis. We validate the proposed SynNet and its extended framework on the BRATS dataset with comparisons against three state-of-the-art methods. The results of the proposed custom loss function are validated against the traditional loss function used by the state-of-the-art methods for cross-modal image synthesis.

Submitted 29 June, 2018; originally announced June 2018.

Comments: 13 pages, 5 figures

arXiv:1804.07046 [pdf, other]

Inherent Brain Segmentation Quality Control from Fully ConvNet Monte Carlo Sampling

Authors: Abhijit Guha Roy, Sailesh Conjeti, Nassir Navab, Christian Wachinger

Abstract: We introduce inherent measures for effective quality control of brain segmentation based on a Bayesian fully convolutional neural network, using model uncertainty. Monte Carlo samples from the posterior distribution are efficiently generated using dropout at test time. Based on these samples, we introduce next to a voxel-wise uncertainty map also three metrics for structure-wise uncertainty. We then incorporate these structure-wise uncertainties in group analyses as a measure of confidence in the observation. Our results show that the metrics are highly correlated to segmentation accuracy and therefore present an inherent measure of segmentation quality. Furthermore, group analysis with uncertainty results in effect sizes closer to those of manual annotations. The introduced uncertainty metrics can not only be very useful in translation to clinical practice but also provide automated quality control and group analyses in processing large data repositories.

Submitted 8 June, 2018; v1 submitted 19 April, 2018; originally announced April 2018.

Comments: Accepted at MICCAI 2018

arXiv:1803.02579 [pdf, other]

Concurrent Spatial and Channel Squeeze & Excitation in Fully Convolutional Networks

Authors: Abhijit Guha Roy, Nassir Navab, Christian Wachinger

Abstract: Fully convolutional neural networks (F-CNNs) have set the state-of-the-art in image segmentation for a plethora of applications. Architectural innovations within F-CNNs have mainly focused on improving spatial encoding or network connectivity to aid gradient flow. In this paper, we explore an alternate direction of recalibrating the feature maps adaptively, to boost meaningful features, while suppressing weak ones. We draw inspiration from the recently proposed squeeze & excitation (SE) module for channel recalibration of feature maps for image classification. Towards this end, we introduce three variants of SE modules for image segmentation, (i) squeezing spatially and exciting channel-wise (cSE), (ii) squeezing channel-wise and exciting spatially (sSE) and (iii) concurrent spatial and channel squeeze & excitation (scSE). We effectively incorporate these SE modules within three different state-of-the-art F-CNNs (DenseNet, SD-Net, U-Net) and observe consistent improvement of performance across all architectures, while minimally affecting model complexity. Evaluations are performed on two challenging applications: whole brain segmentation on MRI scans (Multi-Atlas Labelling Challenge Dataset) and organ segmentation on whole body contrast enhanced CT scans (Visceral Dataset).

Submitted 8 June, 2018; v1 submitted 7 March, 2018; originally announced March 2018.

Comments: Accepted at MICCAI 2018

arXiv:1802.07724 [pdf, other]

doi 10.23731/CYRM-2017-001

Feasibility Study for BioLEIR

Authors: S. Ghithan, G. Roy, S. Schuh

Abstract: The biomedical community asked CERN to investigate the possibility to transform the Low Energy Ion Ring (LEIR) accelerator into a multidisciplinary, biomedical research facility (BioLEIR) that could provide ample, high-quality beams of a range of light ions suitable for clinically oriented fundamental research on cell cultures and for radiation instrumentation development. BioLEIR would be operated when LEIR is not providing heavy ions for the CERN physics programme. The study group was mandated to write a Feasibility Study Report, using high-level engineering estimates based on previous experience, with the aim to: - collect the requirements for such a facility from the biomedical community in close collaboration with the International Strategy Committee for CERN Medical Applications; - determine a coherent set of beam parameters, based on the requirements; - explore whether the beam requirements can be met throughout the facility, from the source to the biomedical end-stations; - perform a feasibility study of the facility, taking into consideration the overall CERN schedules and programmes; - favour simplicity and robustness of the facility design, while minimizing the cost of maintenance and operation; - establish a high-level costing of material and personnel needed for project implementation; - describe the preferred installation scenario; - perform a high-level risk analysis for the project; - identify the areas of potential difficulty, and the required R&D should the study go ahead and become a project.

Submitted 21 February, 2018; originally announced February 2018.

Comments: 183 pages

Report number: CERN-2017-001-M

arXiv:1801.04161 [pdf, other]

QuickNAT: A Fully Convolutional Network for Quick and Accurate Segmentation of Neuroanatomy

Authors: Abhijit Guha Roy, Sailesh Conjeti, Nassir Navab, Christian Wachinger

Abstract: Whole brain segmentation from structural magnetic resonance imaging (MRI) is a prerequisite for most morphological analyses, but is computationally intense and can therefore delay the availability of image markers after scan acquisition. We introduce QuickNAT, a fully convolutional, densely connected neural network that segments an MRI brain scan in 20 seconds. To enable training of the complex network with millions of learnable parameters using limited annotated data, we propose to first pre-train on auxiliary labels created from existing segmentation software. Subsequently, the pre-trained model is fine-tuned on manual labels to rectify errors in auxiliary labels. With this learning strategy, we are able to use large neuroimaging repositories without manual annotations for training. In an extensive set of evaluations on eight datasets that cover a wide age range, pathology, and different scanners, we demonstrate that QuickNAT achieves superior segmentation accuracy and reliability in comparison to state-of-the-art methods, while being orders of magnitude faster. The speed-up facilitates processing of large data repositories and supports translation of imaging biomarkers by making them available within seconds for fast clinical decision making.

Submitted 24 November, 2018; v1 submitted 12 January, 2018; originally announced January 2018.

Comments: Accepted for Publication at NeuroImage

arXiv:1712.06982 [pdf, other]

doi 10.1007/s41781-018-0018-8

A Roadmap for HEP Software and Computing R&D for the 2020s

Authors: Johannes Albrecht, Antonio Augusto Alves Jr, Guilherme Amadio, Giuseppe Andronico, Nguyen Anh-Ky, Laurent Aphecetche, John Apostolakis, Makoto Asai, Luca Atzori, Marian Babik, Giuseppe Bagliesi, Marilena Bandieramonte, Sunanda Banerjee, Martin Barisits, Lothar A. T. Bauerdick, Stefano Belforte, Douglas Benjamin, Catrin Bernius, Wahid Bhimji, Riccardo Maria Bianchi, Ian Bird, Catherine Biscarat, Jakob Blomer, Kenneth Bloom, Tommaso Boccali, et al. (285 additional authors not shown)

Abstract: Particle physics has an ambitious and broad experimental programme for the coming decades. This programme requires large investments in detector hardware, either to build new facilities and experiments, or to upgrade existing ones. Similarly, it requires commensurate investment in the R&D of software to acquire, manage, process, and analyse the sheer amounts of data to be recorded. In planning for the HL-LHC in particular, it is critical that all of the collaborating stakeholders agree on the software goals and priorities, and that the efforts complement each other. In this spirit, this white paper describes the R&D activities required to prepare for this software upgrade.

Submitted 19 December, 2018; v1 submitted 18 December, 2017; originally announced December 2017.

Report number: HSF-CWP-2017-01

Journal ref: Comput Softw Big Sci (2019) 3, 7

arXiv:1705.00938 [pdf, other]

Error Corrective Boosting for Learning Fully Convolutional Networks with Limited Data

Authors: Abhijit Guha Roy, Sailesh Conjeti, Debdoot Sheet, Amin Katouzian, Nassir Navab, Christian Wachinger

Abstract: Training deep fully convolutional neural networks (F-CNNs) for semantic image segmentation requires access to abundant labeled data. While large datasets of unlabeled image data are available in medical applications, access to manually labeled data is very limited. We propose to automatically create auxiliary labels on initially unlabeled data with existing tools and to use them for pre-training. For the subsequent fine-tuning of the network with manually labeled data, we introduce error corrective boosting (ECB), which emphasizes parameter updates on classes with lower accuracy. Furthermore, we introduce SkipDeconv-Net (SD-Net), a new F-CNN architecture for brain segmentation that combines skip connections with the unpooling strategy for upsampling. The SD-Net addresses challenges of severe class imbalance and errors along boundaries. With application to whole-brain MRI T1 scan segmentation, we generate auxiliary labels on a large dataset with FreeSurfer and fine-tune on two datasets with manual annotations. Our results show that the inclusion of auxiliary labels and ECB yields significant improvements. SD-Net segments a 3D scan in 7 secs in comparison to 30 hours for the closest multi-atlas segmentation method, while reaching similar performance. It also outperforms the latest state-of-the-art F-CNN models.

Submitted 2 July, 2017; v1 submitted 2 May, 2017; originally announced May 2017.

Comments: Accepted at MICCAI 2017

arXiv:1704.02161 [pdf, other]

ReLayNet: Retinal Layer and Fluid Segmentation of Macular Optical Coherence Tomography using Fully Convolutional Network

Authors: Abhijit Guha Roy, Sailesh Conjeti, Sri Phani Krishna Karri, Debdoot Sheet, Amin Katouzian, Christian Wachinger, Nassir Navab

Abstract: Optical coherence tomography (OCT) is used for non-invasive diagnosis of diabetic macular edema by assessing the retinal layers. In this paper, we propose a new fully convolutional deep architecture, termed ReLayNet, for end-to-end segmentation of retinal layers and fluid masses in eye OCT scans. ReLayNet uses a contracting path of convolutional blocks (encoders) to learn a hierarchy of contextual features, followed by an expansive path of convolutional blocks (decoders) for semantic segmentation. ReLayNet is trained to optimize a joint loss function comprising weighted logistic regression and Dice overlap loss. The framework is validated on a publicly available benchmark dataset with comparisons against five state-of-the-art segmentation methods, including two deep-learning-based approaches, to substantiate its effectiveness.

Submitted 7 July, 2017; v1 submitted 7 April, 2017; originally announced April 2017.

Comments: Accepted for Publication at Biomedical Optics Express

Showing 1–50 of 57 results for author: Roy, G