-
Scaling the Vocabulary of Non-autoregressive Models for Efficient Generative Retrieval
Authors:
Ravisri Valluri,
Akash Kumar Mohankumar,
Kushal Dave,
Amit Singh,
Jian Jiao,
Manik Varma,
Gaurav Sinha
Abstract:
Generative Retrieval introduces a new approach to Information Retrieval by reframing it as a constrained generation task, leveraging recent advancements in Autoregressive (AR) language models. However, AR-based Generative Retrieval methods suffer from high inference latency and cost compared to traditional dense retrieval techniques, limiting their practical applicability. This paper investigates…
▽ More
Generative Retrieval introduces a new approach to Information Retrieval by reframing it as a constrained generation task, leveraging recent advancements in Autoregressive (AR) language models. However, AR-based Generative Retrieval methods suffer from high inference latency and cost compared to traditional dense retrieval techniques, limiting their practical applicability. This paper investigates fully Non-autoregressive (NAR) language models as a more efficient alternative for generative retrieval. While standard NAR models alleviate latency and cost concerns, they exhibit a significant drop in retrieval performance (compared to AR models) due to their inability to capture dependencies between target tokens. To address this, we question the conventional choice of limiting the target token space to solely words or sub-words. We propose PIXAR, a novel approach that expands the target vocabulary of NAR models to include multi-word entities and common phrases (up to 5 million tokens), thereby reducing token dependencies. PIXAR employs inference optimization strategies to maintain low inference latency despite the significantly larger vocabulary. Our results demonstrate that PIXAR achieves a relative improvement of 31.0% in MRR@10 on MS MARCO and 23.2% in Hits@5 on Natural Questions compared to standard NAR models with similar latency and cost. Furthermore, online A/B experiments on a large commercial search engine show that PIXAR increases ad clicks by 5.08% and revenue by 4.02%.
△ Less
Submitted 10 June, 2024;
originally announced June 2024.
-
Merlin: A Vision Language Foundation Model for 3D Computed Tomography
Authors:
Louis Blankemeier,
Joseph Paul Cohen,
Ashwin Kumar,
Dave Van Veen,
Syed Jamal Safdar Gardezi,
Magdalini Paschali,
Zhihong Chen,
Jean-Benoit Delbrouck,
Eduardo Reis,
Cesar Truyts,
Christian Bluethgen,
Malte Engmann Kjeldskov Jensen,
Sophie Ostmeier,
Maya Varma,
Jeya Maria Jose Valanarasu,
Zhongnan Fang,
Zepeng Huo,
Zaid Nabulsi,
Diego Ardila,
Wei-Hung Weng,
Edson Amaro Junior,
Neera Ahuja,
Jason Fries,
Nigam H. Shah,
Andrew Johnston
, et al. (6 additional authors not shown)
Abstract:
Over 85 million computed tomography (CT) scans are performed annually in the US, of which approximately one quarter focus on the abdomen. Given the current radiologist shortage, there is a large impetus to use artificial intelligence to alleviate the burden of interpreting these complex imaging studies. Prior state-of-the-art approaches for automated medical image interpretation leverage vision la…
▽ More
Over 85 million computed tomography (CT) scans are performed annually in the US, of which approximately one quarter focus on the abdomen. Given the current radiologist shortage, there is a large impetus to use artificial intelligence to alleviate the burden of interpreting these complex imaging studies. Prior state-of-the-art approaches for automated medical image interpretation leverage vision language models (VLMs). However, current medical VLMs are generally limited to 2D images and short reports, and do not leverage electronic health record (EHR) data for supervision. We introduce Merlin - a 3D VLM that we train using paired CT scans (6+ million images from 15,331 CTs), EHR diagnosis codes (1.8+ million codes), and radiology reports (6+ million tokens). We evaluate Merlin on 6 task types and 752 individual tasks. The non-adapted (off-the-shelf) tasks include zero-shot findings classification (31 findings), phenotype classification (692 phenotypes), and zero-shot cross-modal retrieval (image to findings and image to impressions), while model adapted tasks include 5-year disease prediction (6 diseases), radiology report generation, and 3D semantic segmentation (20 organs). We perform internal validation on a test set of 5,137 CTs, and external validation on 7,000 clinical CTs and on two public CT datasets (VerSe, TotalSegmentator). Beyond these clinically-relevant evaluations, we assess the efficacy of various network architectures and training strategies to depict that Merlin has favorable performance to existing task-specific baselines. We derive data scaling laws to empirically assess training data needs for requisite downstream task performance. Furthermore, unlike conventional VLMs that require hundreds of GPUs for training, we perform all training on a single GPU.
△ Less
Submitted 10 June, 2024;
originally announced June 2024.
-
CheXpert Plus: Augmenting a Large Chest X-ray Dataset with Text Radiology Reports, Patient Demographics and Additional Image Formats
Authors:
Pierre Chambon,
Jean-Benoit Delbrouck,
Thomas Sounack,
Shih-Cheng Huang,
Zhihong Chen,
Maya Varma,
Steven QH Truong,
Chu The Chuong,
Curtis P. Langlotz
Abstract:
Since the release of the original CheXpert paper five years ago, CheXpert has become one of the most widely used and cited clinical AI datasets. The emergence of vision language models has sparked an increase in demands for sharing reports linked to CheXpert images, along with a growing interest among AI fairness researchers in obtaining demographic data. To address this, CheXpert Plus serves as a…
▽ More
Since the release of the original CheXpert paper five years ago, CheXpert has become one of the most widely used and cited clinical AI datasets. The emergence of vision language models has sparked an increase in demands for sharing reports linked to CheXpert images, along with a growing interest among AI fairness researchers in obtaining demographic data. To address this, CheXpert Plus serves as a new collection of radiology data sources, made publicly available to enhance the scaling, performance, robustness, and fairness of models for all subsequent machine learning tasks in the field of radiology. CheXpert Plus is the largest text dataset publicly released in radiology, with a total of 36 million text tokens, including 13 million impression tokens. To the best of our knowledge, it represents the largest text de-identification effort in radiology, with almost 1 million PHI spans anonymized. It is only the second time that a large-scale English paired dataset has been released in radiology, thereby enabling, for the first time, cross-institution training at scale. All reports are paired with high-quality images in DICOM format, along with numerous image and patient metadata covering various clinical and socio-economic groups, as well as many pathology labels and RadGraph annotations. We hope this dataset will boost research for AI models that can further assist radiologists and help improve medical care. Data is available at the following URL: https://stanfordaimi.azurewebsites.net/datasets/5158c524-d3ab-4e02-96e9-6ee9efc110a1 Models are available at the following URL: https://github.com/Stanford-AIMI/chexpert-plus
△ Less
Submitted 3 June, 2024; v1 submitted 29 May, 2024;
originally announced May 2024.
-
MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels
Authors:
Qi Chen,
Xiubo Geng,
Corby Rosset,
Carolyn Buractaon,
**gwen Lu,
Tao Shen,
Kun Zhou,
Chenyan Xiong,
Yeyun Gong,
Paul Bennett,
Nick Craswell,
Xing Xie,
Fan Yang,
Bryan Tower,
Nikhil Rao,
Anlei Dong,
Wenqi Jiang,
Zheng Liu,
Mingqin Li,
Chuanjie Liu,
Zengzhong Li,
Rangan Majumder,
Jennifer Neville,
Andy Oakley,
Knut Magne Risvik
, et al. (6 additional authors not shown)
Abstract:
Recent breakthroughs in large models have highlighted the critical significance of data scale, labels and modals. In this paper, we introduce MS MARCO Web Search, the first large-scale information-rich web dataset, featuring millions of real clicked query-document labels. This dataset closely mimics real-world web document and query distribution, provides rich information for various kinds of down…
▽ More
Recent breakthroughs in large models have highlighted the critical significance of data scale, labels and modals. In this paper, we introduce MS MARCO Web Search, the first large-scale information-rich web dataset, featuring millions of real clicked query-document labels. This dataset closely mimics real-world web document and query distribution, provides rich information for various kinds of downstream tasks and encourages research in various areas, such as generic end-to-end neural indexer models, generic embedding models, and next generation information access system with large language models. MS MARCO Web Search offers a retrieval benchmark with three web retrieval challenge tasks that demand innovations in both machine learning and information retrieval system research domains. As the first dataset that meets large, real and rich data requirements, MS MARCO Web Search paves the way for future advancements in AI and system research. MS MARCO Web Search dataset is available at: https://github.com/microsoft/MS-MARCO-Web-Search.
△ Less
Submitted 13 May, 2024;
originally announced May 2024.
-
GREEN: Generative Radiology Report Evaluation and Error Notation
Authors:
Sophie Ostmeier,
Justin Xu,
Zhihong Chen,
Maya Varma,
Louis Blankemeier,
Christian Bluethgen,
Arne Edward Michalson,
Michael Moseley,
Curtis Langlotz,
Akshay S Chaudhari,
Jean-Benoit Delbrouck
Abstract:
Evaluating radiology reports is a challenging problem as factual correctness is extremely important due to the need for accurate medical communication about medical images. Existing automatic evaluation metrics either suffer from failing to consider factual correctness (e.g., BLEU and ROUGE) or are limited in their interpretability (e.g., F1CheXpert and F1RadGraph). In this paper, we introduce GRE…
▽ More
Evaluating radiology reports is a challenging problem as factual correctness is extremely important due to the need for accurate medical communication about medical images. Existing automatic evaluation metrics either suffer from failing to consider factual correctness (e.g., BLEU and ROUGE) or are limited in their interpretability (e.g., F1CheXpert and F1RadGraph). In this paper, we introduce GREEN (Generative Radiology Report Evaluation and Error Notation), a radiology report generation metric that leverages the natural language understanding of language models to identify and explain clinically significant errors in candidate reports, both quantitatively and qualitatively. Compared to current metrics, GREEN offers: 1) a score aligned with expert preferences, 2) human interpretable explanations of clinically significant errors, enabling feedback loops with end-users, and 3) a lightweight open-source method that reaches the performance of commercial counterparts. We validate our GREEN metric by comparing it to GPT-4, as well as to error counts of 6 experts and preferences of 2 experts. Our method demonstrates not only higher correlation with expert error counts, but simultaneously higher alignment with expert preferences when compared to previous approaches."
△ Less
Submitted 6 May, 2024;
originally announced May 2024.
-
Graph Regularized Encoder Training for Extreme Classification
Authors:
Anshul Mittal,
Shikhar Mohan,
Deepak Saini,
Suchith C. Prabhu,
Jain jiao,
Sumeet Agarwal,
Soumen Chakrabarti,
Purushottam Kar,
Manik Varma
Abstract:
Deep extreme classification (XC) aims to train an encoder architecture and an accompanying classifier architecture to tag a data point with the most relevant subset of labels from a very large universe of labels. XC applications in ranking, recommendation and tagging routinely encounter tail labels for which the amount of training data is exceedingly small. Graph convolutional networks (GCN) prese…
▽ More
Deep extreme classification (XC) aims to train an encoder architecture and an accompanying classifier architecture to tag a data point with the most relevant subset of labels from a very large universe of labels. XC applications in ranking, recommendation and tagging routinely encounter tail labels for which the amount of training data is exceedingly small. Graph convolutional networks (GCN) present a convenient but computationally expensive way to leverage task metadata and enhance model accuracies in these settings. This paper formally establishes that in several use cases, the steep computational cost of GCNs is entirely avoidable by replacing GCNs with non-GCN architectures. The paper notices that in these settings, it is much more effective to use graph data to regularize encoder training than to implement a GCN. Based on these insights, an alternative paradigm RAMEN is presented to utilize graph metadata in XC settings that offers significant performance boosts with zero increase in inference computational costs. RAMEN scales to datasets with up to 1M labels and offers prediction accuracy up to 15% higher on benchmark datasets than state of the art methods, including those that use graph metadata to train GCNs. RAMEN also offers 10% higher accuracy over the best baseline on a proprietary recommendation dataset sourced from click logs of a popular search engine. Code for RAMEN will be released publicly.
△ Less
Submitted 28 February, 2024;
originally announced February 2024.
-
CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation
Authors:
Zhihong Chen,
Maya Varma,
Jean-Benoit Delbrouck,
Magdalini Paschali,
Louis Blankemeier,
Dave Van Veen,
Jeya Maria Jose Valanarasu,
Alaa Youssef,
Joseph Paul Cohen,
Eduardo Pontes Reis,
Emily B. Tsai,
Andrew Johnston,
Cameron Olsen,
Tanishq Mathew Abraham,
Sergios Gatidis,
Akshay S. Chaudhari,
Curtis Langlotz
Abstract:
Chest X-rays (CXRs) are the most frequently performed imaging test in clinical practice. Recent advances in the development of vision-language foundation models (FMs) give rise to the possibility of performing automated CXR interpretation, which can assist physicians with clinical decision-making and improve patient outcomes. However, develo** FMs that can accurately interpret CXRs is challengin…
▽ More
Chest X-rays (CXRs) are the most frequently performed imaging test in clinical practice. Recent advances in the development of vision-language foundation models (FMs) give rise to the possibility of performing automated CXR interpretation, which can assist physicians with clinical decision-making and improve patient outcomes. However, develo** FMs that can accurately interpret CXRs is challenging due to the (1) limited availability of large-scale vision-language datasets in the medical image domain, (2) lack of vision and language encoders that can capture the complexities of medical data, and (3) absence of evaluation frameworks for benchmarking the abilities of FMs on CXR interpretation. In this work, we address these challenges by first introducing \emph{CheXinstruct} - a large-scale instruction-tuning dataset curated from 28 publicly-available datasets. We then present \emph{CheXagent} - an instruction-tuned FM capable of analyzing and summarizing CXRs. To build CheXagent, we design a clinical large language model (LLM) for parsing radiology reports, a vision encoder for representing CXR images, and a network to bridge the vision and language modalities. Finally, we introduce \emph{CheXbench} - a novel benchmark designed to systematically evaluate FMs across 8 clinically-relevant CXR interpretation tasks. Extensive quantitative evaluations and qualitative reviews with five expert radiologists demonstrate that CheXagent outperforms previously-developed general- and medical-domain FMs on CheXbench tasks. Furthermore, in an effort to improve model transparency, we perform a fairness evaluation across factors of sex, race and age to highlight potential performance disparities. Our project is at \url{https://stanford-aimi.github.io/chexagent.html}.
△ Less
Submitted 22 January, 2024;
originally announced January 2024.
-
Multi-modal Extreme Classification
Authors:
Anshul Mittal,
Kunal Dahiya,
Shreya Malani,
Janani Ramaswamy,
Seba Kuruvilla,
Jitendra Ajmera,
Keng-hao Chang,
Sumeet Agarwal,
Purushottam Kar,
Manik Varma
Abstract:
This paper develops the MUFIN technique for extreme classification (XC) tasks with millions of labels where datapoints and labels are endowed with visual and textual descriptors. Applications of MUFIN to product-to-product recommendation and bid query prediction over several millions of products are presented. Contemporary multi-modal methods frequently rely on purely embedding-based methods. On t…
▽ More
This paper develops the MUFIN technique for extreme classification (XC) tasks with millions of labels where datapoints and labels are endowed with visual and textual descriptors. Applications of MUFIN to product-to-product recommendation and bid query prediction over several millions of products are presented. Contemporary multi-modal methods frequently rely on purely embedding-based methods. On the other hand, XC methods utilize classifier architectures to offer superior accuracies than embedding only methods but mostly focus on text-based categorization tasks. MUFIN bridges this gap by reformulating multi-modal categorization as an XC problem with several millions of labels. This presents the twin challenges of develo** multi-modal architectures that can offer embeddings sufficiently expressive to allow accurate categorization over millions of labels; and training and inference routines that scale logarithmically in the number of labels. MUFIN develops an architecture based on cross-modal attention and trains it in a modular fashion using pre-training and positive and negative mining. A novel product-to-product recommendation dataset MM-AmazonTitles-300K containing over 300K products was curated from publicly available amazon.com listings with each product endowed with a title and multiple images. On the all datasets MUFIN offered at least 3% higher accuracy than leading text-based, image-based and multi-modal techniques. Code for MUFIN is available at https://github.com/Extreme-classification/MUFIN
△ Less
Submitted 10 September, 2023;
originally announced September 2023.
-
Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts
Authors:
Wenyan Cong,
Hanxue Liang,
Peihao Wang,
Zhiwen Fan,
Tianlong Chen,
Mukund Varma,
Yi Wang,
Zhangyang Wang
Abstract:
Cross-scene generalizable NeRF models, which can directly synthesize novel views of unseen scenes, have become a new spotlight of the NeRF field. Several existing attempts rely on increasingly end-to-end "neuralized" architectures, i.e., replacing scene representation and/or rendering modules with performant neural networks such as transformers, and turning novel view synthesis into a feed-forward…
▽ More
Cross-scene generalizable NeRF models, which can directly synthesize novel views of unseen scenes, have become a new spotlight of the NeRF field. Several existing attempts rely on increasingly end-to-end "neuralized" architectures, i.e., replacing scene representation and/or rendering modules with performant neural networks such as transformers, and turning novel view synthesis into a feed-forward inference pipeline. While those feedforward "neuralized" architectures still do not fit diverse scenes well out of the box, we propose to bridge them with the powerful Mixture-of-Experts (MoE) idea from large language models (LLMs), which has demonstrated superior generalization ability by balancing between larger overall model capacity and flexible per-instance specialization. Starting from a recent generalizable NeRF architecture called GNT, we first demonstrate that MoE can be neatly plugged in to enhance the model. We further customize a shared permanent expert and a geometry-aware consistency loss to enforce cross-scene consistency and spatial smoothness respectively, which are essential for generalizable view synthesis. Our proposed model, dubbed GNT with Mixture-of-View-Experts (GNT-MOVE), has experimentally shown state-of-the-art results when transferring to unseen scenes, indicating remarkably better cross-scene generalization in both zero-shot and few-shot settings. Our codes are available at https://github.com/VITA-Group/GNT-MOVE.
△ Less
Submitted 22 August, 2023;
originally announced August 2023.
-
ViLLA: Fine-Grained Vision-Language Representation Learning from Real-World Data
Authors:
Maya Varma,
Jean-Benoit Delbrouck,
Sarah Hooper,
Akshay Chaudhari,
Curtis Langlotz
Abstract:
Vision-language models (VLMs), such as CLIP and ALIGN, are generally trained on datasets consisting of image-caption pairs obtained from the web. However, real-world multimodal datasets, such as healthcare data, are significantly more complex: each image (e.g. X-ray) is often paired with text (e.g. physician report) that describes many distinct attributes occurring in fine-grained regions of the i…
▽ More
Vision-language models (VLMs), such as CLIP and ALIGN, are generally trained on datasets consisting of image-caption pairs obtained from the web. However, real-world multimodal datasets, such as healthcare data, are significantly more complex: each image (e.g. X-ray) is often paired with text (e.g. physician report) that describes many distinct attributes occurring in fine-grained regions of the image. We refer to these samples as exhibiting high pairwise complexity, since each image-text pair can be decomposed into a large number of region-attribute pairings. The extent to which VLMs can capture fine-grained relationships between image regions and textual attributes when trained on such data has not been previously evaluated. The first key contribution of this work is to demonstrate through systematic evaluations that as the pairwise complexity of the training dataset increases, standard VLMs struggle to learn region-attribute relationships, exhibiting performance degradations of up to 37% on retrieval tasks. In order to address this issue, we introduce ViLLA as our second key contribution. ViLLA, which is trained to capture fine-grained region-attribute relationships from complex datasets, involves two components: (a) a lightweight, self-supervised map** model to decompose image-text samples into region-attribute pairs, and (b) a contrastive VLM to learn representations from generated region-attribute pairs. We demonstrate with experiments across four domains (synthetic, product, medical, and natural images) that ViLLA outperforms comparable VLMs on fine-grained reasoning tasks, such as zero-shot object detection (up to 3.6 AP50 points on COCO and 0.6 mAP points on LVIS) and retrieval (up to 14.2 R-Precision points).
△ Less
Submitted 22 August, 2023;
originally announced August 2023.
-
Navigation of micro-robot swarms for targeted delivery using reinforcement learning
Authors:
Akshatha Jagadish,
Manoj Varma
Abstract:
Micro robotics is quickly emerging to be a promising technological solution to many medical treatments with focus on targeted drug delivery. They are effective when working in swarms whose individual control is mostly infeasible owing to their minute size. Controlling a number of robots with a single controller is thus important and artificial intelligence can help us perform this task successfull…
▽ More
Micro robotics is quickly emerging to be a promising technological solution to many medical treatments with focus on targeted drug delivery. They are effective when working in swarms whose individual control is mostly infeasible owing to their minute size. Controlling a number of robots with a single controller is thus important and artificial intelligence can help us perform this task successfully. In this work, we use the Reinforcement Learning (RL) algorithms Proximal Policy Optimization (PPO) and Robust Policy Optimization (RPO) to navigate a swarm of 4, 9 and 16 microswimmers under hydrodynamic effects, controlled by their orientation, towards a circular absorbing target. We look at both PPO and RPO performances with limited state information scenarios and also test their robustness for random target location and size. We use curriculum learning to improve upon the performance and demonstrate the same in learning to navigate a swarm of 25 swimmers and steering the swarm to exemplify the manoeuvring capabilities of the RL model.
△ Less
Submitted 30 June, 2023;
originally announced June 2023.
-
Role of single particle motility statistics on efficiency of targeted delivery of micro-robot swarms
Authors:
Akshatha Jagadish,
Manoj Varma
Abstract:
The study of dynamics of single active particles plays an important role in the development of artificial or hybrid micro-systems for bio-medical and other applications at micro-scale. Here, we utilize the results of these studies to better understand their implications for the specific application of drug delivery. We analyze the variations in the capture efficiency for different types of motion…
▽ More
The study of dynamics of single active particles plays an important role in the development of artificial or hybrid micro-systems for bio-medical and other applications at micro-scale. Here, we utilize the results of these studies to better understand their implications for the specific application of drug delivery. We analyze the variations in the capture efficiency for different types of motion dynamics without inter-particle interactions and compare the results. We also discuss the reasons for the same and describe the specific parameters that affect the capture efficiency, which in turn helps in both hardware and control design of a micro-bot swarm system for drug delivery.
△ Less
Submitted 30 June, 2023;
originally announced June 2023.
-
Toward expanding the scope of radiology report summarization to multiple anatomies and modalities
Authors:
Zhihong Chen,
Maya Varma,
Xiang Wan,
Curtis Langlotz,
Jean-Benoit Delbrouck
Abstract:
Radiology report summarization (RRS) is a growing area of research. Given the Findings section of a radiology report, the goal is to generate a summary (called an Impression section) that highlights the key observations and conclusions of the radiology study. However, RRS currently faces essential limitations.First, many prior studies conduct experiments on private datasets, preventing reproductio…
▽ More
Radiology report summarization (RRS) is a growing area of research. Given the Findings section of a radiology report, the goal is to generate a summary (called an Impression section) that highlights the key observations and conclusions of the radiology study. However, RRS currently faces essential limitations.First, many prior studies conduct experiments on private datasets, preventing reproduction of results and fair comparisons across different systems and solutions. Second, most prior approaches are evaluated solely on chest X-rays. To address these limitations, we propose a dataset (MIMIC-RRS) involving three new modalities and seven new anatomies based on the MIMIC-III and MIMIC-CXR datasets. We then conduct extensive experiments to evaluate the performance of models both within and across modality-anatomy pairs in MIMIC-RRS. In addition, we evaluate their clinical efficacy via RadGraph, a factual correctness metric.
△ Less
Submitted 21 July, 2023; v1 submitted 15 November, 2022;
originally announced November 2022.
-
NGAME: Negative Mining-aware Mini-batching for Extreme Classification
Authors:
Kunal Dahiya,
Nilesh Gupta,
Deepak Saini,
Akshay Soni,
Yajun Wang,
Kushal Dave,
Jian Jiao,
Gururaj K,
Prasenjit Dey,
Amit Singh,
Deepesh Hada,
Vidit Jain,
Bhawna Paliwal,
Anshul Mittal,
Sonu Mehta,
Ramachandran Ramjee,
Sumeet Agarwal,
Purushottam Kar,
Manik Varma
Abstract:
Extreme Classification (XC) seeks to tag data points with the most relevant subset of labels from an extremely large label set. Performing deep XC with dense, learnt representations for data points and labels has attracted much attention due to its superiority over earlier XC methods that used sparse, hand-crafted features. Negative mining techniques have emerged as a critical component of all dee…
▽ More
Extreme Classification (XC) seeks to tag data points with the most relevant subset of labels from an extremely large label set. Performing deep XC with dense, learnt representations for data points and labels has attracted much attention due to its superiority over earlier XC methods that used sparse, hand-crafted features. Negative mining techniques have emerged as a critical component of all deep XC methods that allow them to scale to millions of labels. However, despite recent advances, training deep XC models with large encoder architectures such as transformers remains challenging. This paper identifies that memory overheads of popular negative mining techniques often force mini-batch sizes to remain small and slow training down. In response, this paper introduces NGAME, a light-weight mini-batch creation technique that offers provably accurate in-batch negative samples. This allows training with larger mini-batches offering significantly faster convergence and higher accuracies than existing negative sampling techniques. NGAME was found to be up to 16% more accurate than state-of-the-art methods on a wide array of benchmark datasets for extreme classification, as well as 3% more accurate at retrieving search engine queries in response to a user webpage visit to show personalized ads. In live A/B tests on a popular search engine, NGAME yielded up to 23% gains in click-through-rates.
△ Less
Submitted 10 July, 2022;
originally announced July 2022.
-
Domino: Discovering Systematic Errors with Cross-Modal Embeddings
Authors:
Sabri Eyuboglu,
Maya Varma,
Khaled Saab,
Jean-Benoit Delbrouck,
Christopher Lee-Messer,
Jared Dunnmon,
James Zou,
Christopher Ré
Abstract:
Machine learning models that achieve high overall accuracy often make systematic errors on important subsets (or slices) of data. Identifying underperforming slices is particularly challenging when working with high-dimensional inputs (e.g. images, audio), where important slices are often unlabeled. In order to address this issue, recent studies have proposed automated slice discovery methods (SDM…
▽ More
Machine learning models that achieve high overall accuracy often make systematic errors on important subsets (or slices) of data. Identifying underperforming slices is particularly challenging when working with high-dimensional inputs (e.g. images, audio), where important slices are often unlabeled. In order to address this issue, recent studies have proposed automated slice discovery methods (SDMs), which leverage learned model representations to mine input data for slices on which a model performs poorly. To be useful to a practitioner, these methods must identify slices that are both underperforming and coherent (i.e. united by a human-understandable concept). However, no quantitative evaluation framework currently exists for rigorously assessing SDMs with respect to these criteria. Additionally, prior qualitative evaluations have shown that SDMs often identify slices that are incoherent. In this work, we address these challenges by first designing a principled evaluation framework that enables a quantitative comparison of SDMs across 1,235 slice discovery settings in three input domains (natural images, medical images, and time-series data). Then, motivated by the recent development of powerful cross-modal representation learning approaches, we present Domino, an SDM that leverages cross-modal embeddings and a novel error-aware mixture model to discover and describe coherent slices. We find that Domino accurately identifies 36% of the 1,235 slices in our framework - a 12 percentage point improvement over prior methods. Further, Domino is the first SDM that can provide natural language descriptions of identified slices, correctly generating the exact name of the slice in 35% of settings.
△ Less
Submitted 21 May, 2022; v1 submitted 24 March, 2022;
originally announced March 2022.
-
DeepXML: A Deep Extreme Multi-Label Learning Framework Applied to Short Text Documents
Authors:
Kunal Dahiya,
Deepak Saini,
Anshul Mittal,
Ankush Shaw,
Kushal Dave,
Akshay Soni,
Himanshu Jain,
Sumeet Agarwal,
Manik Varma
Abstract:
Scalability and accuracy are well recognized challenges in deep extreme multi-label learning where the objective is to train architectures for automatically annotating a data point with the most relevant subset of labels from an extremely large label set. This paper develops the DeepXML framework that addresses these challenges by decomposing the deep extreme multi-label task into four simpler sub…
▽ More
Scalability and accuracy are well recognized challenges in deep extreme multi-label learning where the objective is to train architectures for automatically annotating a data point with the most relevant subset of labels from an extremely large label set. This paper develops the DeepXML framework that addresses these challenges by decomposing the deep extreme multi-label task into four simpler sub-tasks each of which can be trained accurately and efficiently. Choosing different components for the four sub-tasks allows DeepXML to generate a family of algorithms with varying trade-offs between accuracy and scalability. In particular, DeepXML yields the Astec algorithm that could be 2-12% more accurate and 5-30x faster to train than leading deep extreme classifiers on publically available short text datasets. Astec could also efficiently train on Bing short text datasets containing up to 62 million labels while making predictions for billions of users and data points per day on commodity hardware. This allowed Astec to be deployed on the Bing search engine for a number of short text applications ranging from matching user queries to advertiser bid phrases to showing personalized ads where it yielded significant gains in click-through-rates, coverage, revenue and other online metrics over state-of-the-art techniques currently in production. DeepXML's code is available at https://github.com/Extreme-classification/deepxml
△ Less
Submitted 12 November, 2021;
originally announced November 2021.
-
Cross-Domain Data Integration for Named Entity Disambiguation in Biomedical Text
Authors:
Maya Varma,
Laurel Orr,
Sen Wu,
Megan Leszczynski,
Xiao Ling,
Christopher Ré
Abstract:
Named entity disambiguation (NED), which involves map** textual mentions to structured entities, is particularly challenging in the medical domain due to the presence of rare entities. Existing approaches are limited by the presence of coarse-grained structural resources in biomedical knowledge bases as well as the use of training datasets that provide low coverage over uncommon resources. In th…
▽ More
Named entity disambiguation (NED), which involves map** textual mentions to structured entities, is particularly challenging in the medical domain due to the presence of rare entities. Existing approaches are limited by the presence of coarse-grained structural resources in biomedical knowledge bases as well as the use of training datasets that provide low coverage over uncommon resources. In this work, we address these issues by proposing a cross-domain data integration method that transfers structural knowledge from a general text knowledge base to the medical domain. We utilize our integration scheme to augment structural resources and generate a large biomedical NED dataset for pretraining. Our pretrained model with injected structural knowledge achieves state-of-the-art performance on two benchmark medical NED datasets: MedMentions and BC5CDR. Furthermore, we improve disambiguation of rare entities by up to 57 accuracy points.
△ Less
Submitted 15 October, 2021;
originally announced October 2021.
-
Transportation Polytope and its Applications in Parallel Server Systems
Authors:
Sushil Mahavir Varma,
Siva Theja Maguluri
Abstract:
A parallel server system is a stochastic processing network with applications in manufacturing, supply chain, ride-hailing, call centers, etc. Heterogeneous customers arrive in the system, and only a subset of servers can serve any customer type given by the flexibility graph. The goal of the system operator is to minimize the delay that depends on the scheduling policy and the flexibility graph.…
▽ More
A parallel server system is a stochastic processing network with applications in manufacturing, supply chain, ride-hailing, call centers, etc. Heterogeneous customers arrive in the system, and only a subset of servers can serve any customer type given by the flexibility graph. The goal of the system operator is to minimize the delay that depends on the scheduling policy and the flexibility graph. A long line of literature focuses on designing near-optimal scheduling policies given a flexibility graph. On the contrary, we fix the scheduling policy to be the so-called MaxWeight scheduling given its superior delay performance and focus on designing near-optimal, sparse flexibility graphs. Our contributions are threefold.
First, we analyze the expected delay in the heavy-traffic asymptotic regime in terms of the properties of the flexibility graph and use this result to translate the design question in terms of transportation polytope, the deterministic equivalent of parallel server queues. Second, we design the sparsest flexibility graph that achieves a given delay performance and shows the robustness of the design to demand uncertainty. Third, given the budget to add edges arrives sequentially in time, we present the optimal schedule for adding them to the flexibility graph. These results are obtained by proving new results for transportation polytopes and are of independent interest. In particular, translating the difficulties to a simpler model, i.e. transportation polytope, allows us to develop a unified framework to answer several design questions.
△ Less
Submitted 6 January, 2023; v1 submitted 11 August, 2021;
originally announced August 2021.
-
XCI-Sketch: Extraction of Color Information from Images for Generation of Colored Outlines and Sketches
Authors:
V Manushree,
Sameer Saxena,
Parna Chowdhury,
Manisimha Varma,
Harsh Rathod,
Ankita Ghosh,
Sahil Khose
Abstract:
Sketches are a medium to convey a visual scene from an individual's creative perspective. The addition of color substantially enhances the overall expressivity of a sketch. This paper proposes two methods to mimic human-drawn colored sketches by utilizing the Contour Drawing Dataset. Our first approach renders colored outline sketches by applying image processing techniques aided by k-means color…
▽ More
Sketches are a medium to convey a visual scene from an individual's creative perspective. The addition of color substantially enhances the overall expressivity of a sketch. This paper proposes two methods to mimic human-drawn colored sketches by utilizing the Contour Drawing Dataset. Our first approach renders colored outline sketches by applying image processing techniques aided by k-means color clustering. The second method uses a generative adversarial network to develop a model that can generate colored sketches from previously unobserved images. We assess the results obtained through quantitative and qualitative evaluations.
△ Less
Submitted 7 January, 2022; v1 submitted 25 August, 2021;
originally announced August 2021.
-
DECAF: Deep Extreme Classification with Label Features
Authors:
Anshul Mittal,
Kunal Dahiya,
Sheshansh Agrawal,
Deepak Saini,
Sumeet Agarwal,
Purushottam Kar,
Manik Varma
Abstract:
Extreme multi-label classification (XML) involves tagging a data point with its most relevant subset of labels from an extremely large label set, with several applications such as product-to-product recommendation with millions of products. Although leading XML algorithms scale to millions of labels, they largely ignore label meta-data such as textual descriptions of the labels. On the other hand,…
▽ More
Extreme multi-label classification (XML) involves tagging a data point with its most relevant subset of labels from an extremely large label set, with several applications such as product-to-product recommendation with millions of products. Although leading XML algorithms scale to millions of labels, they largely ignore label meta-data such as textual descriptions of the labels. On the other hand, classical techniques that can utilize label metadata via representation learning using deep networks struggle in extreme settings. This paper develops the DECAF algorithm that addresses these challenges by learning models enriched by label metadata that jointly learn model parameters and feature representations using deep networks and offer accurate classification at the scale of millions of labels. DECAF makes specific contributions to model architecture design, initialization, and training, enabling it to offer up to 2-6% more accurate prediction than leading extreme classifiers on publicly available benchmark product-to-product recommendation datasets, such as LF-AmazonTitles-1.3M. At the same time, DECAF was found to be up to 22x faster at inference than leading deep extreme classifiers, which makes it suitable for real-time applications that require predictions within a few milliseconds. The code for DECAF is available at the following URL https://github.com/Extreme-classification/DECAF.
△ Less
Submitted 1 August, 2021;
originally announced August 2021.
-
ECLARE: Extreme Classification with Label Graph Correlations
Authors:
Anshul Mittal,
Noveen Sachdeva,
Sheshansh Agrawal,
Sumeet Agarwal,
Purushottam Kar,
Manik Varma
Abstract:
Deep extreme classification (XC) seeks to train deep architectures that can tag a data point with its most relevant subset of labels from an extremely large label set. The core utility of XC comes from predicting labels that are rarely seen during training. Such rare labels hold the key to personalized recommendations that can delight and surprise a user. However, the large number of rare labels a…
▽ More
Deep extreme classification (XC) seeks to train deep architectures that can tag a data point with its most relevant subset of labels from an extremely large label set. The core utility of XC comes from predicting labels that are rarely seen during training. Such rare labels hold the key to personalized recommendations that can delight and surprise a user. However, the large number of rare labels and small amount of training data per rare label offer significant statistical and computational challenges. State-of-the-art deep XC methods attempt to remedy this by incorporating textual descriptions of labels but do not adequately address the problem. This paper presents ECLARE, a scalable deep learning architecture that incorporates not only label text, but also label correlations, to offer accurate real-time predictions within a few milliseconds. Core contributions of ECLARE include a frugal architecture and scalable techniques to train deep models along with label correlation graphs at the scale of millions of labels. In particular, ECLARE offers predictions that are 2 to 14% more accurate on both publicly available benchmark datasets as well as proprietary datasets for a related products recommendation task sourced from the Bing search engine. Code for ECLARE is available at https://github.com/Extreme-classification/ECLARE.
△ Less
Submitted 31 July, 2021;
originally announced August 2021.
-
Quality-Quantity Trade-offs in Tests for Management of COVID-19-like Epidemics
Authors:
Harish Sasikumar,
Manoj Varma
Abstract:
There are multiple testing methods to ascertain an infection in an individual and they vary in their performances, cost and delay. Unfortunately, better performing tests are sometimes costlier and time consuming and can only be done for a small fraction of the population. On the other hand, greater number of individuals can be tested using a cheaper, rapid test, but may only provide less reliable…
▽ More
There are multiple testing methods to ascertain an infection in an individual and they vary in their performances, cost and delay. Unfortunately, better performing tests are sometimes costlier and time consuming and can only be done for a small fraction of the population. On the other hand, greater number of individuals can be tested using a cheaper, rapid test, but may only provide less reliable results. In this work, we studied the interplay between cost and delay of the tests as well the additional advantages offered by partial and complete lockdowns. To understand the influence of different test strategies, we implemented them on realistic random social networks with a COVID-19-like epidemic in progression. Specifically, we compared the performance of two tests mimicking the characteristics of popular tests implemented for COVID-19 detection. We present procedures and intuitive understanding to ascertain the optimum combination of the tests to minimize the peak infection as well as total quarantine days when the number of tests is constrained by a fixed total budget.
△ Less
Submitted 16 June, 2021;
originally announced June 2021.
-
On the Linear convergence of Natural Policy Gradient Algorithm
Authors:
Sajad Khodadadian,
Prakirt Raj Jhunjhunwala,
Sushil Mahavir Varma,
Siva Theja Maguluri
Abstract:
Markov Decision Processes are classically solved using Value Iteration and Policy Iteration algorithms. Recent interest in Reinforcement Learning has motivated the study of methods inspired by optimization, such as gradient ascent. Among these, a popular algorithm is the Natural Policy Gradient, which is a mirror descent variant for MDPs. This algorithm forms the basis of several popular Reinforce…
▽ More
Markov Decision Processes are classically solved using Value Iteration and Policy Iteration algorithms. Recent interest in Reinforcement Learning has motivated the study of methods inspired by optimization, such as gradient ascent. Among these, a popular algorithm is the Natural Policy Gradient, which is a mirror descent variant for MDPs. This algorithm forms the basis of several popular Reinforcement Learning algorithms such as Natural actor-critic, TRPO, PPO, etc, and so is being studied with growing interest. It has been shown that Natural Policy Gradient with constant step size converges with a sublinear rate of O(1/k) to the global optimal. In this paper, we present improved finite time convergence bounds, and show that this algorithm has geometric (also known as linear) asymptotic convergence rate. We further improve this convergence result by introducing a variant of Natural Policy Gradient with adaptive step sizes. Finally, we compare different variants of policy gradient methods experimentally.
△ Less
Submitted 4 May, 2021;
originally announced May 2021.
-
Improved Digital Therapy for Developmental Pediatrics Using Domain-Specific Artificial Intelligence: Machine Learning Study
Authors:
Peter Washington,
Haik Kalantarian,
John Kent,
Arman Husic,
Aaron Kline,
Emilie Leblanc,
Cathy Hou,
Onur Cezmi Mutlu,
Kaitlyn Dunlap,
Yordan Penev,
Maya Varma,
Nate Tyler Stockham,
Brianna Chrisman,
Kelley Paskov,
Min Woo Sun,
Jae-Yoon Jung,
Catalin Voss,
Nick Haber,
Dennis Paul Wall
Abstract:
Background: Automated emotion classification could aid those who struggle to recognize emotions, including children with developmental behavioral conditions such as autism. However, most computer vision emotion recognition models are trained on adult emotion and therefore underperform when applied to child faces. Objective: We designed a strategy to gamify the collection and labeling of child emot…
▽ More
Background: Automated emotion classification could aid those who struggle to recognize emotions, including children with developmental behavioral conditions such as autism. However, most computer vision emotion recognition models are trained on adult emotion and therefore underperform when applied to child faces. Objective: We designed a strategy to gamify the collection and labeling of child emotion-enriched images to boost the performance of automatic child emotion recognition models to a level closer to what will be needed for digital health care approaches. Methods: We leveraged our prototype therapeutic smartphone game, GuessWhat, which was designed in large part for children with developmental and behavioral conditions, to gamify the secure collection of video data of children expressing a variety of emotions prompted by the game. Independently, we created a secure web interface to gamify the human labeling effort, called HollywoodSquares, tailored for use by any qualified labeler. We gathered and labeled 2155 videos, 39,968 emotion frames, and 106,001 labels on all images. With this drastically expanded pediatric emotion-centric database (>30 times larger than existing public pediatric emotion data sets), we trained a convolutional neural network (CNN) computer vision classifier of happy, sad, surprised, fearful, angry, disgust, and neutral expressions evoked by children. Results: The classifier achieved a 66.9% balanced accuracy and 67.4% F1-score on the entirety of the Child Affective Facial Expression (CAFE) as well as a 79.1% balanced accuracy and 78% F1-score on CAFE Subset A, a subset containing at least 60% human agreement on emotions labels. This performance is at least 10% higher than all previously developed classifiers evaluated against CAFE, the best of which reached a 56% balanced accuracy even when combining "anger" and "disgust" into a single class.
△ Less
Submitted 3 June, 2024; v1 submitted 15 December, 2020;
originally announced December 2020.
-
Determining Question-Answer Plausibility in Crowdsourced Datasets Using Multi-Task Learning
Authors:
Rachel Gardner,
Maya Varma,
Clare Zhu,
Ranjay Krishna
Abstract:
Datasets extracted from social networks and online forums are often prone to the pitfalls of natural language, namely the presence of unstructured and noisy data. In this work, we seek to enable the collection of high-quality question-answer datasets from social media by proposing a novel task for automated quality analysis and data cleaning: question-answer (QA) plausibility. Given a machine or u…
▽ More
Datasets extracted from social networks and online forums are often prone to the pitfalls of natural language, namely the presence of unstructured and noisy data. In this work, we seek to enable the collection of high-quality question-answer datasets from social media by proposing a novel task for automated quality analysis and data cleaning: question-answer (QA) plausibility. Given a machine or user-generated question and a crowd-sourced response from a social media user, we determine if the question and response are valid; if so, we identify the answer within the free-form response. We design BERT-based models to perform the QA plausibility task, and we evaluate the ability of our models to generate a clean, usable question-answer dataset. Our highest-performing approach consists of a single-task model which determines the plausibility of the question, followed by a multi-task model which evaluates the plausibility of the response as well as extracts answers (Question Plausibility AUROC=0.75, Response Plausibility AUROC=0.78, Answer Extraction F1=0.665).
△ Less
Submitted 9 November, 2020;
originally announced November 2020.
-
RNNPool: Efficient Non-linear Pooling for RAM Constrained Inference
Authors:
Oindrila Saha,
Aditya Kusupati,
Harsha Vardhan Simhadri,
Manik Varma,
Prateek Jain
Abstract:
Standard Convolutional Neural Networks (CNNs) designed for computer vision tasks tend to have large intermediate activation maps. These require large working memory and are thus unsuitable for deployment on resource-constrained devices typically used for inference on the edge. Aggressively downsampling the images via pooling or strided convolutions can address the problem but leads to a significan…
▽ More
Standard Convolutional Neural Networks (CNNs) designed for computer vision tasks tend to have large intermediate activation maps. These require large working memory and are thus unsuitable for deployment on resource-constrained devices typically used for inference on the edge. Aggressively downsampling the images via pooling or strided convolutions can address the problem but leads to a significant decrease in accuracy due to gross aggregation of the feature map by standard pooling operators. In this paper, we introduce RNNPool, a novel pooling operator based on Recurrent Neural Networks (RNNs), that efficiently aggregates features over large patches of an image and rapidly downsamples activation maps. Empirical evaluation indicates that an RNNPool layer can effectively replace multiple blocks in a variety of architectures such as MobileNets, DenseNet when applied to standard vision tasks like image classification and face detection. That is, RNNPool can significantly decrease computational complexity and peak memory usage for inference while retaining comparable accuracy. We use RNNPool with the standard S3FD architecture to construct a face detection method that achieves state-of-the-art MAP for tiny ARM Cortex-M4 class microcontrollers with under 256 KB of RAM. Code is released at https://github.com/Microsoft/EdgeML.
△ Less
Submitted 22 October, 2020; v1 submitted 27 February, 2020;
originally announced February 2020.
-
Throughput Optimal Routing in Blockchain Based Payment Systems
Authors:
Sushil Mahavir Varma,
Siva Theja Maguluri
Abstract:
Cryptocurrency networks such as Bitcoin have emerged as a distributed alternative to traditional centralized financial transaction networks. However, there are major challenges in scaling up the throughput of such networks. Lightning network and Spider network are alternates that build bidirectional payment channels on top of cryptocurrency networks using smart contracts, to enable fast transactio…
▽ More
Cryptocurrency networks such as Bitcoin have emerged as a distributed alternative to traditional centralized financial transaction networks. However, there are major challenges in scaling up the throughput of such networks. Lightning network and Spider network are alternates that build bidirectional payment channels on top of cryptocurrency networks using smart contracts, to enable fast transactions that bypass the Blockchain. In this paper, we study the problem of routing transactions in such a payment processing network. We first propose a Stochastic model to study such a system, as opposed to a fluid model that is studied in the literature. Each link in such a model is a two-sided queue, and unlike classical queues, such queues are not stable unless there is an external control. We propose a notion of stability for the payment processing network consisting of such two-sided queues using the notion of on-chain rebalancing. We then characterize the capacity region and propose a throughput optimal algorithm that stabilizes the system under any load within the capacity region. The stochastic model enables us to study closed loop policies, which typically have better queuing/delay performance than the open loop policies (or static split rules) studied in the literature. We investigate this through simulations.
△ Less
Submitted 23 June, 2021; v1 submitted 12 December, 2019;
originally announced January 2020.
-
Extreme Regression for Dynamic Search Advertising
Authors:
Yashoteja Prabhu,
Aditya Kusupati,
Nilesh Gupta,
Manik Varma
Abstract:
This paper introduces a new learning paradigm called eXtreme Regression (XR) whose objective is to accurately predict the numerical degrees of relevance of an extremely large number of labels to a data point. XR can provide elegant solutions to many large-scale ranking and recommendation applications including Dynamic Search Advertising (DSA). XR can learn more accurate models than the recently po…
▽ More
This paper introduces a new learning paradigm called eXtreme Regression (XR) whose objective is to accurately predict the numerical degrees of relevance of an extremely large number of labels to a data point. XR can provide elegant solutions to many large-scale ranking and recommendation applications including Dynamic Search Advertising (DSA). XR can learn more accurate models than the recently popular extreme classifiers which incorrectly assume strictly binary-valued label relevances. Traditional regression metrics which sum the errors over all the labels are unsuitable for XR problems since they could give extremely loose bounds for the label ranking quality. Also, the existing regression algorithms won't efficiently scale to millions of labels. This paper addresses these limitations through: (1) new evaluation metrics for XR which sum only the k largest regression errors; (2) a new algorithm called XReg which decomposes XR task into a hierarchy of much smaller regression problems thus leading to highly efficient training and prediction. This paper also introduces a (3) new labelwise prediction algorithm in XReg useful for DSA and other recommendation tasks. Experiments on benchmark datasets demonstrated that XReg can outperform the state-of-the-art extreme classifiers as well as large-scale regressors and rankers by up to 50% reduction in the new XR error metric, and up to 2% and 2.4% improvements in terms of the propensity-scored precision metric used in extreme classification and the click-through rate metric used in DSA respectively. Deployment of XReg on DSA in Bing resulted in a relative gain of 27% in query coverage. XReg's source code can be downloaded from http://manikvarma.org/code/XReg/download.html.
△ Less
Submitted 20 January, 2020; v1 submitted 15 January, 2020;
originally announced January 2020.
-
One Size Does Not Fit All: Multi-Scale, Cascaded RNNs for Radar Classification
Authors:
Dhrubojyoti Roy,
Sangeeta Srivastava,
Aditya Kusupati,
Pranshu Jain,
Manik Varma,
Anish Arora
Abstract:
Edge sensing with micro-power pulse-Doppler radars is an emergent domain in monitoring and surveillance with several smart city applications. Existing solutions for the clutter versus multi-source radar classification task are limited in terms of either accuracy or efficiency, and in some cases, struggle with a trade-off between false alarms and recall of sources. We find that this problem can be…
▽ More
Edge sensing with micro-power pulse-Doppler radars is an emergent domain in monitoring and surveillance with several smart city applications. Existing solutions for the clutter versus multi-source radar classification task are limited in terms of either accuracy or efficiency, and in some cases, struggle with a trade-off between false alarms and recall of sources. We find that this problem can be resolved by learning the classifier across multiple time-scales. We propose a multi-scale, cascaded recurrent neural network architecture, MSC-RNN, comprised of an efficient multi-instance learning (MIL) Recurrent Neural Network (RNN) for clutter discrimination at a lower tier, and a more complex RNN classifier for source classification at the upper tier. By controlling the invocation of the upper RNN with the help of the lower tier conditionally, MSC-RNN achieves an overall accuracy of 0.972. Our approach holistically improves the accuracy and per-class recalls over ML models suitable for radar inferencing. Notably, we outperform cross-domain handcrafted feature engineering with time-domain deep feature learning, while also being up to $\sim$3$\times$ more efficient than a competitive solution.
△ Less
Submitted 6 September, 2019;
originally announced September 2019.
-
Pathologist-Level Grading of Prostate Biopsies with Artificial Intelligence
Authors:
Peter Ström,
Kimmo Kartasalo,
Henrik Olsson,
Leslie Solorzano,
Brett Delahunt,
Daniel M. Berney,
David G. Bostwick,
Andrew J. Evans,
David J. Grignon,
Peter A. Humphrey,
Kenneth A. Iczkowski,
James G. Kench,
Glen Kristiansen,
Theodorus H. van der Kwast,
Katia R. M. Leite,
Jesse K. McKenney,
Jon Oxley,
Chin-Chen Pan,
Hemamali Samaratunga,
John R. Srigley,
Hiroyuki Takahashi,
Toyonori Tsuzuki,
Murali Varma,
Ming Zhou,
Johan Lindberg
, et al. (7 additional authors not shown)
Abstract:
Background: An increasing volume of prostate biopsies and a world-wide shortage of uro-pathologists puts a strain on pathology departments. Additionally, the high intra- and inter-observer variability in grading can result in over- and undertreatment of prostate cancer. Artificial intelligence (AI) methods may alleviate these problems by assisting pathologists to reduce workload and harmonize grad…
▽ More
Background: An increasing volume of prostate biopsies and a world-wide shortage of uro-pathologists puts a strain on pathology departments. Additionally, the high intra- and inter-observer variability in grading can result in over- and undertreatment of prostate cancer. Artificial intelligence (AI) methods may alleviate these problems by assisting pathologists to reduce workload and harmonize grading.
Methods: We digitized 6,682 needle biopsies from 976 participants in the population based STHLM3 diagnostic study to train deep neural networks for assessing prostate biopsies. The networks were evaluated by predicting the presence, extent, and Gleason grade of malignant tissue for an independent test set comprising 1,631 biopsies from 245 men. We additionally evaluated grading performance on 87 biopsies individually graded by 23 experienced urological pathologists from the International Society of Urological Pathology. We assessed discriminatory performance by receiver operating characteristics (ROC) and tumor extent predictions by correlating predicted millimeter cancer length against measurements by the reporting pathologist. We quantified the concordance between grades assigned by the AI and the expert urological pathologists using Cohen's kappa.
Results: The performance of the AI to detect and grade cancer in prostate needle biopsy samples was comparable to that of international experts in prostate pathology. The AI achieved an area under the ROC curve of 0.997 for distinguishing between benign and malignant biopsy cores, and 0.999 for distinguishing between men with or without prostate cancer. The correlation between millimeter cancer predicted by the AI and assigned by the reporting pathologist was 0.96. For assigning Gleason grades, the AI achieved an average pairwise kappa of 0.62. This was within the range of the corresponding values for the expert pathologists (0.60 to 0.73).
△ Less
Submitted 2 July, 2019;
originally announced July 2019.
-
Water Distribution System Design Using Multi-Objective Genetic Algorithm with External Archive and Local Search
Authors:
Mahesh Patil,
M. Naveen Naidu,
A. Vasan,
Murari R. R. Varma
Abstract:
Hybridisation of the multi-objective optimisation algorithm NSGA-II and local search is proposed for water distribution system design. Results obtained with the proposed algorithm are presented for four medium-size water networks taken from the literature. Local search is found to be beneficial for one of the networks in terms of finding new solutions not reported earlier. It is also shown that si…
▽ More
Hybridisation of the multi-objective optimisation algorithm NSGA-II and local search is proposed for water distribution system design. Results obtained with the proposed algorithm are presented for four medium-size water networks taken from the literature. Local search is found to be beneficial for one of the networks in terms of finding new solutions not reported earlier. It is also shown that simply using an external archive to save all non-dominated solutions visited by the population, even without local search, leads to substantial improvement in the non-dominated set produced by the algorithm.
△ Less
Submitted 20 May, 2019;
originally announced May 2019.
-
Water Distribution System Design Using Multi-Objective Particle Swarm Optimisation
Authors:
Mahesh B. Patil,
M. Naveen Naidu,
A. Vasan,
Murari R. R. Varma
Abstract:
Application of the multi-objective particle swarm optimisation (MOPSO) algorithm to design of water distribution systems is described. An earlier MOPSO algorithm is augmented with (a) local search, (b) a modified strategy for assigning the leader, and (c) a modified mutation scheme. For one of the benchmark problems described in the literature, the effect of each of the above features on the algor…
▽ More
Application of the multi-objective particle swarm optimisation (MOPSO) algorithm to design of water distribution systems is described. An earlier MOPSO algorithm is augmented with (a) local search, (b) a modified strategy for assigning the leader, and (c) a modified mutation scheme. For one of the benchmark problems described in the literature, the effect of each of the above features on the algorithm performance is demonstrated. The augmented MOPSO algorithm (called MOPSO+) is applied to five benchmark problems, and in each case, it finds non-dominated solutions not reported earlier. In addition, for the purpose of comparing Pareto fronts (sets of non-dominated solutions) obtained by different algorithms, a new criterion is suggested, and its usefulness is pointed out with an example. Finally, some suggestions regarding future research directions are made.
△ Less
Submitted 14 March, 2019;
originally announced March 2019.
-
FastGRNN: A Fast, Accurate, Stable and Tiny Kilobyte Sized Gated Recurrent Neural Network
Authors:
Aditya Kusupati,
Manish Singh,
Kush Bhatia,
Ashish Kumar,
Prateek Jain,
Manik Varma
Abstract:
This paper develops the FastRNN and FastGRNN algorithms to address the twin RNN limitations of inaccurate training and inefficient prediction. Previous approaches have improved accuracy at the expense of prediction costs making them infeasible for resource-constrained and real-time applications. Unitary RNNs have increased accuracy somewhat by restricting the range of the state transition matrix's…
▽ More
This paper develops the FastRNN and FastGRNN algorithms to address the twin RNN limitations of inaccurate training and inefficient prediction. Previous approaches have improved accuracy at the expense of prediction costs making them infeasible for resource-constrained and real-time applications. Unitary RNNs have increased accuracy somewhat by restricting the range of the state transition matrix's singular values but have also increased the model size as they require a larger number of hidden units to make up for the loss in expressive power. Gated RNNs have obtained state-of-the-art accuracies by adding extra parameters thereby resulting in even larger models. FastRNN addresses these limitations by adding a residual connection that does not constrain the range of the singular values explicitly and has only two extra scalar parameters. FastGRNN then extends the residual connection to a gate by reusing the RNN matrices to match state-of-the-art gated RNN accuracies but with a 2-4x smaller model. Enforcing FastGRNN's matrices to be low-rank, sparse and quantized resulted in accurate models that could be up to 35x smaller than leading gated and unitary RNNs. This allowed FastGRNN to accurately recognize the "Hey Cortana" wakeword with a 1 KB model and to be deployed on severely resource-constrained IoT microcontrollers too tiny to store other RNN models. FastGRNN's code is available at https://github.com/Microsoft/EdgeML/.
△ Less
Submitted 8 January, 2019;
originally announced January 2019.
-
Locally Non-linear Embeddings for Extreme Multi-label Learning
Authors:
Kush Bhatia,
Himanshu Jain,
Purushottam Kar,
Prateek Jain,
Manik Varma
Abstract:
The objective in extreme multi-label learning is to train a classifier that can automatically tag a novel data point with the most relevant subset of labels from an extremely large label set. Embedding based approaches make training and prediction tractable by assuming that the training label matrix is low-rank and hence the effective number of labels can be reduced by projecting the high dimensio…
▽ More
The objective in extreme multi-label learning is to train a classifier that can automatically tag a novel data point with the most relevant subset of labels from an extremely large label set. Embedding based approaches make training and prediction tractable by assuming that the training label matrix is low-rank and hence the effective number of labels can be reduced by projecting the high dimensional label vectors onto a low dimensional linear subspace. Still, leading embedding approaches have been unable to deliver high prediction accuracies or scale to large problems as the low rank assumption is violated in most real world applications.
This paper develops the X-One classifier to address both limitations. The main technical contribution in X-One is a formulation for learning a small ensemble of local distance preserving embeddings which can accurately predict infrequently occurring (tail) labels. This allows X-One to break free of the traditional low-rank assumption and boost classification accuracy by learning embeddings which preserve pairwise distances between only the nearest label vectors.
We conducted extensive experiments on several real-world as well as benchmark data sets and compared our method against state-of-the-art methods for extreme multi-label classification. Experiments reveal that X-One can make significantly more accurate predictions then the state-of-the-art methods including both embeddings (by as much as 35%) as well as trees (by as much as 6%). X-One can also scale efficiently to data sets with a million labels which are beyond the pale of leading embedding methods.
△ Less
Submitted 9 July, 2015;
originally announced July 2015.
-
Rainbow Connection Number and Connected Dominating Sets
Authors:
L. Sunil Chandran,
Anita Das,
Deepak Rajendraprasad,
Nithin M. Varma
Abstract:
Rainbow connection number rc(G) of a connected graph G is the minimum number of colours needed to colour the edges of G, so that every pair of vertices is connected by at least one path in which no two edges are coloured the same. In this paper we show that for every connected graph G, with minimum degree at least 2, the rainbow connection number is upper bounded by γ_c(G) + 2, where γ_c(G) is the…
▽ More
Rainbow connection number rc(G) of a connected graph G is the minimum number of colours needed to colour the edges of G, so that every pair of vertices is connected by at least one path in which no two edges are coloured the same. In this paper we show that for every connected graph G, with minimum degree at least 2, the rainbow connection number is upper bounded by γ_c(G) + 2, where γ_c(G) is the connected domination number of G. Bounds of the form diameter(G) \leq rc(G) \leq diameter(G) + c, 1 \leq c \leq 4, for many special graph classes follow as easy corollaries from this result. This includes interval graphs, AT-free graphs, circular arc graphs, threshold graphs, and chain graphs all with minimum degree at least 2 and connected. We also show that every bridge-less chordal graph G has rc(G) \leq 3.radius(G). In most of these cases, we also demonstrate the tightness of the bounds. An extension of this idea to two-step dominating sets is used to show that for every connected graph on n vertices with minimum degree δ, the rainbow connection number is upper bounded by 3n/(δ + 1) + 3. This solves an open problem of Schiermeyer (2009), improving the previously best known bound of 20n/δ by Krivelevich and Yuster (2010). Moreover, this bound is seen to be tight up to additive factors by a construction of Caro et al. (2008).
△ Less
Submitted 12 October, 2010;
originally announced October 2010.