Search | arXiv e-print repository

Scaling the Vocabulary of Non-autoregressive Models for Efficient Generative Retrieval

Authors: Ravisri Valluri, Akash Kumar Mohankumar, Kushal Dave, Amit Singh, Jian Jiao, Manik Varma, Gaurav Sinha

Abstract: Generative Retrieval introduces a new approach to Information Retrieval by reframing it as a constrained generation task, leveraging recent advancements in Autoregressive (AR) language models. However, AR-based Generative Retrieval methods suffer from high inference latency and cost compared to traditional dense retrieval techniques, limiting their practical applicability. This paper investigates fully Non-autoregressive (NAR) language models as a more efficient alternative for generative retrieval. While standard NAR models alleviate latency and cost concerns, they exhibit a significant drop in retrieval performance (compared to AR models) due to their inability to capture dependencies between target tokens. To address this, we question the conventional choice of limiting the target token space to solely words or sub-words. We propose PIXAR, a novel approach that expands the target vocabulary of NAR models to include multi-word entities and common phrases (up to 5 million tokens), thereby reducing token dependencies. PIXAR employs inference optimization strategies to maintain low inference latency despite the significantly larger vocabulary. Our results demonstrate that PIXAR achieves a relative improvement of 31.0% in MRR@10 on MS MARCO and 23.2% in Hits@5 on Natural Questions compared to standard NAR models with similar latency and cost. Furthermore, online A/B experiments on a large commercial search engine show that PIXAR increases ad clicks by 5.08% and revenue by 4.02%.

Submitted 10 June, 2024; originally announced June 2024.

Comments: 14 pages, 6 tables, 2 figures

arXiv:2406.06512 [pdf, other]

Merlin: A Vision Language Foundation Model for 3D Computed Tomography

Authors: Louis Blankemeier, Joseph Paul Cohen, Ashwin Kumar, Dave Van Veen, Syed Jamal Safdar Gardezi, Magdalini Paschali, Zhihong Chen, Jean-Benoit Delbrouck, Eduardo Reis, Cesar Truyts, Christian Bluethgen, Malte Engmann Kjeldskov Jensen, Sophie Ostmeier, Maya Varma, Jeya Maria Jose Valanarasu, Zhongnan Fang, Zepeng Huo, Zaid Nabulsi, Diego Ardila, Wei-Hung Weng, Edson Amaro Junior, Neera Ahuja, Jason Fries, Nigam H. Shah, Andrew Johnston, et al. (6 additional authors not shown)

Abstract: Over 85 million computed tomography (CT) scans are performed annually in the US, of which approximately one quarter focus on the abdomen. Given the current radiologist shortage, there is a large impetus to use artificial intelligence to alleviate the burden of interpreting these complex imaging studies. Prior state-of-the-art approaches for automated medical image interpretation leverage vision language models (VLMs). However, current medical VLMs are generally limited to 2D images and short reports, and do not leverage electronic health record (EHR) data for supervision. We introduce Merlin - a 3D VLM that we train using paired CT scans (6+ million images from 15,331 CTs), EHR diagnosis codes (1.8+ million codes), and radiology reports (6+ million tokens). We evaluate Merlin on 6 task types and 752 individual tasks. The non-adapted (off-the-shelf) tasks include zero-shot findings classification (31 findings), phenotype classification (692 phenotypes), and zero-shot cross-modal retrieval (image to findings and image to impressions), while model adapted tasks include 5-year disease prediction (6 diseases), radiology report generation, and 3D semantic segmentation (20 organs). We perform internal validation on a test set of 5,137 CTs, and external validation on 7,000 clinical CTs and on two public CT datasets (VerSe, TotalSegmentator). Beyond these clinically-relevant evaluations, we assess the efficacy of various network architectures and training strategies to depict that Merlin has favorable performance to existing task-specific baselines.
We derive data scaling laws to empirically assess training data needs for requisite downstream task performance. Furthermore, unlike conventional VLMs that require hundreds of GPUs for training, we perform all training on a single GPU.

Submitted 10 June, 2024; originally announced June 2024.

Comments: 18 pages, 7 figures

arXiv:2405.19538 [pdf, other]

CheXpert Plus: Augmenting a Large Chest X-ray Dataset with Text Radiology Reports, Patient Demographics and Additional Image Formats

Authors: Pierre Chambon, Jean-Benoit Delbrouck, Thomas Sounack, Shih-Cheng Huang, Zhihong Chen, Maya Varma, Steven QH Truong, Chu The Chuong, Curtis P. Langlotz

Abstract: Since the release of the original CheXpert paper five years ago, CheXpert has become one of the most widely used and cited clinical AI datasets. The emergence of vision language models has sparked an increase in demands for sharing reports linked to CheXpert images, along with a growing interest among AI fairness researchers in obtaining demographic data. To address this, CheXpert Plus serves as a new collection of radiology data sources, made publicly available to enhance the scaling, performance, robustness, and fairness of models for all subsequent machine learning tasks in the field of radiology. CheXpert Plus is the largest text dataset publicly released in radiology, with a total of 36 million text tokens, including 13 million impression tokens. To the best of our knowledge, it represents the largest text de-identification effort in radiology, with almost 1 million PHI spans anonymized. It is only the second time that a large-scale English paired dataset has been released in radiology, thereby enabling, for the first time, cross-institution training at scale. All reports are paired with high-quality images in DICOM format, along with numerous image and patient metadata covering various clinical and socio-economic groups, as well as many pathology labels and RadGraph annotations. We hope this dataset will boost research for AI models that can further assist radiologists and help improve medical care.
Data is available at the following URL: https://stanfordaimi.azurewebsites.net/datasets/5158c524-d3ab-4e02-96e9-6ee9efc110a1 Models are available at the following URL: https://github.com/Stanford-AIMI/chexpert-plus

Submitted 3 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

Comments: 13 pages. Updated title

arXiv:2405.07526 [pdf, other]

doi 10.1145/3589335.3648327

MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels

Authors: Qi Chen, Xiubo Geng, Corby Rosset, Carolyn Buractaon, Jingwen Lu, Tao Shen, Kun Zhou, Chenyan Xiong, Yeyun Gong, Paul Bennett, Nick Craswell, Xing Xie, Fan Yang, Bryan Tower, Nikhil Rao, Anlei Dong, Wenqi Jiang, Zheng Liu, Mingqin Li, Chuanjie Liu, Zengzhong Li, Rangan Majumder, Jennifer Neville, Andy Oakley, Knut Magne Risvik, et al. (6 additional authors not shown)

Abstract: Recent breakthroughs in large models have highlighted the critical significance of data scale, labels and modals. In this paper, we introduce MS MARCO Web Search, the first large-scale information-rich web dataset, featuring millions of real clicked query-document labels. This dataset closely mimics real-world web document and query distribution, provides rich information for various kinds of downstream tasks and encourages research in various areas, such as generic end-to-end neural indexer models, generic embedding models, and next generation information access system with large language models. MS MARCO Web Search offers a retrieval benchmark with three web retrieval challenge tasks that demand innovations in both machine learning and information retrieval system research domains. As the first dataset that meets large, real and rich data requirements, MS MARCO Web Search paves the way for future advancements in AI and system research. MS MARCO Web Search dataset is available at: https://github.com/microsoft/MS-MARCO-Web-Search.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 10 pages, 6 figures, for associated dataset, see http://github.com/microsoft/MS-MARCO-Web-Search

arXiv:2405.03595 [pdf, other]

GREEN: Generative Radiology Report Evaluation and Error Notation

Authors: Sophie Ostmeier, Justin Xu, Zhihong Chen, Maya Varma, Louis Blankemeier, Christian Bluethgen, Arne Edward Michalson, Michael Moseley, Curtis Langlotz, Akshay S Chaudhari, Jean-Benoit Delbrouck

Abstract: Evaluating radiology reports is a challenging problem as factual correctness is extremely important due to the need for accurate medical communication about medical images. Existing automatic evaluation metrics either suffer from failing to consider factual correctness (e.g., BLEU and ROUGE) or are limited in their interpretability (e.g., F1CheXpert and F1RadGraph). In this paper, we introduce GREEN (Generative Radiology Report Evaluation and Error Notation), a radiology report generation metric that leverages the natural language understanding of language models to identify and explain clinically significant errors in candidate reports, both quantitatively and qualitatively. Compared to current metrics, GREEN offers: 1) a score aligned with expert preferences, 2) human interpretable explanations of clinically significant errors, enabling feedback loops with end-users, and 3) a lightweight open-source method that reaches the performance of commercial counterparts. We validate our GREEN metric by comparing it to GPT-4, as well as to error counts of 6 experts and preferences of 2 experts. Our method demonstrates not only higher correlation with expert error counts, but simultaneously higher alignment with expert preferences when compared to previous approaches.

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2402.18434 [pdf, other]

Graph Regularized Encoder Training for Extreme Classification

Authors: Anshul Mittal, Shikhar Mohan, Deepak Saini, Suchith C. Prabhu, Jian Jiao, Sumeet Agarwal, Soumen Chakrabarti, Purushottam Kar, Manik Varma

Abstract: Deep extreme classification (XC) aims to train an encoder architecture and an accompanying classifier architecture to tag a data point with the most relevant subset of labels from a very large universe of labels. XC applications in ranking, recommendation and tagging routinely encounter tail labels for which the amount of training data is exceedingly small. Graph convolutional networks (GCN) present a convenient but computationally expensive way to leverage task metadata and enhance model accuracies in these settings. This paper formally establishes that in several use cases, the steep computational cost of GCNs is entirely avoidable by replacing GCNs with non-GCN architectures. The paper notices that in these settings, it is much more effective to use graph data to regularize encoder training than to implement a GCN. Based on these insights, an alternative paradigm RAMEN is presented to utilize graph metadata in XC settings that offers significant performance boosts with zero increase in inference computational costs. RAMEN scales to datasets with up to 1M labels and offers prediction accuracy up to 15% higher on benchmark datasets than state of the art methods, including those that use graph metadata to train GCNs. RAMEN also offers 10% higher accuracy over the best baseline on a proprietary recommendation dataset sourced from click logs of a popular search engine. Code for RAMEN will be released publicly.

Submitted 28 February, 2024; originally announced February 2024.

arXiv:2401.12208 [pdf, other]

CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation

Authors: Zhihong Chen, Maya Varma, Jean-Benoit Delbrouck, Magdalini Paschali, Louis Blankemeier, Dave Van Veen, Jeya Maria Jose Valanarasu, Alaa Youssef, Joseph Paul Cohen, Eduardo Pontes Reis, Emily B. Tsai, Andrew Johnston, Cameron Olsen, Tanishq Mathew Abraham, Sergios Gatidis, Akshay S. Chaudhari, Curtis Langlotz

Abstract: Chest X-rays (CXRs) are the most frequently performed imaging test in clinical practice. Recent advances in the development of vision-language foundation models (FMs) give rise to the possibility of performing automated CXR interpretation, which can assist physicians with clinical decision-making and improve patient outcomes. However, developing FMs that can accurately interpret CXRs is challenging due to the (1) limited availability of large-scale vision-language datasets in the medical image domain, (2) lack of vision and language encoders that can capture the complexities of medical data, and (3) absence of evaluation frameworks for benchmarking the abilities of FMs on CXR interpretation. In this work, we address these challenges by first introducing CheXinstruct - a large-scale instruction-tuning dataset curated from 28 publicly-available datasets. We then present CheXagent - an instruction-tuned FM capable of analyzing and summarizing CXRs. To build CheXagent, we design a clinical large language model (LLM) for parsing radiology reports, a vision encoder for representing CXR images, and a network to bridge the vision and language modalities. Finally, we introduce CheXbench - a novel benchmark designed to systematically evaluate FMs across 8 clinically-relevant CXR interpretation tasks. Extensive quantitative evaluations and qualitative reviews with five expert radiologists demonstrate that CheXagent outperforms previously-developed general- and medical-domain FMs on CheXbench tasks.
Furthermore, in an effort to improve model transparency, we perform a fairness evaluation across factors of sex, race and age to highlight potential performance disparities. Our project is at https://stanford-aimi.github.io/chexagent.html.

Submitted 22 January, 2024; originally announced January 2024.

Comments: 24 pages, 8 figures

arXiv:2309.04961 [pdf, other]

doi 10.1109/CVPR52688.2022.01207

Multi-modal Extreme Classification

Authors: Anshul Mittal, Kunal Dahiya, Shreya Malani, Janani Ramaswamy, Seba Kuruvilla, Jitendra Ajmera, Keng-hao Chang, Sumeet Agarwal, Purushottam Kar, Manik Varma

Abstract: This paper develops the MUFIN technique for extreme classification (XC) tasks with millions of labels where datapoints and labels are endowed with visual and textual descriptors. Applications of MUFIN to product-to-product recommendation and bid query prediction over several millions of products are presented. Contemporary multi-modal methods frequently rely on purely embedding-based methods. On the other hand, XC methods utilize classifier architectures to offer superior accuracies than embedding only methods but mostly focus on text-based categorization tasks. MUFIN bridges this gap by reformulating multi-modal categorization as an XC problem with several millions of labels. This presents the twin challenges of developing multi-modal architectures that can offer embeddings sufficiently expressive to allow accurate categorization over millions of labels; and training and inference routines that scale logarithmically in the number of labels. MUFIN develops an architecture based on cross-modal attention and trains it in a modular fashion using pre-training and positive and negative mining. A novel product-to-product recommendation dataset MM-AmazonTitles-300K containing over 300K products was curated from publicly available amazon.com listings with each product endowed with a title and multiple images. On all datasets MUFIN offered at least 3% higher accuracy than leading text-based, image-based and multi-modal techniques. Code for MUFIN is available at https://github.com/Extreme-classification/MUFIN

Submitted 10 September, 2023; originally announced September 2023.

ACM Class: H.3.3

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022

arXiv:2308.11793 [pdf, other]

Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts

Authors: Wenyan Cong, Hanxue Liang, Peihao Wang, Zhiwen Fan, Tianlong Chen, Mukund Varma, Yi Wang, Zhangyang Wang

Abstract: Cross-scene generalizable NeRF models, which can directly synthesize novel views of unseen scenes, have become a new spotlight of the NeRF field. Several existing attempts rely on increasingly end-to-end "neuralized" architectures, i.e., replacing scene representation and/or rendering modules with performant neural networks such as transformers, and turning novel view synthesis into a feed-forward inference pipeline. While those feedforward "neuralized" architectures still do not fit diverse scenes well out of the box, we propose to bridge them with the powerful Mixture-of-Experts (MoE) idea from large language models (LLMs), which has demonstrated superior generalization ability by balancing between larger overall model capacity and flexible per-instance specialization. Starting from a recent generalizable NeRF architecture called GNT, we first demonstrate that MoE can be neatly plugged in to enhance the model. We further customize a shared permanent expert and a geometry-aware consistency loss to enforce cross-scene consistency and spatial smoothness respectively, which are essential for generalizable view synthesis. Our proposed model, dubbed GNT with Mixture-of-View-Experts (GNT-MOVE), has experimentally shown state-of-the-art results when transferring to unseen scenes, indicating remarkably better cross-scene generalization in both zero-shot and few-shot settings. Our codes are available at https://github.com/VITA-Group/GNT-MOVE.

Submitted 22 August, 2023; originally announced August 2023.

Comments: Accepted by ICCV2023

arXiv:2308.11194 [pdf, other]

ViLLA: Fine-Grained Vision-Language Representation Learning from Real-World Data

Authors: Maya Varma, Jean-Benoit Delbrouck, Sarah Hooper, Akshay Chaudhari, Curtis Langlotz

Abstract: Vision-language models (VLMs), such as CLIP and ALIGN, are generally trained on datasets consisting of image-caption pairs obtained from the web. However, real-world multimodal datasets, such as healthcare data, are significantly more complex: each image (e.g. X-ray) is often paired with text (e.g. physician report) that describes many distinct attributes occurring in fine-grained regions of the image. We refer to these samples as exhibiting high pairwise complexity, since each image-text pair can be decomposed into a large number of region-attribute pairings. The extent to which VLMs can capture fine-grained relationships between image regions and textual attributes when trained on such data has not been previously evaluated. The first key contribution of this work is to demonstrate through systematic evaluations that as the pairwise complexity of the training dataset increases, standard VLMs struggle to learn region-attribute relationships, exhibiting performance degradations of up to 37% on retrieval tasks. In order to address this issue, we introduce ViLLA as our second key contribution. ViLLA, which is trained to capture fine-grained region-attribute relationships from complex datasets, involves two components: (a) a lightweight, self-supervised mapping model to decompose image-text samples into region-attribute pairs, and (b) a contrastive VLM to learn representations from generated region-attribute pairs.
We demonstrate with experiments across four domains (synthetic, product, medical, and natural images) that ViLLA outperforms comparable VLMs on fine-grained reasoning tasks, such as zero-shot object detection (up to 3.6 AP50 points on COCO and 0.6 mAP points on LVIS) and retrieval (up to 14.2 R-Precision points).

Submitted 22 August, 2023; originally announced August 2023.

Comments: ICCV 2023

arXiv:2306.17598 [pdf, other]

Navigation of micro-robot swarms for targeted delivery using reinforcement learning

Authors: Akshatha Jagadish, Manoj Varma

Abstract: Micro robotics is quickly emerging to be a promising technological solution to many medical treatments with focus on targeted drug delivery. They are effective when working in swarms whose individual control is mostly infeasible owing to their minute size. Controlling a number of robots with a single controller is thus important and artificial intelligence can help us perform this task successfully. In this work, we use the Reinforcement Learning (RL) algorithms Proximal Policy Optimization (PPO) and Robust Policy Optimization (RPO) to navigate a swarm of 4, 9 and 16 microswimmers under hydrodynamic effects, controlled by their orientation, towards a circular absorbing target. We look at both PPO and RPO performances with limited state information scenarios and also test their robustness for random target location and size. We use curriculum learning to improve upon the performance and demonstrate the same in learning to navigate a swarm of 25 swimmers and steering the swarm to exemplify the manoeuvring capabilities of the RL model.

Submitted 30 June, 2023; originally announced June 2023.

arXiv:2306.17578 [pdf, other]

Role of single particle motility statistics on efficiency of targeted delivery of micro-robot swarms

Authors: Akshatha Jagadish, Manoj Varma

Abstract: The study of dynamics of single active particles plays an important role in the development of artificial or hybrid micro-systems for bio-medical and other applications at micro-scale. Here, we utilize the results of these studies to better understand their implications for the specific application of drug delivery. We analyze the variations in the capture efficiency for different types of motion dynamics without inter-particle interactions and compare the results. We also discuss the reasons for the same and describe the specific parameters that affect the capture efficiency, which in turn helps in both hardware and control design of a micro-bot swarm system for drug delivery.

Submitted 30 June, 2023; originally announced June 2023.

arXiv:2211.08584 [pdf, other]

doi 10.18653/v1/2023.acl-short.41

Toward expanding the scope of radiology report summarization to multiple anatomies and modalities

Authors: Zhihong Chen, Maya Varma, Xiang Wan, Curtis Langlotz, Jean-Benoit Delbrouck

Abstract: Radiology report summarization (RRS) is a growing area of research. Given the Findings section of a radiology report, the goal is to generate a summary (called an Impression section) that highlights the key observations and conclusions of the radiology study. However, RRS currently faces essential limitations. First, many prior studies conduct experiments on private datasets, preventing reproduction of results and fair comparisons across different systems and solutions. Second, most prior approaches are evaluated solely on chest X-rays. To address these limitations, we propose a dataset (MIMIC-RRS) involving three new modalities and seven new anatomies based on the MIMIC-III and MIMIC-CXR datasets. We then conduct extensive experiments to evaluate the performance of models both within and across modality-anatomy pairs in MIMIC-RRS. In addition, we evaluate their clinical efficacy via RadGraph, a factual correctness metric.

Submitted 21 July, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

Journal ref: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2023

arXiv:2207.04452 [pdf, other]

NGAME: Negative Mining-aware Mini-batching for Extreme Classification

Authors: Kunal Dahiya, Nilesh Gupta, Deepak Saini, Akshay Soni, Yajun Wang, Kushal Dave, Jian Jiao, Gururaj K, Prasenjit Dey, Amit Singh, Deepesh Hada, Vidit Jain, Bhawna Paliwal, Anshul Mittal, Sonu Mehta, Ramachandran Ramjee, Sumeet Agarwal, Purushottam Kar, Manik Varma

Abstract: Extreme Classification (XC) seeks to tag data points with the most relevant subset of labels from an extremely large label set. Performing deep XC with dense, learnt representations for data points and labels has attracted much attention due to its superiority over earlier XC methods that used sparse, hand-crafted features. Negative mining techniques have emerged as a critical component of all deep XC methods that allow them to scale to millions of labels. However, despite recent advances, training deep XC models with large encoder architectures such as transformers remains challenging. This paper identifies that memory overheads of popular negative mining techniques often force mini-batch sizes to remain small and slow training down. In response, this paper introduces NGAME, a light-weight mini-batch creation technique that offers provably accurate in-batch negative samples. This allows training with larger mini-batches offering significantly faster convergence and higher accuracies than existing negative sampling techniques. NGAME was found to be up to 16% more accurate than state-of-the-art methods on a wide array of benchmark datasets for extreme classification, as well as 3% more accurate at retrieving search engine queries in response to a user webpage visit to show personalized ads. In live A/B tests on a popular search engine, NGAME yielded up to 23% gains in click-through-rates.

Submitted 10 July, 2022; originally announced July 2022.

arXiv:2203.14960 [pdf, other]

Domino: Discovering Systematic Errors with Cross-Modal Embeddings

Authors: Sabri Eyuboglu, Maya Varma, Khaled Saab, Jean-Benoit Delbrouck, Christopher Lee-Messer, Jared Dunnmon, James Zou, Christopher Ré

Abstract: Machine learning models that achieve high overall accuracy often make systematic errors on important subsets (or slices) of data. Identifying underperforming slices is particularly challenging when working with high-dimensional inputs (e.g. images, audio), where important slices are often unlabeled. In order to address this issue, recent studies have proposed automated slice discovery methods (SDMs), which leverage learned model representations to mine input data for slices on which a model performs poorly. To be useful to a practitioner, these methods must identify slices that are both underperforming and coherent (i.e. united by a human-understandable concept). However, no quantitative evaluation framework currently exists for rigorously assessing SDMs with respect to these criteria. Additionally, prior qualitative evaluations have shown that SDMs often identify slices that are incoherent. In this work, we address these challenges by first designing a principled evaluation framework that enables a quantitative comparison of SDMs across 1,235 slice discovery settings in three input domains (natural images, medical images, and time-series data). Then, motivated by the recent development of powerful cross-modal representation learning approaches, we present Domino, an SDM that leverages cross-modal embeddings and a novel error-aware mixture model to discover and describe coherent slices. We find that Domino accurately identifies 36% of the 1,235 slices in our framework - a 12 percentage point improvement over prior methods.
Further, Domino is the first SDM that can provide natural language descriptions of identified slices, correctly generating the exact name of the slice in 35% of settings.

Submitted 21 May, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

Comments: ICLR 2022 (Oral)

arXiv:2111.06685 [pdf, other]

doi 10.1145/3437963.3441810

DeepXML: A Deep Extreme Multi-Label Learning Framework Applied to Short Text Documents

Authors: Kunal Dahiya, Deepak Saini, Anshul Mittal, Ankush Shaw, Kushal Dave, Akshay Soni, Himanshu Jain, Sumeet Agarwal, Manik Varma

Abstract: Scalability and accuracy are well-recognized challenges in deep extreme multi-label learning, where the objective is to train architectures for automatically annotating a data point with the most relevant subset of labels from an extremely large label set. This paper develops the DeepXML framework that addresses these challenges by decomposing the deep extreme multi-label task into four simpler sub-tasks, each of which can be trained accurately and efficiently. Choosing different components for the four sub-tasks allows DeepXML to generate a family of algorithms with varying trade-offs between accuracy and scalability. In particular, DeepXML yields the Astec algorithm that could be 2-12% more accurate and 5-30x faster to train than leading deep extreme classifiers on publicly available short text datasets. Astec could also efficiently train on Bing short text datasets containing up to 62 million labels while making predictions for billions of users and data points per day on commodity hardware. This allowed Astec to be deployed on the Bing search engine for a number of short text applications ranging from matching user queries to advertiser bid phrases to showing personalized ads where it yielded significant gains in click-through-rates, coverage, revenue and other online metrics over state-of-the-art techniques currently in production. DeepXML's code is available at https://github.com/Extreme-classification/deepxml

Submitted 12 November, 2021; originally announced November 2021.

ACM Class: F.2.2; I.2.7

Journal ref: Web Search and Data Mining 2021

arXiv:2110.08228 [pdf, other]

Cross-Domain Data Integration for Named Entity Disambiguation in Biomedical Text

Authors: Maya Varma, Laurel Orr, Sen Wu, Megan Leszczynski, Xiao Ling, Christopher Ré

Abstract: Named entity disambiguation (NED), which involves mapping textual mentions to structured entities, is particularly challenging in the medical domain due to the presence of rare entities. Existing approaches are limited by the presence of coarse-grained structural resources in biomedical knowledge bases as well as the use of training datasets that provide low coverage over uncommon resources. In this work, we address these issues by proposing a cross-domain data integration method that transfers structural knowledge from a general text knowledge base to the medical domain. We utilize our integration scheme to augment structural resources and generate a large biomedical NED dataset for pretraining. Our pretrained model with injected structural knowledge achieves state-of-the-art performance on two benchmark medical NED datasets: MedMentions and BC5CDR. Furthermore, we improve disambiguation of rare entities by up to 57 accuracy points.

Submitted 15 October, 2021; originally announced October 2021.

Comments: Accepted to Findings of EMNLP 2021

arXiv:2108.13167 [pdf, ps, other]

Transportation Polytope and its Applications in Parallel Server Systems

Authors: Sushil Mahavir Varma, Siva Theja Maguluri

Abstract: A parallel server system is a stochastic processing network with applications in manufacturing, supply chain, ride-hailing, call centers, etc. Heterogeneous customers arrive in the system, and only a subset of servers can serve any customer type given by the flexibility graph. The goal of the system operator is to minimize the delay that depends on the scheduling policy and the flexibility graph. A long line of literature focuses on designing near-optimal scheduling policies given a flexibility graph. On the contrary, we fix the scheduling policy to be the so-called MaxWeight scheduling given its superior delay performance and focus on designing near-optimal, sparse flexibility graphs. Our contributions are threefold. First, we analyze the expected delay in the heavy-traffic asymptotic regime in terms of the properties of the flexibility graph and use this result to translate the design question in terms of transportation polytope, the deterministic equivalent of parallel server queues. Second, we design the sparsest flexibility graph that achieves a given delay performance and show the robustness of the design to demand uncertainty. Third, given the budget to add edges arrives sequentially in time, we present the optimal schedule for adding them to the flexibility graph. These results are obtained by proving new results for transportation polytopes and are of independent interest. In particular, translating the difficulties to a simpler model, i.e. transportation polytope, allows us to develop a unified framework to answer several design questions.

Submitted 6 January, 2023; v1 submitted 11 August, 2021; originally announced August 2021.

Comments: 56 pages, 10 Figures

arXiv:2108.11554 [pdf, other]

XCI-Sketch: Extraction of Color Information from Images for Generation of Colored Outlines and Sketches

Authors: V Manushree, Sameer Saxena, Parna Chowdhury, Manisimha Varma, Harsh Rathod, Ankita Ghosh, Sahil Khose

Abstract: Sketches are a medium to convey a visual scene from an individual's creative perspective. The addition of color substantially enhances the overall expressivity of a sketch. This paper proposes two methods to mimic human-drawn colored sketches by utilizing the Contour Drawing Dataset. Our first approach renders colored outline sketches by applying image processing techniques aided by k-means color clustering. The second method uses a generative adversarial network to develop a model that can generate colored sketches from previously unobserved images. We assess the results obtained through quantitative and qualitative evaluations.

Submitted 7 January, 2022; v1 submitted 25 August, 2021; originally announced August 2021.

Comments: ML for Creativity and Design workshop at NeurIPS 2021

arXiv:2108.00368 [pdf, other]

doi 10.1145/3437963.3441807

DECAF: Deep Extreme Classification with Label Features

Authors: Anshul Mittal, Kunal Dahiya, Sheshansh Agrawal, Deepak Saini, Sumeet Agarwal, Purushottam Kar, Manik Varma

Abstract: Extreme multi-label classification (XML) involves tagging a data point with its most relevant subset of labels from an extremely large label set, with several applications such as product-to-product recommendation with millions of products. Although leading XML algorithms scale to millions of labels, they largely ignore label meta-data such as textual descriptions of the labels. On the other hand, classical techniques that can utilize label metadata via representation learning using deep networks struggle in extreme settings. This paper develops the DECAF algorithm that addresses these challenges by learning models enriched by label metadata that jointly learn model parameters and feature representations using deep networks and offer accurate classification at the scale of millions of labels. DECAF makes specific contributions to model architecture design, initialization, and training, enabling it to offer up to 2-6% more accurate prediction than leading extreme classifiers on publicly available benchmark product-to-product recommendation datasets, such as LF-AmazonTitles-1.3M. At the same time, DECAF was found to be up to 22x faster at inference than leading deep extreme classifiers, which makes it suitable for real-time applications that require predictions within a few milliseconds. The code for DECAF is available at the following URL https://github.com/Extreme-classification/DECAF.

Submitted 1 August, 2021; originally announced August 2021.

ACM Class: F.2.2; I.2.7

Journal ref: Web Search and Data Mining 2021

arXiv:2108.00261 [pdf, other]

doi 10.1145/3442381.3449815

ECLARE: Extreme Classification with Label Graph Correlations

Authors: Anshul Mittal, Noveen Sachdeva, Sheshansh Agrawal, Sumeet Agarwal, Purushottam Kar, Manik Varma

Abstract: Deep extreme classification (XC) seeks to train deep architectures that can tag a data point with its most relevant subset of labels from an extremely large label set. The core utility of XC comes from predicting labels that are rarely seen during training. Such rare labels hold the key to personalized recommendations that can delight and surprise a user. However, the large number of rare labels and small amount of training data per rare label offer significant statistical and computational challenges. State-of-the-art deep XC methods attempt to remedy this by incorporating textual descriptions of labels but do not adequately address the problem. This paper presents ECLARE, a scalable deep learning architecture that incorporates not only label text, but also label correlations, to offer accurate real-time predictions within a few milliseconds. Core contributions of ECLARE include a frugal architecture and scalable techniques to train deep models along with label correlation graphs at the scale of millions of labels. In particular, ECLARE offers predictions that are 2 to 14% more accurate on both publicly available benchmark datasets as well as proprietary datasets for a related products recommendation task sourced from the Bing search engine. Code for ECLARE is available at https://github.com/Extreme-classification/ECLARE.

Submitted 31 July, 2021; originally announced August 2021.

ACM Class: F.2.2; I.2.7

Journal ref: The Web Conference 2021

arXiv:2106.08591 [pdf]

Quality-Quantity Trade-offs in Tests for Management of COVID-19-like Epidemics

Authors: Harish Sasikumar, Manoj Varma

Abstract: There are multiple testing methods to ascertain an infection in an individual, and they vary in their performance, cost and delay. Unfortunately, better performing tests are sometimes costlier and more time consuming and can only be done for a small fraction of the population. On the other hand, a greater number of individuals can be tested using a cheaper, rapid test, but such a test may provide less reliable results. In this work, we studied the interplay between cost and delay of the tests as well as the additional advantages offered by partial and complete lockdowns. To understand the influence of different test strategies, we implemented them on realistic random social networks with a COVID-19-like epidemic in progression. Specifically, we compared the performance of two tests mimicking the characteristics of popular tests implemented for COVID-19 detection. We present procedures and intuitive understanding to ascertain the optimum combination of the tests to minimize the peak infection as well as total quarantine days when the number of tests is constrained by a fixed total budget.

Submitted 16 June, 2021; originally announced June 2021.

Comments: The contents are prepared as a report with figures and graphs at their appropriate positions. All the figures are captioned properly and are explained in the main text. The references are all in IEEE format. Section from pages 21 to 31 forms the appendix

MSC Class: 91D30

arXiv:2105.01424 [pdf, other]

On the Linear convergence of Natural Policy Gradient Algorithm

Authors: Sajad Khodadadian, Prakirt Raj Jhunjhunwala, Sushil Mahavir Varma, Siva Theja Maguluri

Abstract: Markov Decision Processes are classically solved using Value Iteration and Policy Iteration algorithms. Recent interest in Reinforcement Learning has motivated the study of methods inspired by optimization, such as gradient ascent. Among these, a popular algorithm is the Natural Policy Gradient, which is a mirror descent variant for MDPs. This algorithm forms the basis of several popular Reinforcement Learning algorithms such as Natural actor-critic, TRPO, PPO, etc., and so is being studied with growing interest. It has been shown that Natural Policy Gradient with constant step size converges with a sublinear rate of O(1/k) to the global optimum. In this paper, we present improved finite time convergence bounds, and show that this algorithm has a geometric (also known as linear) asymptotic convergence rate. We further improve this convergence result by introducing a variant of Natural Policy Gradient with adaptive step sizes. Finally, we compare different variants of policy gradient methods experimentally.

Submitted 4 May, 2021; originally announced May 2021.

Comments: 19 pages, 1 figure, A version of this paper was first submitted to a conference in Mar 2021

arXiv:2012.08678 [pdf]

Improved Digital Therapy for Developmental Pediatrics Using Domain-Specific Artificial Intelligence: Machine Learning Study

Authors: Peter Washington, Haik Kalantarian, John Kent, Arman Husic, Aaron Kline, Emilie Leblanc, Cathy Hou, Onur Cezmi Mutlu, Kaitlyn Dunlap, Yordan Penev, Maya Varma, Nate Tyler Stockham, Brianna Chrisman, Kelley Paskov, Min Woo Sun, Jae-Yoon Jung, Catalin Voss, Nick Haber, Dennis Paul Wall

Abstract: Background: Automated emotion classification could aid those who struggle to recognize emotions, including children with developmental behavioral conditions such as autism. However, most computer vision emotion recognition models are trained on adult emotion and therefore underperform when applied to child faces. Objective: We designed a strategy to gamify the collection and labeling of child emotion-enriched images to boost the performance of automatic child emotion recognition models to a level closer to what will be needed for digital health care approaches. Methods: We leveraged our prototype therapeutic smartphone game, GuessWhat, which was designed in large part for children with developmental and behavioral conditions, to gamify the secure collection of video data of children expressing a variety of emotions prompted by the game. Independently, we created a secure web interface to gamify the human labeling effort, called HollywoodSquares, tailored for use by any qualified labeler. We gathered and labeled 2155 videos, 39,968 emotion frames, and 106,001 labels on all images. With this drastically expanded pediatric emotion-centric database (>30 times larger than existing public pediatric emotion data sets), we trained a convolutional neural network (CNN) computer vision classifier of happy, sad, surprised, fearful, angry, disgust, and neutral expressions evoked by children. Results: The classifier achieved a 66.9% balanced accuracy and 67.4% F1-score on the entirety of the Child Affective Facial Expression (CAFE) as well as a 79.1% balanced accuracy and 78% F1-score on CAFE Subset A, a subset containing at least 60% human agreement on emotion labels. This performance is at least 10% higher than all previously developed classifiers evaluated against CAFE, the best of which reached a 56% balanced accuracy even when combining "anger" and "disgust" into a single class.

Submitted 3 June, 2024; v1 submitted 15 December, 2020; originally announced December 2020.

Journal ref: JMIR pediatrics and parenting 5.2 (2022): e26760

arXiv:2011.04883 [pdf, other]

Determining Question-Answer Plausibility in Crowdsourced Datasets Using Multi-Task Learning

Authors: Rachel Gardner, Maya Varma, Clare Zhu, Ranjay Krishna

Abstract: Datasets extracted from social networks and online forums are often prone to the pitfalls of natural language, namely the presence of unstructured and noisy data. In this work, we seek to enable the collection of high-quality question-answer datasets from social media by proposing a novel task for automated quality analysis and data cleaning: question-answer (QA) plausibility. Given a machine or user-generated question and a crowd-sourced response from a social media user, we determine if the question and response are valid; if so, we identify the answer within the free-form response. We design BERT-based models to perform the QA plausibility task, and we evaluate the ability of our models to generate a clean, usable question-answer dataset. Our highest-performing approach consists of a single-task model which determines the plausibility of the question, followed by a multi-task model which evaluates the plausibility of the response as well as extracts answers (Question Plausibility AUROC=0.75, Response Plausibility AUROC=0.78, Answer Extraction F1=0.665).

Submitted 9 November, 2020; originally announced November 2020.

Comments: Published at the 6th Workshop on Noisy User-generated Text (W-NUT) 2020 at EMNLP (6 pages, 4 figures)

arXiv:2002.11921 [pdf, other]

RNNPool: Efficient Non-linear Pooling for RAM Constrained Inference

Authors: Oindrila Saha, Aditya Kusupati, Harsha Vardhan Simhadri, Manik Varma, Prateek Jain

Abstract: Standard Convolutional Neural Networks (CNNs) designed for computer vision tasks tend to have large intermediate activation maps. These require large working memory and are thus unsuitable for deployment on resource-constrained devices typically used for inference on the edge. Aggressively downsampling the images via pooling or strided convolutions can address the problem but leads to a significant decrease in accuracy due to gross aggregation of the feature map by standard pooling operators. In this paper, we introduce RNNPool, a novel pooling operator based on Recurrent Neural Networks (RNNs), that efficiently aggregates features over large patches of an image and rapidly downsamples activation maps. Empirical evaluation indicates that an RNNPool layer can effectively replace multiple blocks in a variety of architectures such as MobileNets and DenseNet when applied to standard vision tasks like image classification and face detection. That is, RNNPool can significantly decrease computational complexity and peak memory usage for inference while retaining comparable accuracy. We use RNNPool with the standard S3FD architecture to construct a face detection method that achieves state-of-the-art MAP for tiny ARM Cortex-M4 class microcontrollers with under 256 KB of RAM. Code is released at https://github.com/Microsoft/EdgeML.

Submitted 22 October, 2020; v1 submitted 27 February, 2020; originally announced February 2020.

Comments: 25 pages, 8 figures. Published at Advances in Neural Information Processing Systems (NeurIPS) 2020

arXiv:2001.05299 [pdf, other]

Throughput Optimal Routing in Blockchain Based Payment Systems

Authors: Sushil Mahavir Varma, Siva Theja Maguluri

Abstract: Cryptocurrency networks such as Bitcoin have emerged as a distributed alternative to traditional centralized financial transaction networks. However, there are major challenges in scaling up the throughput of such networks. Lightning network and Spider network are alternatives that build bidirectional payment channels on top of cryptocurrency networks using smart contracts, to enable fast transactions that bypass the Blockchain. In this paper, we study the problem of routing transactions in such a payment processing network. We first propose a stochastic model to study such a system, as opposed to a fluid model that is studied in the literature. Each link in such a model is a two-sided queue, and unlike classical queues, such queues are not stable unless there is an external control. We propose a notion of stability for the payment processing network consisting of such two-sided queues using the notion of on-chain rebalancing. We then characterize the capacity region and propose a throughput optimal algorithm that stabilizes the system under any load within the capacity region. The stochastic model enables us to study closed loop policies, which typically have better queuing/delay performance than the open loop policies (or static split rules) studied in the literature. We investigate this through simulations.

Submitted 23 June, 2021; v1 submitted 12 December, 2019; originally announced January 2020.

Comments: 17 pages, 6 Figures, Accepted to be published in IEEE TCNS

arXiv:2001.05228 [pdf, ps, other]

doi 10.1145/3336191.3371768

Extreme Regression for Dynamic Search Advertising

Authors: Yashoteja Prabhu, Aditya Kusupati, Nilesh Gupta, Manik Varma

Abstract: This paper introduces a new learning paradigm called eXtreme Regression (XR) whose objective is to accurately predict the numerical degrees of relevance of an extremely large number of labels to a data point. XR can provide elegant solutions to many large-scale ranking and recommendation applications including Dynamic Search Advertising (DSA). XR can learn more accurate models than the recently popular extreme classifiers which incorrectly assume strictly binary-valued label relevances. Traditional regression metrics which sum the errors over all the labels are unsuitable for XR problems since they could give extremely loose bounds for the label ranking quality. Also, the existing regression algorithms won't efficiently scale to millions of labels. This paper addresses these limitations through: (1) new evaluation metrics for XR which sum only the k largest regression errors; (2) a new algorithm called XReg which decomposes the XR task into a hierarchy of much smaller regression problems, thus leading to highly efficient training and prediction. This paper also introduces (3) a new labelwise prediction algorithm in XReg useful for DSA and other recommendation tasks. Experiments on benchmark datasets demonstrated that XReg can outperform the state-of-the-art extreme classifiers as well as large-scale regressors and rankers by up to 50% reduction in the new XR error metric, and up to 2% and 2.4% improvements in terms of the propensity-scored precision metric used in extreme classification and the click-through rate metric used in DSA respectively. Deployment of XReg on DSA in Bing resulted in a relative gain of 27% in query coverage. XReg's source code can be downloaded from http://manikvarma.org/code/XReg/download.html.
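
As a rough illustration of the evaluation idea stated in the abstract above (summing only the k largest regression errors rather than the errors over all labels), a minimal Python sketch follows. The function name `top_k_error_sum`, the use of absolute error, and the absence of any normalization are assumptions made for illustration; the abstract does not give the exact metric definition, so this should not be read as the paper's formula.

```python
import heapq


def top_k_error_sum(y_true, y_pred, k):
    """Sum of the k largest per-label absolute errors for one data point.

    Assumption: errors are absolute differences and are not normalized;
    the metric in the paper may be defined differently.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if k < 1:
        raise ValueError("k must be at least 1")
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    # heapq.nlargest returns at most k items, largest first.
    return sum(heapq.nlargest(k, errors))
```

With many labels whose errors are near zero, a sum over all labels can be dominated by that long tail, whereas this quantity depends only on the k worst-predicted labels.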

Submitted 20 January, 2020; v1 submitted 15 January, 2020; originally announced January 2020.

Comments: 15 pages, 4 figures, published at WSDM 2020 as a Long Oral

arXiv:1909.03082 [pdf, other]

One Size Does Not Fit All: Multi-Scale, Cascaded RNNs for Radar Classification

Authors: Dhrubojyoti Roy, Sangeeta Srivastava, Aditya Kusupati, Pranshu Jain, Manik Varma, Anish Arora

Abstract: Edge sensing with micro-power pulse-Doppler radars is an emergent domain in monitoring and surveillance with several smart city applications. Existing solutions for the clutter versus multi-source radar classification task are limited in terms of either accuracy or efficiency, and in some cases, struggle with a trade-off between false alarms and recall of sources. We find that this problem can be resolved by learning the classifier across multiple time-scales. We propose a multi-scale, cascaded recurrent neural network architecture, MSC-RNN, comprised of an efficient multi-instance learning (MIL) Recurrent Neural Network (RNN) for clutter discrimination at a lower tier, and a more complex RNN classifier for source classification at the upper tier. By controlling the invocation of the upper RNN with the help of the lower tier conditionally, MSC-RNN achieves an overall accuracy of 0.972. Our approach holistically improves the accuracy and per-class recalls over ML models suitable for radar inferencing. Notably, we outperform cross-domain handcrafted feature engineering with time-domain deep feature learning, while also being up to $\sim$3$\times$ more efficient than a competitive solution.

Submitted 6 September, 2019; originally announced September 2019.

Comments: Conditionally accepted to ACM BuildSys 2019

arXiv:1907.01368 [pdf]

Pathologist-Level Grading of Prostate Biopsies with Artificial Intelligence

Authors: Peter Ström, Kimmo Kartasalo, Henrik Olsson, Leslie Solorzano, Brett Delahunt, Daniel M. Berney, David G. Bostwick, Andrew J. Evans, David J. Grignon, Peter A. Humphrey, Kenneth A. Iczkowski, James G. Kench, Glen Kristiansen, Theodorus H. van der Kwast, Katia R. M. Leite, Jesse K. McKenney, Jon Oxley, Chin-Chen Pan, Hemamali Samaratunga, John R. Srigley, Hiroyuki Takahashi, Toyonori Tsuzuki, Murali Varma, Ming Zhou, Johan Lindberg, et al. (7 additional authors not shown)

Abstract: Background: An increasing volume of prostate biopsies and a world-wide shortage of uro-pathologists put a strain on pathology departments. Additionally, the high intra- and inter-observer variability in grading can result in over- and undertreatment of prostate cancer. Artificial intelligence (AI) methods may alleviate these problems by assisting pathologists to reduce workload and harmonize grading. Methods: We digitized 6,682 needle biopsies from 976 participants in the population based STHLM3 diagnostic study to train deep neural networks for assessing prostate biopsies. The networks were evaluated by predicting the presence, extent, and Gleason grade of malignant tissue for an independent test set comprising 1,631 biopsies from 245 men. We additionally evaluated grading performance on 87 biopsies individually graded by 23 experienced urological pathologists from the International Society of Urological Pathology. We assessed discriminatory performance by receiver operating characteristics (ROC) and tumor extent predictions by correlating predicted millimeter cancer length against measurements by the reporting pathologist. We quantified the concordance between grades assigned by the AI and the expert urological pathologists using Cohen's kappa. Results: The performance of the AI to detect and grade cancer in prostate needle biopsy samples was comparable to that of international experts in prostate pathology. The AI achieved an area under the ROC curve of 0.997 for distinguishing between benign and malignant biopsy cores, and 0.999 for distinguishing between men with or without prostate cancer. The correlation between millimeter cancer predicted by the AI and assigned by the reporting pathologist was 0.96. For assigning Gleason grades, the AI achieved an average pairwise kappa of 0.62. This was within the range of the corresponding values for the expert pathologists (0.60 to 0.73).

Submitted 2 July, 2019; originally announced July 2019.

Comments: 45 pages, 11 figures

arXiv:1905.08105 [pdf, other]

Water Distribution System Design Using Multi-Objective Genetic Algorithm with External Archive and Local Search

Authors: Mahesh Patil, M. Naveen Naidu, A. Vasan, Murari R. R. Varma

Abstract: Hybridisation of the multi-objective optimisation algorithm NSGA-II and local search is proposed for water distribution system design. Results obtained with the proposed algorithm are presented for four medium-size water networks taken from the literature. Local search is found to be beneficial for one of the networks in terms of finding new solutions not reported earlier. It is also shown that simply using an external archive to save all non-dominated solutions visited by the population, even without local search, leads to substantial improvement in the non-dominated set produced by the algorithm.

Submitted 20 May, 2019; originally announced May 2019.

Comments: 5 pages, 1 figure

arXiv:1903.06127 [pdf, other]

Water Distribution System Design Using Multi-Objective Particle Swarm Optimisation

Authors: Mahesh B. Patil, M. Naveen Naidu, A. Vasan, Murari R. R. Varma

Abstract: Application of the multi-objective particle swarm optimisation (MOPSO) algorithm to design of water distribution systems is described. An earlier MOPSO algorithm is augmented with (a) local search, (b) a modified strategy for assigning the leader, and (c) a modified mutation scheme. For one of the benchmark problems described in the literature, the effect of each of the above features on the algorithm performance is demonstrated. The augmented MOPSO algorithm (called MOPSO+) is applied to five benchmark problems, and in each case, it finds non-dominated solutions not reported earlier. In addition, for the purpose of comparing Pareto fronts (sets of non-dominated solutions) obtained by different algorithms, a new criterion is suggested, and its usefulness is pointed out with an example. Finally, some suggestions regarding future research directions are made.

Submitted 14 March, 2019; originally announced March 2019.

Comments: 16 pages, 12 figures

arXiv:1901.02358 [pdf, ps, other]

FastGRNN: A Fast, Accurate, Stable and Tiny Kilobyte Sized Gated Recurrent Neural Network

Authors: Aditya Kusupati, Manish Singh, Kush Bhatia, Ashish Kumar, Prateek Jain, Manik Varma

Abstract: This paper develops the FastRNN and FastGRNN algorithms to address the twin RNN limitations of inaccurate training and inefficient prediction. Previous approaches have improved accuracy at the expense of prediction costs making them infeasible for resource-constrained and real-time applications. Unitary RNNs have increased accuracy somewhat by restricting the range of the state transition matrix's singular values but have also increased the model size as they require a larger number of hidden units to make up for the loss in expressive power. Gated RNNs have obtained state-of-the-art accuracies by adding extra parameters thereby resulting in even larger models. FastRNN addresses these limitations by adding a residual connection that does not constrain the range of the singular values explicitly and has only two extra scalar parameters. FastGRNN then extends the residual connection to a gate by reusing the RNN matrices to match state-of-the-art gated RNN accuracies but with a 2-4x smaller model. Enforcing FastGRNN's matrices to be low-rank, sparse and quantized resulted in accurate models that could be up to 35x smaller than leading gated and unitary RNNs. This allowed FastGRNN to accurately recognize the "Hey Cortana" wakeword with a 1 KB model and to be deployed on severely resource-constrained IoT microcontrollers too tiny to store other RNN models. FastGRNN's code is available at https://github.com/Microsoft/EdgeML/.

Submitted 8 January, 2019; originally announced January 2019.

Comments: 23 pages, 10 figures, Published at Advances in Neural Information Processing Systems (NeurIPS) 2018

arXiv:1507.02743 [pdf, ps, other]

Locally Non-linear Embeddings for Extreme Multi-label Learning

Authors: Kush Bhatia, Himanshu Jain, Purushottam Kar, Prateek Jain, Manik Varma

Abstract: The objective in extreme multi-label learning is to train a classifier that can automatically tag a novel data point with the most relevant subset of labels from an extremely large label set. Embedding based approaches make training and prediction tractable by assuming that the training label matrix is low-rank and hence the effective number of labels can be reduced by projecting the high dimensional label vectors onto a low dimensional linear subspace. Still, leading embedding approaches have been unable to deliver high prediction accuracies or scale to large problems as the low rank assumption is violated in most real world applications. This paper develops the X-One classifier to address both limitations. The main technical contribution in X-One is a formulation for learning a small ensemble of local distance preserving embeddings which can accurately predict infrequently occurring (tail) labels. This allows X-One to break free of the traditional low-rank assumption and boost classification accuracy by learning embeddings which preserve pairwise distances between only the nearest label vectors. We conducted extensive experiments on several real-world as well as benchmark data sets and compared our method against state-of-the-art methods for extreme multi-label classification. Experiments reveal that X-One can make significantly more accurate predictions than the state-of-the-art methods including both embeddings (by as much as 35%) as well as trees (by as much as 6%). X-One can also scale efficiently to data sets with a million labels which are beyond the pale of leading embedding methods.

Submitted 9 July, 2015; originally announced July 2015.

arXiv:1010.2296 [pdf, ps, other]

doi 10.1002/jgt.20643

Rainbow Connection Number and Connected Dominating Sets

Authors: L. Sunil Chandran, Anita Das, Deepak Rajendraprasad, Nithin M. Varma

Abstract: Rainbow connection number rc(G) of a connected graph G is the minimum number of colours needed to colour the edges of G, so that every pair of vertices is connected by at least one path in which no two edges are coloured the same. In this paper we show that for every connected graph G, with minimum degree at least 2, the rainbow connection number is upper bounded by γ_c(G) + 2, where γ_c(G) is the connected domination number of G. Bounds of the form diameter(G) ≤ rc(G) ≤ diameter(G) + c, 1 ≤ c ≤ 4, for many special graph classes follow as easy corollaries from this result. This includes interval graphs, AT-free graphs, circular arc graphs, threshold graphs, and chain graphs all with minimum degree at least 2 and connected. We also show that every bridge-less chordal graph G has rc(G) ≤ 3·radius(G). In most of these cases, we also demonstrate the tightness of the bounds. An extension of this idea to two-step dominating sets is used to show that for every connected graph on n vertices with minimum degree δ, the rainbow connection number is upper bounded by 3n/(δ + 1) + 3. This solves an open problem of Schiermeyer (2009), improving the previously best known bound of 20n/δ by Krivelevich and Yuster (2010). Moreover, this bound is seen to be tight up to additive factors by a construction of Caro et al. (2008).

Submitted 12 October, 2010; originally announced October 2010.

Comments: 14 pages

MSC Class: 05C15; 05C69 (Primary); 05C12; 05C40 (Secondary)

Showing 1–35 of 35 results for author: Varma, M