-
NurtureNet: A Multi-task Video-based Approach for Newborn Anthropometry
Authors:
Yash Khandelwal,
Mayur Arvind,
Sriram Kumar,
Ashish Gupta,
Sachin Kumar Danisetty,
Piyush Bagad,
Anish Madan,
Mayank Lunayach,
Aditya Annavajjala,
Abhishek Maiti,
Sansiddh Jain,
Aman Dalmia,
Namrata Deka,
Jerome White,
Jigar Doshi,
Angjoo Kanazawa,
Rahul Panicker,
Alpan Raval,
Srinivas Rana,
Makarand Tapaswi
Abstract:
Malnutrition among newborns is a top public health concern in developing countries. Identification and subsequent growth monitoring are key to successful interventions. However, this is challenging in rural communities where health systems tend to be inaccessible and under-equipped, with poor adherence to protocol. Our goal is to equip health workers and public health systems with a solution for contactless newborn anthropometry in the community.
We propose NurtureNet, a multi-task model that fuses visual information (a video taken with a low-cost smartphone) with tabular inputs to regress multiple anthropometry estimates including weight, length, head circumference, and chest circumference. We show that visual proxy tasks of segmentation and keypoint prediction further improve performance. We establish the efficacy of the model through several experiments and achieve a relative error of 3.9% and mean absolute error of 114.3 g for weight estimation. Model compression to 15 MB also allows offline deployment to low-cost smartphones.
Submitted 8 May, 2024;
originally announced May 2024.
-
TODM: Train Once Deploy Many Efficient Supernet-Based RNN-T Compression For On-device ASR Models
Authors:
Yuan Shangguan,
Haichuan Yang,
Danni Li,
Chunyang Wu,
Yassir Fathullah,
Dilin Wang,
Ayushi Dalmia,
Raghuraman Krishnamoorthi,
Ozlem Kalinli,
Junteng Jia,
Jay Mahadeokar,
Xin Lei,
Mike Seltzer,
Vikas Chandra
Abstract:
Automatic Speech Recognition (ASR) models need to be optimized for specific hardware before they can be deployed on devices. This can be done by tuning the model's hyperparameters or exploring variations in its architecture. Re-training and re-validating models after making these changes can be a resource-intensive task. This paper presents TODM (Train Once Deploy Many), a new approach to efficiently train many sizes of hardware-friendly on-device ASR models with GPU-hours comparable to that of a single training job. TODM leverages insights from prior work on Supernet, where Recurrent Neural Network Transducer (RNN-T) models share weights within a Supernet. It reduces layer sizes and widths of the Supernet to obtain subnetworks, making them smaller models suitable for all hardware types. We introduce a novel combination of three techniques to improve the outcomes of the TODM Supernet: adaptive dropouts, an in-place Alpha-divergence knowledge distillation, and the use of the ScaledAdam optimizer. We validate our approach by comparing Supernet-trained versus individually tuned Multi-Head State Space Model (MH-SSM) RNN-T using LibriSpeech. Results demonstrate that our TODM Supernet either matches or surpasses the performance of manually tuned models, by up to 3% relative in word error rate (WER), while efficiently keeping the cost of training many models at a small constant.
Submitted 27 November, 2023; v1 submitted 5 September, 2023;
originally announced September 2023.
-
Clustering with UMAP: Why and How Connectivity Matters
Authors:
Ayush Dalmia,
Suzanna Sia
Abstract:
Topology-based dimensionality reduction methods such as t-SNE and UMAP have seen increasing success and popularity in high-dimensional data. These methods have strong mathematical foundations and are based on the intuition that the topology in low dimensions should be close to that of high dimensions. Given that the initial topological structure is a precursor to the success of the algorithm, this naturally raises the question: What makes a "good" topological structure for dimensionality reduction? Insight into this will enable us to design better algorithms which take into account both local and global structure. In this paper, which focuses on UMAP, we study the effects of node connectivity (k-Nearest Neighbors vs mutual k-Nearest Neighbors) and relative neighborhood (Adjacent via Path Neighbors) on dimensionality reduction. We explore these concepts through extensive ablation studies on 4 standard image and text datasets (MNIST, FMNIST, 20NG, AG), reducing to 2 and 64 dimensions. Our findings indicate that a more refined notion of connectivity (mutual k-Nearest Neighbors with minimum spanning tree), together with a flexible method of constructing the local neighborhood (Path Neighbors), can achieve a much better representation than default UMAP, as measured by downstream clustering performance.
Submitted 16 December, 2021; v1 submitted 12 August, 2021;
originally announced August 2021.
-
Impact of data-splits on generalization: Identifying COVID-19 from cough and context
Authors:
Makkunda Sharma,
Nikhil Shenoy,
Jigar Doshi,
Piyush Bagad,
Aman Dalmia,
Parag Bhamare,
Amrita Mahale,
Saurabh Rane,
Neeraj Agrawal,
Rahul Panicker
Abstract:
Rapidly scaling screening, testing and quarantine has been shown to be an effective strategy to combat the COVID-19 pandemic. We consider the application of deep learning techniques to distinguish individuals with COVID from non-COVID by using data acquirable from a phone. Using cough and context (symptoms and meta-data) represents one such promising approach. Several independent works in this direction have shown promising results. However, none of them report performance across clinically relevant data splits, specifically where the development and test sets are split in time (retrospective validation) and across sites (broad validation). Although there is meaningful generalization across these splits, the performance varies significantly (up to 0.1 AUC score). In addition, we study the performance of symptomatic and asymptomatic individuals across these three splits. Finally, we show that our model focuses on meaningful features of the input: cough bouts for cough and relevant symptoms for context. The code and checkpoints are available at https://github.com/WadhwaniAI/cough-against-covid
Submitted 5 June, 2021;
originally announced June 2021.
-
Cough Against COVID: Evidence of COVID-19 Signature in Cough Sounds
Authors:
Piyush Bagad,
Aman Dalmia,
Jigar Doshi,
Arsha Nagrani,
Parag Bhamare,
Amrita Mahale,
Saurabh Rane,
Neeraj Agarwal,
Rahul Panicker
Abstract:
Testing capacity for COVID-19 remains a challenge globally due to the lack of adequate supplies, trained personnel, and sample-processing equipment. These problems are even more acute in rural and underdeveloped regions. We demonstrate that solicited-cough sounds collected over a phone, when analysed by our AI model, have statistically significant signal indicative of COVID-19 status (AUC 0.72, t-test, p < 0.01, 95% CI 0.61-0.83). This holds true for asymptomatic patients as well. Towards this, we collect the largest known (to date) dataset of microbiologically confirmed COVID-19 cough sounds from 3,621 individuals. When used in a triaging step within an overall testing protocol, by enabling risk-stratification of individuals before confirmatory tests, our tool can increase the testing capacity of a healthcare system by 43% at disease prevalence of 5%, without additional supplies, trained personnel, or physical infrastructure.
Submitted 23 September, 2020; v1 submitted 17 September, 2020;
originally announced September 2020.
-
Tired of Topic Models? Clusters of Pretrained Word Embeddings Make for Fast and Good Topics too!
Authors:
Suzanna Sia,
Ayush Dalmia,
Sabrina J. Mielke
Abstract:
Topic models are a useful analysis tool to uncover the underlying themes within document collections. The dominant approach is to use probabilistic topic models that posit a generative story, but in this paper we propose an alternative way to obtain topics: clustering pre-trained word embeddings while incorporating document information for weighted clustering and reranking top words. We provide benchmarks for the combination of different word embeddings and clustering algorithms, and analyse their performance under dimensionality reduction with PCA. The best performing combination for our approach performs as well as classical topic models, but with lower runtime and computational complexity.
Submitted 6 October, 2020; v1 submitted 30 April, 2020;
originally announced April 2020.
-
Unified Semantic Parsing with Weak Supervision
Authors:
Priyanka Agrawal,
Parag Jain,
Ayushi Dalmia,
Abhishek Bansal,
Ashish Mittal,
Karthik Sankaranarayanan
Abstract:
Semantic parsing over multiple knowledge bases enables a parser to exploit structural similarities of programs across the multiple domains. However, the fundamental challenge lies in obtaining high-quality annotations of (utterance, program) pairs across various domains needed for training such models. To overcome this, we propose a novel framework to build a unified multi-domain enabled semantic parser trained only with weak supervision (denotations). Weakly supervised training is particularly arduous as the program search space grows exponentially in a multi-domain setting. To solve this, we incorporate a multi-policy distillation mechanism in which we first train domain-specific semantic parsers (teachers) using weak supervision in the absence of the ground truth programs, followed by training a single unified parser (student) from the domain specific policies obtained from these teachers. The resultant semantic parser is not only compact but also generalizes better, and generates more accurate programs. It further does not require the user to provide a domain label while querying. On the standard Overnight dataset (containing multiple domains), we demonstrate that the proposed model improves performance by 20% in terms of denotation accuracy in comparison to baseline techniques.
Submitted 12 June, 2019;
originally announced June 2019.
-
Styling with Attention to Details
Authors:
Ayushi Dalmia,
Sachindra Joshi,
Raghavendra Singh,
Vikas Raykar
Abstract:
Fashion, by its nature, is driven by style. In this paper, we propose a method that takes into account the style information to complete a given set of selected fashion items with a complementary fashion item. Complementary items are those items that can be worn along with the selected items according to the style. Addressing this problem facilitates automatically generating stylish fashion ensembles, leading to a richer shopping experience for users.
Recently, there has been a surge of online social websites where fashion enthusiasts post the outfit of the day and other users can like and comment on them. These posts contain a gold-mine of information about style. In this paper, we exploit these posts to train a deep neural network which captures style in an automated manner. We pose the problem of predicting complementary fashion items as a sequence-to-sequence problem where the input is the selected set of fashion items and the output is a complementary fashion item based on the style information learned by the model. We use the encoder-decoder architecture to solve this problem of completing the set of fashion items. We evaluate the goodness of the proposed model through a variety of experiments. We empirically observe that our proposed model outperforms competitive baselines such as the Apriori algorithm by ~28 in terms of accuracy for top-1 recommendation to complete the fashion ensemble. We also perform retrieval-based experiments to understand the ability of the model to learn style and rank the complementary fashion items, and find that using attention in our encoder-decoder model helps improve the mean reciprocal rank by ~24. Qualitatively, we find the complementary fashion items generated by our proposed model are richer than those of the Apriori algorithm.
Submitted 3 July, 2018;
originally announced July 2018.
-
Siamese Neural Networks with Random Forest for detecting duplicate question pairs
Authors:
Ameya Godbole,
Aman Dalmia,
Sunil Kumar Sahu
Abstract:
Determining whether two given questions are semantically similar is a fairly challenging task given the different structures and forms that the questions can take. In this paper, we use Gated Recurrent Units (GRU) in combination with other widely used machine learning algorithms like Random Forest, AdaBoost and SVM for the similarity prediction task on a dataset released by Quora, consisting of about 400k labeled question pairs. We got the best result by using the Siamese adaptation of a Bidirectional GRU with a Random Forest classifier, which landed us among the top 24% in the competition Quora Question Pairs hosted on Kaggle.
Submitted 28 January, 2018; v1 submitted 22 January, 2018;
originally announced January 2018.
-
Metrics for Community Analysis: A Survey
Authors:
Tanmoy Chakraborty,
Ayushi Dalmia,
Animesh Mukherjee,
Niloy Ganguly
Abstract:
Detecting and analyzing dense groups or communities from social and information networks has attracted immense attention over the last decade due to its enormous applicability in different domains. Community detection is an ill-defined problem, as the nature of the communities is not known in advance. The problem has turned out to be even more complicated due to the fact that communities emerge in the network in various forms: disjoint, overlapping, hierarchical, etc. Various heuristics have been proposed depending upon the application at hand. All these heuristics have been materialized in the form of new metrics, which in most cases are used as optimization functions for detecting the community structure, or provide an indication of the goodness of detected communities during evaluation. There arises a need for an organized and detailed survey of the metrics proposed with respect to community detection and evaluation. In this survey, we present a comprehensive and structured overview of the state-of-the-art metrics used for the detection and the evaluation of community structure. We also conduct experiments on synthetic and real-world networks to present a comparative analysis of these metrics in measuring the goodness of the underlying community structure.
Submitted 12 April, 2016;
originally announced April 2016.