A Framework for Multi-modal Learning: Jointly Modeling Inter- & Intra-Modality Dependencies

Authors: Divyam Madaan, Taro Makino, Sumit Chopra, Kyunghyun Cho

Abstract: Supervised multi-modal learning involves mapping multiple modalities to a target label. Previous studies in this field have concentrated on capturing in isolation either the inter-modality dependencies (the relationships between different modalities and the label) or the intra-modality dependencies (the relationships within a single modality and the label). We argue that these conventional approaches that rely solely on either inter- or intra-modality dependencies may not be optimal in general. We view the multi-modal learning problem from the lens of generative models where we consider the target as a source of multiple modalities and the interaction between them. Towards that end, we propose the inter- & intra-modality modeling (I2M2) framework, which captures and integrates both the inter- and intra-modality dependencies, leading to more accurate predictions. We evaluate our approach using real-world healthcare and vision-and-language datasets with state-of-the-art models, demonstrating superior performance over traditional methods focusing only on one type of modality dependency.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2306.13276 [pdf, other]

On Sensitivity and Robustness of Normalization Schemes to Input Distribution Shifts in Automatic MR Image Diagnosis

Authors: Divyam Madaan, Daniel Sodickson, Kyunghyun Cho, Sumit Chopra

Abstract: Magnetic Resonance Imaging (MRI) is considered the gold standard of medical imaging because of the excellent soft-tissue contrast exhibited in the images reconstructed by the MRI pipeline, which in turn enables the human radiologist to discern many pathologies easily. More recently, Deep Learning (DL) models have also achieved state-of-the-art performance in diagnosing multiple diseases using these reconstructed images as input. However, the image reconstruction process within the MRI pipeline, which requires the use of complex hardware and adjustment of a large number of scanner parameters, is highly susceptible to noise of various forms, resulting in arbitrary artifacts within the images. Furthermore, the noise distribution is not stationary and varies within a machine, across machines, and across patients, leading to varying artifacts within the images. Unfortunately, DL models are quite sensitive to these varying artifacts, as they lead to changes in the input data distribution between the training and testing phases. The lack of robustness of these models against varying artifacts impedes their use in medical applications where safety is critical. In this work, we focus on improving the generalization performance of these models in the presence of multiple varying artifacts that manifest due to the complexity of the MR data acquisition. In our experiments, we observe that Batch Normalization, a widely used technique during the training of DL models for medical image analysis, is a significant cause of performance degradation in these changing environments. As a solution, we propose to use other normalization techniques, such as Group Normalization (GN) and Layer Normalization (LN), to inject robustness into model performance against varying image artifacts. Through a systematic set of experiments, we show that GN and LN provide better accuracy for various MR artifacts and distribution shifts.

Submitted 22 June, 2023; originally announced June 2023.

Comments: Accepted at MIDL 2023

arXiv:2306.08593 [pdf, other]

Heterogeneous Continual Learning

Authors: Divyam Madaan, Hongxu Yin, Wonmin Byeon, Jan Kautz, Pavlo Molchanov

Abstract: We propose a novel framework and a solution to tackle the continual learning (CL) problem with changing network architectures. Most CL methods focus on adapting a single architecture to a new task/class by modifying its weights. However, with rapid progress in architecture design, the problem of adapting existing solutions to novel architectures becomes relevant. To address this limitation, we propose Heterogeneous Continual Learning (HCL), where a wide range of evolving network architectures emerge continually together with novel data/tasks. As a solution, we build on top of the distillation family of techniques and modify it to a new setting where a weaker model takes the role of a teacher; meanwhile, a new stronger architecture acts as a student. Furthermore, we consider a setup of limited access to previous data and propose Quick Deep Inversion (QDI) to recover prior task visual features to support knowledge transfer. QDI significantly reduces computational costs compared to previous solutions and improves overall performance. In summary, we propose a new setup for CL with a modified knowledge distillation paradigm and design a quick data inversion method to enhance distillation. Our evaluation on various benchmarks shows a significant improvement in accuracy in comparison to state-of-the-art methods over various network architectures.

Submitted 14 June, 2023; originally announced June 2023.

Comments: Accepted to CVPR 2023

arXiv:2302.13289 [pdf, other]

Improving Representational Continuity via Continued Pretraining

Authors: Michael Sun, Ananya Kumar, Divyam Madaan, Percy Liang

Abstract: We consider the continual representation learning setting: sequentially pretrain a model $M'$ on tasks $T_1, \ldots, T_T$, and then adapt $M'$ on a small amount of data from each task $T_i$ to check if it has forgotten information from old tasks. Under a kNN adaptation protocol, prior work shows that continual learning methods improve forgetting over naive training (SGD). In reality, practitioners do not use kNN classifiers -- they use the adaptation method that works best (e.g., fine-tuning) -- here, we find that strong continual learning baselines do worse than naive training. Interestingly, we find that a method from the transfer learning community (LP-FT) outperforms naive training and the other continual learning methods. Even with standard kNN evaluation protocols, LP-FT performs comparably with strong continual learning methods (while being simpler and requiring less memory) on three standard benchmarks: sequential CIFAR-10, CIFAR-100, and TinyImageNet. LP-FT also reduces forgetting in a real world satellite remote sensing dataset (FMoW), and a variant of LP-FT gets state-of-the-art accuracies on an NLP continual learning benchmark.

Submitted 26 February, 2023; originally announced February 2023.

arXiv:2210.09730 [pdf, other]

doi 10.1145/3636516

Efficient Syndrome Decoder for Heavy Hexagonal QECC via Machine Learning

Authors: Debasmita Bhoumik, Ritajit Majumdar, Dhiraj Madan, Dhinakaran Vinayagamurthy, Shesha Raghunathan, Susmita Sur-Kolay

Abstract: Error syndromes for heavy hexagonal code and other topological codes such as surface code have typically been decoded by using Minimum Weight Perfect Matching (MWPM) based methods. Recent advances have shown that topological codes can be efficiently decoded by deploying machine learning (ML) techniques, in particular with neural networks. In this work, we first propose an ML based decoder for heavy hexagonal code and establish its efficiency in terms of the values of threshold and pseudo-threshold, for various noise models. We show that the proposed ML based decoding method achieves $\sim5 \times$ higher values of threshold than that for MWPM. Next, exploiting the property of subsystem codes, we define gauge equivalence for heavy hexagonal code, by which two distinct errors can belong to the same error class. A linear search based method is proposed for determining the equivalent error classes. This provides a quadratic reduction in the number of error classes to be considered for both bit flip and phase flip errors, and thus a further improvement of $\sim 14\%$ in the threshold over the basic ML decoder. Lastly, a novel technique based on rank to determine the equivalent error classes is presented, which is empirically faster than the one based on linear search.

Submitted 2 April, 2024; v1 submitted 18 October, 2022; originally announced October 2022.

Comments: This paper is published in ACM Transactions on Quantum Computing. Link https://dl.acm.org/doi/abs/10.1145/3636516

Journal ref: ACM Transactions on Quantum Computing 5, 1, Article 5 (March 2024), 27 pages

arXiv:2208.12852 [pdf, other]

What Do NLP Researchers Believe? Results of the NLP Community Metasurvey

Authors: Julian Michael, Ari Holtzman, Alicia Parrish, Aaron Mueller, Alex Wang, Angelica Chen, Divyam Madaan, Nikita Nangia, Richard Yuanzhe Pang, Jason Phang, Samuel R. Bowman

Abstract: We present the results of the NLP Community Metasurvey. Run from May to June 2022, the survey elicited opinions on controversial issues, including industry influence in the field, concerns about AGI, and ethics. Our results put concrete numbers to several controversies: For example, respondents are split almost exactly in half on questions about the importance of artificial general intelligence, whether language models understand language, and the necessity of linguistic structure and inductive bias for solving NLP problems. In addition, the survey posed meta-questions, asking respondents to predict the distribution of survey responses. This allows us not only to gain insight on the spectrum of beliefs held by NLP researchers, but also to uncover false sociological beliefs where the community's predictions don't match reality. We find such mismatches on a wide range of issues. Among other results, the community greatly overestimates its own belief in the usefulness of benchmarks and the potential for scaling to solve real-world problems, while underestimating its own belief in the importance of linguistic structure, inductive bias, and interdisciplinary science.

Submitted 26 August, 2022; originally announced August 2022.

Comments: 31 pages, 19 figures, 3 tables; more information at https://nlpsurvey.net

ACM Class: I.2.7

arXiv:2207.01508 [pdf]

Understanding misinformation in India: The case for a meaningful regulatory approach for social media platforms

Authors: Gandharv Dhruv Madan

Abstract: For research, this paper has included numerous literature that are covering a variety of information on the topics of misinformation, social media and fake news, regulation of misinformation and social media platforms, all presented for India. Studies including thematic analysis of misinformation, brief history on social media and its amplification of misinformation, current and past policy interventions by the Indian government, history of self-regulations in industries, and an analysis of regulatory approaches in the Indian context. This paper aims at introducing a coherent reading into the context of misinformation in the country and the subsequent social and business disruptions that will follow. Utilizing lessons from history around industry regulations, existing policy research and framework analysis to convince the reader of the nature of policy intervention that will bode well for all stakeholders involved. The literature sources have been mentioned in their respective sections for reference. The research utilized the PASTEL framework to analyse data collected from other research efforts covering the topic of misinformation and regulation across academic whitepapers and news media blogs and articles, all available freely on the public domain. Relevant secondary data, in terms of information, previous analysis in other research efforts, and literature work included in respective sections in the paper have been reproduced, shared and/or indicated wherever necessary.

Submitted 19 June, 2022; originally announced July 2022.

Comments: 10 pages

arXiv:2112.00653 [pdf, other]

doi 10.24963/ijcai.2022/597

Variational Learning for Unsupervised Knowledge Grounded Dialogs

Authors: Mayank Mishra, Dhiraj Madan, Gaurav Pandey, Danish Contractor

Abstract: Recent methods for knowledge grounded dialogs generate responses by incorporating information from an external textual document. These methods do not require the exact document to be known during training and rely on the use of a retrieval system to fetch relevant documents from a large index. The documents used to generate the responses are modeled as latent variables whose prior probabilities need to be estimated. Models such as RAG and REALM marginalize the document probabilities over the documents retrieved from the index to define the log likelihood loss function which is optimized end-to-end. In this paper, we develop a variational approach to the above technique wherein we instead maximize the Evidence Lower Bound (ELBO). Using a collection of three publicly available open-conversation datasets, we demonstrate how the posterior distribution, which has information from the ground-truth response, allows for a better approximation of the objective function during training. To overcome the challenges associated with sampling over a large knowledge collection, we develop an efficient approach to approximate the ELBO. To the best of our knowledge we are the first to apply variational training for open-scale unsupervised knowledge grounded dialog systems.

Submitted 28 April, 2022; v1 submitted 23 November, 2021; originally announced December 2021.

arXiv:2110.06976 [pdf, other]

Representational Continuity for Unsupervised Continual Learning

Authors: Divyam Madaan, Jaehong Yoon, Yuanchun Li, Yunxin Liu, Sung Ju Hwang

Abstract: Continual learning (CL) aims to learn a sequence of tasks without forgetting the previously acquired knowledge. However, recent CL advances are restricted to supervised continual learning (SCL) scenarios. Consequently, they are not scalable to real-world applications where the data distribution is often biased and unannotated. In this work, we focus on unsupervised continual learning (UCL), where we learn the feature representations on an unlabelled sequence of tasks and show that reliance on annotated data is not necessary for continual learning. We conduct a systematic study analyzing the learned feature representations and show that unsupervised visual representations are surprisingly more robust to catastrophic forgetting, consistently achieve better performance, and generalize better to out-of-distribution tasks than SCL. Furthermore, we find that UCL achieves a smoother loss landscape through qualitative analysis of the learned representations and learns meaningful feature representations. Additionally, we propose Lifelong Unsupervised Mixup (LUMP), a simple yet effective technique that interpolates between the current task and previous tasks' instances to alleviate catastrophic forgetting for unsupervised representations.

Submitted 4 April, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

Comments: Accepted to ICLR (Oral) 2022. Code available at https://github.com/divyam3897/UCL
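
The interpolation that the LUMP abstract describes can be illustrated with a minimal sketch. It assumes a standard mixup form, where a coefficient drawn from a Beta distribution blends a current-task instance with a replayed instance from a previous task. The function name `lump_mixup`, the flat-list input representation, and the default `alpha=0.4` are hypothetical choices for illustration and are not taken from the listing or from the authors' code.

```python
import random


def lump_mixup(current, replayed, alpha=0.4, rng=random):
    """Blend a current-task instance with a replayed past-task instance.

    `current` and `replayed` are equal-length lists of floats (for example,
    flattened pixel values). `alpha` is a hypothetical Beta parameter.
    Returns the mixed instance and the sampled coefficient.
    """
    if len(current) != len(replayed):
        raise ValueError("instances must have the same length")
    # Sample the mixing coefficient lam in [0, 1] from Beta(alpha, alpha).
    lam = rng.betavariate(alpha, alpha)
    # Convex combination: lam weights the current task, (1 - lam) the past task.
    mixed = [lam * c + (1.0 - lam) * r for c, r in zip(current, replayed)]
    return mixed, lam


if __name__ == "__main__":
    mixed, lam = lump_mixup([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], rng=random.Random(0))
    print(lam, mixed)
```

In an unsupervised setting the mixed instance would be fed to the representation learner in place of the raw current-task instance; that training loop is outside the scope of this sketch.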

arXiv:2110.04637 [pdf, other]

Depth Optimized Ansatz Circuit in QAOA for Max-Cut

Authors: Ritajit Majumdar, Debasmita Bhoumik, Dhiraj Madan, Dhinakaran Vinayagamurthy, Shesha Raghunathan, Susmita Sur-Kolay

Abstract: While a Quantum Approximate Optimization Algorithm (QAOA) is intended to provide a quantum advantage in finding approximate solutions to combinatorial optimization problems, noise in the system is a hurdle in exploiting its full potential. Several error mitigation techniques have been studied to lessen the effect of noise on this algorithm. Recently, Majumdar et al. proposed a Depth First Search (DFS) based method to reduce $n-1$ CNOT gates in the ansatz design of QAOA for finding Max-Cut in a graph G = (V, E), |V| = n. However, this method tends to increase the depth of the circuit, making it more prone to relaxation error. The depth of the circuit is proportional to the height of the DFS tree, which can be $n-1$ in the worst case. In this paper, we propose an $O(Δ\cdot n^2)$ greedy heuristic algorithm, where $Δ$ is the maximum degree of the graph, that finds a spanning tree of lower height, thus reducing the overall depth of the circuit while still retaining the $n-1$ reduction in the number of CNOT gates needed in the ansatz. We numerically show that this algorithm achieves a nearly 10-fold increase in the probability of success for each iteration of QAOA for Max-Cut. We further show that although the average depth of the circuit produced by this heuristic algorithm still grows linearly with n, our algorithm reduces the slope of the linear increase from 1 to 0.11.

Submitted 9 October, 2021; originally announced October 2021.

Comments: 12 pages, single column

arXiv:2106.02812 [pdf, other]

Optimizing Ansatz Design in QAOA for Max-cut

Authors: Ritajit Majumdar, Dhiraj Madan, Debasmita Bhoumik, Dhinakaran Vinayagamurthy, Shesha Raghunathan, Susmita Sur-Kolay

Abstract: Quantum Approximate Optimization Algorithm (QAOA) is studied primarily to find approximate solutions to combinatorial optimization problems. For a graph with $n$ vertices and $m$ edges, a depth $p$ QAOA for the Max-cut problem requires $2\cdot m \cdot p$ CNOT gates. CNOT is one of the primary sources of error in modern quantum computers. In this paper, we propose two hardware independent methods to reduce the number of CNOT gates in the circuit. First, we present a method based on Edge Coloring of the input graph that minimizes the number of cycles (termed as depth of the circuit), and reduces up to $\lfloor \frac{n}{2} \rfloor$ CNOT gates. Next, we depict another method based on Depth First Search (DFS) on the input graph that reduces $n-1$ CNOT gates, but increases depth of the circuit moderately. We analytically derive the condition for which the reduction in CNOT gates overshadows this increase in depth, and the error probability of the circuit is still lowered. We show that all IBM Quantum Hardware satisfy this condition. We simulate these two methods for graphs of various sparsity with the \textit{ibmq\_manhattan} noise model, and show that the DFS based method outperforms the edge coloring based method, which in turn, outperforms the traditional QAOA circuit in terms of reduction in the number of CNOT gates, and hence the probability of error of the circuit.

Submitted 28 June, 2021; v1 submitted 5 June, 2021; originally announced June 2021.

Comments: 13 pages; double column
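
The gate counts quoted in the abstract above can be turned into a small worked calculation. The sketch below takes the figures at face value: a baseline of $2 \cdot m \cdot p$ CNOT gates, a saving of up to $\lfloor n/2 \rfloor$ for the edge-coloring method, and a saving of $n-1$ for the DFS method. The abstract does not say whether each saving applies once per circuit or once per layer, so the code assumes once per circuit; the function name `qaoa_maxcut_cnot_counts` is hypothetical.

```python
def qaoa_maxcut_cnot_counts(n, m, p):
    """CNOT counts for depth-p QAOA Max-Cut on a connected graph.

    n: number of vertices, m: number of edges, p: QAOA depth.
    Assumption: each quoted saving is subtracted once from the whole circuit.
    """
    if n < 1 or p < 1 or m < n - 1:
        raise ValueError("need n >= 1, p >= 1 and a connected graph (m >= n - 1)")
    baseline = 2 * m * p  # two CNOTs per edge per QAOA layer
    return {
        "baseline": baseline,
        # Best case for edge coloring: floor(n / 2) CNOTs removed.
        "edge_coloring_best_case": baseline - n // 2,
        # DFS-based method: n - 1 CNOTs removed.
        "dfs": baseline - (n - 1),
    }


if __name__ == "__main__":
    # 4-cycle (n = 4, m = 4) at depth p = 1.
    print(qaoa_maxcut_cnot_counts(4, 4, 1))
```

For the 4-cycle at depth 1 this gives 8 CNOT gates for the baseline circuit, 6 in the best case for edge coloring, and 5 for the DFS method.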

arXiv:2106.01085 [pdf, other]

Online Coreset Selection for Rehearsal-based Continual Learning

Authors: Jaehong Yoon, Divyam Madaan, Eunho Yang, Sung Ju Hwang

Abstract: A dataset is a shred of crucial evidence to describe a task. However, each data point in the dataset does not have the same potential, as some of the data points can be more representative or informative than others. This unequal importance among the data points may have a large impact in rehearsal-based continual learning, where we store a subset of the training examples (coreset) to be replayed later to alleviate catastrophic forgetting. In continual learning, the quality of the samples stored in the coreset directly affects the model's effectiveness and efficiency. The coreset selection problem becomes even more important under realistic settings, such as imbalanced continual learning or noisy data scenarios. To tackle this problem, we propose Online Coreset Selection (OCS), a simple yet effective method that selects the most representative and informative coreset at each iteration and trains them in an online manner. Our proposed method maximizes the model's adaptation to a current dataset while selecting high-affinity samples to past tasks, which directly inhibits catastrophic forgetting. We validate the effectiveness of our coreset selection mechanism over various standard, imbalanced, and noisy datasets against strong continual learning baselines, demonstrating that it improves task adaptation and prevents catastrophic forgetting in a sample-efficient manner.

Submitted 18 March, 2022; v1 submitted 2 June, 2021; originally announced June 2021.

Comments: ICLR 2022

arXiv:2006.12135 [pdf, other]

Learning to Generate Noise for Multi-Attack Robustness

Authors: Divyam Madaan, Jinwoo Shin, Sung Ju Hwang

Abstract: Adversarial learning has emerged as one of the successful techniques to circumvent the susceptibility of existing methods against adversarial perturbations. However, the majority of existing defense methods are tailored to defend against a single category of adversarial perturbation (e.g. $\ell_\infty$-attack). In safety-critical applications, this makes these methods extraneous as the attacker can adopt diverse adversaries to deceive the system. Moreover, training on multiple perturbations simultaneously significantly increases the computational overhead during training. To address these challenges, we propose a novel meta-learning framework that explicitly learns to generate noise to improve the model's robustness against multiple types of attacks. Its key component is Meta Noise Generator (MNG) that outputs optimal noise to stochastically perturb a given sample, such that it helps lower the error on diverse adversarial perturbations. By utilizing samples generated by MNG, we train a model by enforcing the label consistency across multiple perturbations. We validate the robustness of models trained by our scheme on various datasets and against a wide variety of perturbations, demonstrating that it significantly outperforms the baselines across multiple perturbations with a marginal computational cost.

Submitted 24 June, 2021; v1 submitted 22 June, 2020; originally announced June 2020.

Comments: Accepted to ICML 2021. Code available at https://github.com/divyam3897/MNG_AC

arXiv:1909.03759 [pdf, other]

Neural Conversational QA: Learning to Reason v.s. Exploiting Patterns

Authors: Nikhil Verma, Abhishek Sharma, Dhiraj Madan, Danish Contractor, Harshit Kumar, Sachindra Joshi

Abstract: Neural Conversational QA tasks like ShARC require systems to answer questions based on the contents of a given passage. On studying recent state-of-the-art models on the ShARCQA task, we found indications that the models learn spurious clues/patterns in the dataset. Furthermore, we show that a heuristic-based program designed to exploit these patterns can have performance comparable to that of the neural models. In this paper we share our findings about four types of patterns found in the ShARC corpus and describe how neural models exploit them. Motivated by the aforementioned findings, we create and share a modified dataset that has fewer spurious patterns, consequently allowing models to learn better.

Submitted 9 October, 2020; v1 submitted 9 September, 2019; originally announced September 2019.

Comments: Accepted at EMNLP 2020. NOTE: An older version of this paper presented a model called 'UrcaNet'. Please view the v1 version of this paper on arxiv for details on that model. This version does not contain UrcaNet

arXiv:1908.04355 [pdf, other]

Adversarial Neural Pruning with Latent Vulnerability Suppression

Authors: Divyam Madaan, Jinwoo Shin, Sung Ju Hwang

Abstract: Despite the remarkable performance of deep neural networks on various computer vision tasks, they are known to be susceptible to adversarial perturbations, which makes it challenging to deploy them in real-world safety-critical applications. In this paper, we conjecture that the leading cause of adversarial vulnerability is the distortion in the latent feature space, and provide methods to suppress them effectively. Explicitly, we define \emph{vulnerability} for each latent feature and then propose a new loss for adversarial learning, \emph{Vulnerability Suppression (VS)} loss, that aims to minimize the feature-level vulnerability during training. We further propose a Bayesian framework to prune features with high vulnerability to reduce both vulnerability and loss on adversarial samples. We validate our \emph{Adversarial Neural Pruning with Vulnerability Suppression (ANP-VS)} method on multiple benchmark datasets, on which it not only obtains state-of-the-art adversarial robustness but also improves the performance on clean examples, using only a fraction of the parameters used by the full network. Further qualitative analysis suggests that the improvements come from the suppression of feature-level vulnerability.

Submitted 2 July, 2020; v1 submitted 12 August, 2019; originally announced August 2019.

Comments: Accepted to ICML 2020. Code available at https://github.com/divyam3897/ANP_VS

arXiv:1905.13678 [pdf, other]

Learning Sparse Networks Using Targeted Dropout

Authors: Aidan N. Gomez, Ivan Zhang, Siddhartha Rao Kamalakara, Divyam Madaan, Kevin Swersky, Yarin Gal, Geoffrey E. Hinton

Abstract: Neural networks are easier to optimise when they have many more weights than are required for modelling the mapping from inputs to outputs. This suggests a two-stage learning procedure that first learns a large net and then prunes away connections or hidden units. But standard training does not necessarily encourage nets to be amenable to pruning. We introduce targeted dropout, a method for training a neural network so that it is robust to subsequent pruning. Before computing the gradients for each weight update, targeted dropout stochastically selects a set of units or weights to be dropped using a simple self-reinforcing sparsity criterion and then computes the gradients for the remaining weights. The resulting network is robust to post hoc pruning of weights or units that frequently occur in the dropped sets. The method improves upon more complicated sparsifying regularisers while being simple to implement and easy to tune.

Submitted 9 September, 2019; v1 submitted 31 May, 2019; originally announced May 2019.
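
The selection step described in the targeted dropout abstract above can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the abstract does not spell out the "self-reinforcing sparsity criterion", so the sketch assumes weight magnitude as the criterion, and the function name and the parameters `gamma` (fraction of weights treated as drop candidates) and `alpha` (drop probability per candidate) are hypothetical.

```python
import random


def targeted_weight_dropout(weights, gamma=0.5, alpha=0.5, rng=None):
    """Return a copy of `weights` with some low-magnitude entries zeroed.

    Assumption: the sparsity criterion is weight magnitude.
    gamma: fraction of weights (the smallest in magnitude) that are candidates.
    alpha: probability of dropping each candidate on this update.
    """
    rng = rng or random.Random()
    n_candidates = int(gamma * len(weights))
    # Sort indices by magnitude; the smallest ones become drop candidates.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    candidates = order[:n_candidates]
    dropped = list(weights)
    for i in candidates:
        # Each candidate is dropped independently with probability alpha.
        if rng.random() < alpha:
            dropped[i] = 0.0
    return dropped
```

In a training loop, the masked weights would be used for the forward and backward pass, so only the surviving weights receive gradients for that update. Large weights are never candidates under this assumption, which is what would make the criterion self-reinforcing.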

arXiv:1904.03977 [pdf, other]

VayuAnukulani: Adaptive Memory Networks for Air Pollution Forecasting

Authors: Divyam Madaan, Radhika Dua, Prerana Mukherjee, Brejesh Lall

Abstract: Air pollution is the leading environmental health hazard globally due to various sources which include factory emissions, car exhaust and cooking stoves. As a precautionary measure, air pollution forecast serves as the basis for taking effective pollution control measures, and accurate air pollution forecasting has become an important task. In this paper, we forecast fine-grained ambient air quality information for 5 prominent locations in Delhi based on the historical and real-time ambient air quality and meteorological data reported by the Central Pollution Control Board. We present the VayuAnukulani system, a novel end-to-end solution to predict air quality for the next 24 hours by estimating the concentration and level of different air pollutants including nitrogen dioxide ($NO_2$), particulate matter ($PM_{2.5}$ and $PM_{10}$) for Delhi. Extensive experiments on data sources obtained in Delhi demonstrate that the proposed adaptive attention-based Bidirectional LSTM Network outperforms several baselines for classification and regression models. The accuracy of the proposed adaptive system is $\sim 15 - 20\%$ better than the same offline trained model. We compare the proposed methodology against several competing baselines, and show that the network outperforms conventional methods by $\sim 3 - 5 \%$.

Submitted 8 April, 2019; originally announced April 2019.

arXiv:1811.01012 [pdf, other]

Unsupervised Learning of Interpretable Dialog Models

Authors: Dhiraj Madan, Dinesh Raghu, Gaurav Pandey, Sachindra Joshi

Abstract: Recently, several deep learning based models have been proposed for end-to-end learning of dialogs. While these models can be trained from data without the need for any additional annotations, it is hard to interpret them. On the other hand, there exist traditional state based dialog systems, where the states of the dialog are discrete and hence easy to interpret. However, these states need to be handcrafted and annotated in the data. To achieve the best of both worlds, we propose the Latent State Tracking Network (LSTN), using which we learn an interpretable model in an unsupervised manner. The model defines a discrete latent variable at each turn of the conversation which can take a finite set of values. Since these discrete variables are not present in the training data, we use the EM algorithm to train our model in an unsupervised manner. In the experiments, we show that LSTN can help achieve interpretability in dialog models without much decrease in performance compared to end-to-end approaches.

Submitted 2 November, 2018; originally announced November 2018.

arXiv:1710.10609 [pdf, other]

Finding Dominant User Utterances And System Responses in Conversations

Authors: Dhiraj Madan, Sachindra Joshi

Abstract: There are several dialog frameworks which allow manual specification of intents and rule based dialog flow. The rule based framework provides good control to dialog designers at the expense of being more time consuming and laborious. The job of a dialog designer can be reduced if we could identify pairs of user intents and corresponding responses automatically from prior conversations between users and agents. In this paper we propose an approach to find these frequent user utterances (which serve as examples for intents) and corresponding agent responses. We propose a novel SimCluster algorithm that extends the standard K-means algorithm to simultaneously cluster user utterances and agent utterances by taking their adjacency information into account. The method also aligns these clusters to provide pairs of intents and response groups. We compare our results with those produced by using simple K-means clustering on a real dataset and observe up to 10% absolute improvement in F1-scores. Through our experiments on a synthetic dataset, we show that our algorithm gains more advantage over the K-means algorithm when the data has large variance.

Submitted 29 October, 2017; originally announced October 2017.

arXiv:1507.08501 [pdf, ps, other]

Randomised Rounding with Applications

Authors: Dhiraj Madan, Sandeep Sen

Abstract: We develop new techniques for rounding packing integer programs using iterative randomized rounding. It is based on a novel application of multidimensional Brownian motion in $\mathbb{R}^n$. Let $\overset{\sim}{x} \in {[0,1]}^n$ be a fractional feasible solution of a packing constraint $A x \leq 1,\ \ $ $A \in {\{0,1 \}}^{m\times n}$ that maximizes a linear objective function. The independent randomized rounding method of Raghavan-Thompson rounds each variable $x_i$ to 1 with probability $\overset{\sim}{x_i}$ and 0 otherwise. The expected value of the rounded objective function matches the fractional optimum and no constraint is violated by more than $O(\frac{\log m} {\log\log m})$. In contrast, our algorithm iteratively transforms $\overset{\sim}{x}$ to $\hat{x} \in {\{ 0,1\}}^{n}$ using a random walk, such that the expected values of $\hat{x}_i$'s are consistent with the Raghavan-Thompson rounding. In addition, it gives us intermediate values $x'$ which can then be used to bias the rounding towards a superior solution. The reduced dependencies between the constraints of the sparser system can be exploited using {\it Lovasz Local Lemma}. For $m$ randomly chosen packing constraints in $n$ variables, with $k$ variables in each inequality, the constraints are satisfied within $O(\frac{\log (mkp\log m/n) }{\log\log (mkp\log m/n)})$ with high probability where $p$ is the ratio between the maximum and minimum coefficients of the linear objective function. Further, we explore trade-offs between approximation factors and error, and present applications to well-known problems like circuit-switching, maximum independent set of rectangles and hypergraph $b$-matching. Our methods apply to the weighted instances of the problems and are likely to lead to better insights for even dependent rounding.

Submitted 30 July, 2015; originally announced July 2015.
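
The baseline that the randomised rounding abstract above contrasts against, independent rounding in the style of Raghavan-Thompson, is simple enough to sketch. The snippet below is a minimal illustration of that baseline only, not of the paper's Brownian-motion algorithm; the function names and the helper that measures constraint load are hypothetical.

```python
import random


def independent_rounding(x_frac, rng=None):
    """Round each fractional x_i in [0, 1] to 1 with probability x_i, else 0."""
    rng = rng or random.Random()
    return [1 if rng.random() < xi else 0 for xi in x_frac]


def max_constraint_load(A, x):
    """Largest left-hand side over the rows of A x.

    For a packing constraint A x <= 1, a value above 1 is a violation.
    """
    return max(sum(a * xi for a, xi in zip(row, x)) for row in A)
```

Since each coordinate is rounded independently, the expected value of every rounded variable equals its fractional value, so the expected objective matches the fractional optimum, while individual constraints may exceed 1. Calling `max_constraint_load` on the rounded vector shows by how much.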
