Search | arXiv e-print repository

Component Matching Approach in Linking Business and Application Architecture

Abstract: The development of an IT strategy and ensuring that it is the best possible one for business is a key problem many organizations face. This problem is that of linking business architecture to IT architecture in general and application architecture specifically. In our earlier work we proposed Category theory as the formal language to unify the business and IT worlds with the ability to represent the concepts and relations between the two in a unified way. We used rCOS as the underlying model for the specification of interfaces, contracts, and components. The concept of pseudo-category was then utilized to represent the business and application architecture specifications and the relationships contained within. The linkages between them can now be established by matching the business component contracts with the application component contracts. However, the matching was based on a manual process, and in this paper we extend the work by considering an automated component matching process. The groundwork for a tool to support the matching process is laid out in this paper.

Submitted 8 June, 2024; originally announced June 2024.

Comments: 8 pages, one figure

arXiv:2406.05114 [pdf, other]

The Expanding Scope of the Stability Gap: Unveiling its Presence in Joint Incremental Learning of Homogeneous Tasks

Authors: Sandesh Kamath, Albin Soutif-Cormerais, Joost van de Weijer, Bogdan Raducanu

Abstract: Recent research identified a temporary performance drop on previously learned tasks when transitioning to a new one. This drop is called the stability gap and has great consequences for continual learning: it complicates the direct employment of continual learning since the worst-case performance at task boundaries is dramatic, it limits its potential as an energy-efficient training paradigm, and finally, the stability drop could result in a reduced final performance of the algorithm. In this paper, we show that the stability gap also occurs when applying joint incremental training of homogeneous tasks. In this scenario, the learner continues training on the same data distribution and has access to all data from previous tasks. In addition, we show that in this scenario, there exists a low-loss linear path to the next minima, but that SGD optimization does not choose this path. We perform further analysis including a finer batch-wise analysis which could provide insights towards potential solution directions.

Submitted 7 June, 2024; originally announced June 2024.

Comments: Accepted at CVPR 2024 Workshop on Continual Learning in Computer Vision (CLVision)

arXiv:2405.19074 [pdf, other]

Resurrecting Old Classes with New Data for Exemplar-Free Continual Learning

Authors: Dipam Goswami, Albin Soutif-Cormerais, Yuyang Liu, Sandesh Kamath, Bartłomiej Twardowski, Joost van de Weijer

Abstract: Continual learning methods are known to suffer from catastrophic forgetting, a phenomenon that is particularly hard to counter for methods that do not store exemplars of previous tasks. Therefore, to reduce potential drift in the feature extractor, existing exemplar-free methods are typically evaluated in settings where the first task is significantly larger than subsequent tasks. Their performance drops drastically in more challenging settings starting with a smaller first task. To address this problem of feature drift estimation for exemplar-free methods, we propose to adversarially perturb the current samples such that their embeddings are close to the old class prototypes in the old model embedding space. We then estimate the drift in the embedding space from the old to the new model using the perturbed images and compensate the prototypes accordingly. We exploit the fact that adversarial samples are transferable from the old to the new feature space in a continual learning setting. The generation of these images is simple and computationally cheap. We demonstrate in our experiments that the proposed approach better tracks the movement of prototypes in embedding space and outperforms existing methods on several standard continual learning benchmarks as well as on fine-grained datasets. Code is available at https://github.com/dipamgoswami/ADC.

Submitted 29 May, 2024; originally announced May 2024.

Comments: Accepted at CVPR 2024

arXiv:2403.02568 [pdf, other]

Designing Born-Accessible Courses in Data Science and Visualization: Challenges and Opportunities of a Remote Curriculum Taught by Blind Instructors to Blind Students

Authors: JooYoung Seo, Sile O'Modhrain, Yilin Xia, Sanchita Kamath, Bongshin Lee, James M. Coughlan

Abstract: While recent years have seen a growing interest in accessible visualization tools and techniques for blind people, little attention is paid to the learning opportunities and teaching strategies of data science and visualization tailored for blind individuals. Whereas the former focuses on the accessibility issues of data visualization tools, the latter is concerned with the learnability of concepts and skills for data science and visualization. In this paper, we present novel approaches to teaching data science and visualization to blind students in an online setting. Taught by blind instructors, nine blind learners having a wide range of professional backgrounds participated in a two-week summer course. We describe the course design, teaching strategies, and learning outcomes. We also discuss the challenges and opportunities of teaching data science and visualization to blind students. Our work contributes to the growing body of knowledge on accessible data science and visualization education, and provides insights into the design of online courses for blind students.

Submitted 22 May, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

arXiv:2312.10534 [pdf, other]

Rethinking Robustness of Model Attributions

Authors: Sandesh Kamath, Sankalp Mittal, Amit Deshpande, Vineeth N Balasubramanian

Abstract: For machine learning models to be reliable and trustworthy, their decisions must be interpretable. As these models find increasing use in safety-critical applications, it is important that not just the model predictions but also their explanations (as feature attributions) be robust to small human-imperceptible input perturbations. Recent works have shown that many attribution methods are fragile and have proposed improvements in either these methods or the model training. We observe two main causes for fragile attributions: first, the existing metrics of robustness (e.g., top-k intersection) over-penalize even reasonable local shifts in attribution, thereby making random perturbations appear as a strong attack, and second, the attribution can be concentrated in a small region even when there are multiple important parts in an image. To rectify this, we propose simple ways to strengthen existing metrics and attribution methods that incorporate locality of pixels in robustness metrics and diversity of pixel locations in attributions. Towards the role of model training in attributional robustness, we empirically observe that adversarially trained models have more robust attributions on smaller datasets; however, this advantage disappears in larger datasets. Code is available at https://github.com/ksandeshk/LENS.

Submitted 16 December, 2023; originally announced December 2023.

Comments: Accepted AAAI 2024
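The over-penalization of local shifts by top-k intersection, mentioned in the abstract above, can be seen on a toy example. The following is a minimal sketch and not the authors' code: the function names, the 1-D "attribution maps", and the choice k = 2 are all assumptions made for illustration, and the locality-aware metrics proposed in the paper are not reproduced here.

```python
def top_k_indices(attribution, k):
    """Return the set of indices holding the k largest attribution values."""
    order = sorted(range(len(attribution)), key=lambda i: attribution[i], reverse=True)
    return set(order[:k])


def top_k_intersection(attr_a, attr_b, k):
    """Fraction of the top-k positions shared by two attribution maps (0.0 to 1.0)."""
    return len(top_k_indices(attr_a, k) & top_k_indices(attr_b, k)) / k


# Hypothetical 1-D attribution maps: the same two-pixel blob, shifted right.
original = [0, 0, 9, 8, 0, 0, 0, 0]
shifted_by_one = [0, 0, 0, 9, 8, 0, 0, 0]
shifted_by_two = [0, 0, 0, 0, 9, 8, 0, 0]

score_same = top_k_intersection(original, original, 2)
score_one = top_k_intersection(original, shifted_by_one, 2)
score_two = top_k_intersection(original, shifted_by_two, 2)
```

In this toy case a shift of only two positions drives the score to zero even though the attribution still highlights essentially the same region, which is the kind of behaviour the abstract describes as over-penalizing reasonable local shifts.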

arXiv:2310.17120 [pdf, other]

Topic Segmentation of Semi-Structured and Unstructured Conversational Datasets using Language Models

Authors: Reshmi Ghosh, Harjeet Singh Kajal, Sharanya Kamath, Dhuri Shrivastava, Samyadeep Basu, Hansi Zeng, Soundararajan Srinivasan

Abstract: Breaking down a document or a conversation into multiple contiguous segments based on its semantic structure is an important and challenging problem in NLP, which can assist many downstream tasks. However, current works on topic segmentation often focus on segmentation of structured texts. In this paper, we comprehensively analyze the generalization capabilities of state-of-the-art topic segmentation models on unstructured texts. We find that: (a) Current strategies of pre-training on a large corpus of structured text such as Wiki-727K do not help in transferability to unstructured conversational data. (b) Training from scratch with only a relatively small-sized dataset of the target unstructured domain improves the segmentation results by a significant margin. We stress-test our proposed Topic Segmentation approach by experimenting with multiple loss functions, in order to mitigate effects of imbalance in unstructured conversational datasets. Our empirical evaluation indicates that the Focal Loss function is a robust alternative to the Cross-Entropy and re-weighted Cross-Entropy loss functions when segmenting unstructured and semi-structured chats.

Submitted 25 October, 2023; originally announced October 2023.

Comments: Accepted to IntelliSys 2023. arXiv admin note: substantial text overlap with arXiv:2211.14954
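For readers unfamiliar with the loss functions compared in the abstract above, here is a minimal sketch of the standard focal loss next to plain cross-entropy for a single example. It is not taken from the paper: the function names are illustrative, and the defaults `gamma=2.0` and `alpha=1.0` are assumptions rather than the authors' settings.

```python
import math


def cross_entropy(p_true: float) -> float:
    """Cross-entropy for one example, given the probability assigned to its true class."""
    return -math.log(p_true)


def focal_loss(p_true: float, gamma: float = 2.0, alpha: float = 1.0) -> float:
    """Focal loss: cross-entropy scaled by (1 - p_true) ** gamma.

    Well-classified examples (p_true near 1) are down-weighted, so rare,
    hard examples contribute relatively more to the total loss.
    """
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)


easy, hard = 0.9, 0.1  # hypothetical true-class probabilities

# How much larger the hard example's loss is than the easy one's, per loss.
ce_ratio = cross_entropy(hard) / cross_entropy(easy)
focal_ratio = focal_loss(hard) / focal_loss(easy)
```

With `gamma=0` the focal loss reduces to cross-entropy; larger `gamma` widens the gap between hard and easy examples, which is why it is often tried on imbalanced label distributions such as segment boundaries.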

arXiv:2306.06613 [pdf, ps, other]

Parameter-free version of Adaptive Gradient Methods for Strongly-Convex Functions

Authors: Deepak Gouda, Hassan Naveed, Salil Kamath

Abstract: The optimal learning rate for adaptive gradient methods applied to λ-strongly convex functions relies on the parameters λ and learning rate η. In this paper, we adapt a universal algorithm along the lines of Metagrad, to get rid of this dependence on λ and η. The main idea is to concurrently run multiple experts and combine their predictions to a master algorithm. This master enjoys O(d log T) regret bounds.

Submitted 14 July, 2023; v1 submitted 11 June, 2023; originally announced June 2023.

arXiv:2212.04617 [pdf, other]

UNet Based Pipeline for Lung Segmentation from Chest X-Ray Images

Authors: Shashank Shekhar, Ritika Nandi, H Srikanth Kamath

Abstract: Biomedical image segmentation is one of the fastest growing fields which has seen extensive automation through the use of Artificial Intelligence. This has enabled widespread adoption of accurate techniques to expedite the screening and diagnostic processes which would otherwise take several days to finalize. In this paper, we present an end-to-end pipeline to segment lungs from chest X-ray images, training the neural network model on the Japanese Society of Radiological Technology (JSRT) dataset, using UNet to enable faster processing of initial screening for various lung disorders. The pipeline developed can be readily used by medical centers with just the provision of X-Ray images as input. The model will perform the preprocessing, and provide a segmented image as the final output. It is expected that this will drastically reduce the manual effort involved and lead to greater accessibility in resource-constrained locations.

Submitted 8 December, 2022; originally announced December 2022.

Comments: 6 Pages

arXiv:2211.14954 [pdf, other]

Topic Segmentation in the Wild: Towards Segmentation of Semi-structured & Unstructured Chats

Authors: Reshmi Ghosh, Harjeet Singh Kajal, Sharanya Kamath, Dhuri Shrivastava, Samyadeep Basu, Soundararajan Srinivasan

Abstract: Breaking down a document or a conversation into multiple contiguous segments based on its semantic structure is an important and challenging problem in NLP, which can assist many downstream tasks. However, current works on topic segmentation often focus on segmentation of structured texts. In this paper, we comprehensively analyze the generalization capabilities of state-of-the-art topic segmentation models on unstructured texts. We find that: (a) Current strategies of pre-training on a large corpus of structured text such as Wiki-727K do not help in transferability to unstructured texts. (b) Training from scratch with only a relatively small-sized dataset of the target unstructured domain improves the segmentation results by a significant margin.

Submitted 27 November, 2022; originally announced November 2022.

Comments: NeurIPS 2022 : ENLSP

arXiv:2211.04780 [pdf, other]

On the Robustness of Explanations of Deep Neural Network Models: A Survey

Authors: Amlan Jyoti, Karthik Balaji Ganesh, Manoj Gayala, Nandita Lakshmi Tunuguntla, Sandesh Kamath, Vineeth N Balasubramanian

Abstract: Explainability has been widely stated as a cornerstone of the responsible and trustworthy use of machine learning models. With the ubiquitous use of Deep Neural Network (DNN) models expanding to risk-sensitive and safety-critical domains, many methods have been proposed to explain the decisions of these models. Recent years have also seen concerted efforts that have shown how such explanations can be distorted (attacked) by minor input perturbations. While there have been many surveys that review explainability methods themselves, there has been no effort hitherto to assimilate the different methods and metrics proposed to study the robustness of explanations of DNN models. In this work, we present a comprehensive survey of methods that study, understand, attack, and defend explanations of DNN models. We also present a detailed review of different metrics used to evaluate explanation methods, as well as describe attributional attack and defense methods. We conclude with lessons and take-aways for the community towards ensuring robust explanations of DNN model predictions.

Submitted 9 November, 2022; originally announced November 2022.

Comments: Under Review ACM Computing Surveys "Special Issue on Trustworthy AI"

arXiv:2210.01761 [pdf, other]

A Framework for Web Services Retrieval Using Bio Inspired Clustering

Authors: Anirudha Rayasam, Siddhartha R Thota, Avinash N Bukkittu, Sowmya Kamath

Abstract: Efficiently discovering relevant Web services with respect to a specific user query has become a growing challenge owing to the incredible growth in the field of web technologies. In previous works, different clustering models have been used to address these issues. However, most of the traditional clustering techniques are computationally intensive and fail to address all the problems involved. Also, the current standards fail to incorporate the semantic relatedness of Web services during clustering and retrieval, resulting in decreased performance. In this paper, we propose a framework for web services retrieval that uses a bottom-up, decentralized and self-organising approach to cluster available services. It also provides online, dynamic computation of clusters, thus overcoming the drawbacks of traditional clustering methods. We also use the semantic similarity between Web services for the clustering process to enhance the precision and lower the recall.

Submitted 4 October, 2022; originally announced October 2022.

arXiv:2101.07127 [pdf, other]

Fundamental Limits of Demand-Private Coded Caching

Authors: Chinmay Gurjarpadhye, Jithin Ravi, Sneha Kamath, Bikash Kumar Dey, Nikhil Karamchandani

Abstract: We consider the coded caching problem with an additional privacy constraint that a user should not get any information about the demands of the other users. We first show that a demand-private scheme for $N$ files and $K$ users can be obtained from a non-private scheme that serves only a subset of the demands for the $N$ files and $NK$ users problem. We further use this fact to construct a demand-private scheme for $N$ files and $K$ users from a particular known non-private scheme for $N$ files and $NK-K+1$ users. It is then demonstrated that the memory-rate pair $(M,\min \{N,K\}(1-M/N))$, which is achievable for non-private schemes with uncoded transmissions, is also achievable under demand privacy. We further propose a scheme that improves on these ideas by removing some redundant transmissions. The memory-rate trade-off achieved using our schemes is shown to be within a multiplicative factor of 3 from the optimal when $K < N$ and of 8 when $N\leq K$. Finally, we give the exact memory-rate trade-off for demand-private coded caching problems with $N\geq K=2$.

Submitted 18 January, 2021; originally announced January 2021.

Comments: 43 pages, 6 figures
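As a small worked illustration of the memory-rate pair $(M,\min \{N,K\}(1-M/N))$ quoted in the abstract above, the sketch below evaluates the rate for a given cache size. The function name and the sample values of $N$, $K$ and $M$ are assumptions chosen for illustration; the code only evaluates the formula and does not implement any caching scheme from the paper.

```python
from fractions import Fraction


def uncoded_rate(num_files: int, num_users: int, memory) -> Fraction:
    """Rate R = min{N, K} * (1 - M/N) for a cache size M with 0 <= M <= N."""
    if not 0 <= memory <= num_files:
        raise ValueError("memory M must lie in [0, N]")
    return min(num_files, num_users) * (1 - Fraction(memory) / num_files)


# Hypothetical parameters: N = 2 files, K = 2 users.
rate_no_cache = uncoded_rate(2, 2, 0)               # M = 0: every demand is sent in full
rate_half_cache = uncoded_rate(2, 2, 1)             # M = 1: half of the library is cached
rate_full_cache = uncoded_rate(2, 2, 2)             # M = N: nothing needs to be sent
rate_fractional = uncoded_rate(4, 2, Fraction(1, 2))
```

Exact fractions are used instead of floats so that the end points $M=0$ and $M=N$ give exactly $\min\{N,K\}$ and $0$.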

arXiv:2012.02515 [pdf, other]

AuthNet: A Deep Learning based Authentication Mechanism using Temporal Facial Feature Movements

Authors: Mohit Raghavendra, Pravan Omprakash, B R Mukesh, Sowmya Kamath

Abstract: Biometric systems based on Machine learning and Deep learning are being extensively used as authentication mechanisms in resource-constrained environments like smartphones and other small computing devices. These AI-powered facial recognition mechanisms have gained enormous popularity in recent years due to their transparent, contact-less and non-invasive nature. While they are effective to a large extent, there are ways to gain unauthorized access using photographs, masks, glasses, etc. In this paper, we propose an alternative authentication mechanism that uses both facial recognition and the unique movements of that particular face while uttering a password, that is, the temporal facial feature movements. The proposed model is not inhibited by language barriers because a user can set a password in any language. When evaluated on the standard MIRACL-VC1 dataset, the proposed model achieved an accuracy of 98.1%, underscoring its effectiveness as a robust system. The proposed method is also data-efficient since the model gave good results even when trained with only 10 positive video samples. The competence of the training of the network is also demonstrated by benchmarking the proposed system against various compounded Facial recognition and Lip reading models.

Submitted 19 December, 2020; v1 submitted 4 December, 2020; originally announced December 2020.

Comments: 2-page version accepted in AAAI-21 Student Abstract and Poster Program

arXiv:2006.11604 [pdf, other]

How do SGD hyperparameters in natural training affect adversarial robustness?

Authors: Sandesh Kamath, Amit Deshpande, K V Subrahmanyam

Abstract: Learning rate, batch size and momentum are three important hyperparameters in the SGD algorithm. It is known from the work of Jastrzebski et al. arXiv:1711.04623 that large batch size training of neural networks yields models which do not generalize well. Yao et al. arXiv:1802.08241 observe that large batch training yields models that have poor adversarial robustness. In the same paper, the authors train models with different batch sizes and compute the eigenvalues of the Hessian of the loss function. They observe that as the batch size increases, the dominant eigenvalues of the Hessian become larger. They also show that both adversarial training and small-batch training lead to a drop in the dominant eigenvalues of the Hessian, that is, a lowering of its spectrum. They combine adversarial training and second order information to come up with a new large-batch training algorithm and obtain robust models with good generalization. In this paper, we empirically observe the effect of the SGD hyperparameters on the accuracy and adversarial robustness of networks trained with unperturbed samples. Jastrzebski et al. considered training models with a fixed learning rate to batch size ratio. They observed that the higher the ratio, the better the generalization. We observe that networks trained with a constant learning rate to batch size ratio, as proposed in Jastrzebski et al., yield models which generalize well and also have almost constant adversarial robustness, independent of the batch size. We observe that momentum is more effective with varying batch sizes and a fixed learning rate than with constant learning rate to batch size ratio based SGD training.

Submitted 20 June, 2020; originally announced June 2020.

Comments: Preliminary version presented in ICML 2019 Workshop on "Understanding and Improving Generalization in Deep Learning" as "On Adversarial Robustness of Small vs Large Batch Training"

arXiv:2006.04449 [pdf, other]

On Universalized Adversarial and Invariant Perturbations

Authors: Sandesh Kamath, Amit Deshpande, K V Subrahmanyam

Abstract: Convolutional neural networks or standard CNNs (StdCNNs) are translation-equivariant models that achieve translation invariance when trained on data augmented with sufficient translations. Recent work on equivariant models for a given group of transformations (e.g., rotations) has led to group-equivariant convolutional neural networks (GCNNs). GCNNs trained on data augmented with sufficient rotations achieve rotation invariance. Recent work by the authors arXiv:2002.11318 studies a trade-off between invariance and robustness to adversarial attacks. In another related work arXiv:2005.08632, given any model and any input-dependent attack that satisfies a certain spectral property, the authors propose a universalization technique called SVD-Universal to produce a universal adversarial perturbation by looking at very few test examples. In this paper, we study the effectiveness of SVD-Universal on GCNNs as they gain rotation invariance through a higher degree of training augmentation. We empirically observe that as GCNNs gain rotation invariance through training augmented with larger rotations, the fooling rate of SVD-Universal gets better. To understand this phenomenon, we introduce universal invariant directions and study their relation to the universal adversarial direction produced by SVD-Universal.

Submitted 8 June, 2020; originally announced June 2020.

Comments: Part of this work was presented at the ICML 2018 Workshop on "Towards learning with limited labels: Equivariance, Invariance, and Beyond" as "Understanding Adversarial Robustness of Symmetric Networks"

arXiv:2005.08632 [pdf, other]

Universalization of any adversarial attack using very few test examples

Authors: Sandesh Kamath, Amit Deshpande, K V Subrahmanyam, Vineeth N Balasubramanian

Abstract: Deep learning models are known to be vulnerable not only to input-dependent adversarial attacks but also to input-agnostic or universal adversarial attacks. Dezfooli et al. \cite{Dezfooli17,Dezfooli17anal} construct a universal adversarial attack on a given model by looking at a large number of training data points and the geometry of the decision boundary near them. Subsequent work \cite{Khrulkov18} constructs a universal attack by looking only at test examples and intermediate layers of the given model. In this paper, we propose a simple universalization technique to take any input-dependent adversarial attack and construct a universal attack by only looking at very few adversarial test examples. We do not require details of the given model and have negligible computational overhead for universalization. We theoretically justify our universalization technique by a spectral property common to many input-dependent adversarial perturbations, e.g., gradients, Fast Gradient Sign Method (FGSM) and DeepFool. Using matrix concentration inequalities and spectral perturbation bounds, we show that the top singular vector of input-dependent adversarial directions on a small test sample gives an effective and simple universal adversarial attack. For VGG16 and VGG19 models trained on ImageNet, our simple universalization of Gradient, FGSM, and DeepFool perturbations using a test sample of 64 images gives fooling rates comparable to state-of-the-art universal attacks \cite{Dezfooli17,Khrulkov18} for reasonable norms of perturbation. Code available at https://github.com/ksandeshk/svd-uap.

Submitted 28 October, 2022; v1 submitted 18 May, 2020; originally announced May 2020.

Comments: Appeared in ACM CODS-COMAD 2022 (Research Track)
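The core step described in the abstract above, taking the top singular vector of a small stack of per-example adversarial directions, can be sketched in a few lines. This is an illustrative sketch only and not the authors' implementation (their code is at the repository linked in the abstract): the function name, the three toy 2-D directions, and the use of plain power iteration in place of a full SVD are all assumptions.

```python
import math


def top_right_singular_vector(rows, iterations=200):
    """Approximate the top right singular vector of the matrix whose rows are `rows`.

    Uses power iteration on A^T A, starting from a uniform unit vector.
    Assumes the start vector is not orthogonal to the answer.
    """
    dim = len(rows[0])
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        # Compute w = A^T (A v) without forming A^T A explicitly.
        av = [sum(row[j] * v[j] for j in range(dim)) for row in rows]
        w = [sum(rows[i][j] * av[i] for i in range(len(rows))) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return v  # all directions are zero; nothing to extract
        v = [x / norm for x in w]
    return v


# Hypothetical per-example adversarial directions (one row per test input).
# They point roughly along the first axis, with differing signs and small noise.
directions = [
    [1.0, 0.1],
    [0.9, -0.1],
    [-1.1, 0.0],
]

universal_direction = top_right_singular_vector(directions)
```

In this toy case the recovered direction is almost exactly the first axis: the singular vector captures the line that the individual directions share, regardless of their signs. In practice one would scale it to the allowed perturbation norm before adding it to inputs.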

arXiv:2002.11318 [pdf, other]

Can we have it all? On the Trade-off between Spatial and Adversarial Robustness of Neural Networks

Authors: Sandesh Kamath, Amit Deshpande, K V Subrahmanyam, Vineeth N Balasubramanian

Abstract: (Non-)robustness of neural networks to small, adversarial pixel-wise perturbations, and as more recently shown, to even random spatial transformations (e.g., translations, rotations) entreats both theoretical and empirical understanding. Spatial robustness to random translations and rotations is commonly attained via equivariant models (e.g., StdCNNs, GCNNs) and training augmentation, whereas adversarial robustness is typically achieved by adversarial training. In this paper, we prove a quantitative trade-off between spatial and adversarial robustness in a simple statistical setting. We complement this empirically by showing that: (a) as the spatial robustness of equivariant models improves by training augmentation with progressively larger transformations, their adversarial robustness worsens progressively, and (b) as the state-of-the-art robust models are adversarially trained with progressively larger pixel-wise perturbations, their spatial robustness drops progressively. Towards achieving Pareto-optimality in this trade-off, we propose a method based on curriculum learning that trains gradually on more difficult perturbations (both spatial and adversarial) to improve spatial and adversarial robustness simultaneously.

Submitted 10 November, 2021; v1 submitted 26 February, 2020; originally announced February 2020.

Comments: Accepted at NeurIPS 2021. A preliminary version consisting of early experimental results was presented at the ICML 2018 Workshop on "Towards learning with limited labels: Equivariance, Invariance, and Beyond" as "Understanding Adversarial Robustness of Symmetric Networks"

arXiv:1911.06995 [pdf, other]

Demand-Private Coded Caching and the Exact Trade-off for N=K=2

Authors: Sneha Kamath, Jithin Ravi, Bikash Kumar Dey

Abstract: The distributed coded caching problem has been studied extensively in the recent past. While the known coded caching schemes achieve an improved transmission rate, they violate the privacy of the users since in these schemes the demand of one user is revealed to others in the delivery phase. In this paper, we consider the coded caching problem under the constraint that the demands of the other users remain information theoretically secret from each user. We first show that the memory-rate pair $(M,\min \{N,K\}(1-M/N))$ is achievable under information theoretic demand privacy, while using broadcast transmissions. We then show that a demand-private scheme for $N$ files and $K$ users can be obtained from a non-private scheme that satisfies only a restricted subset of demands of $NK$ users for $N$ files. We then focus on the demand-private coded caching problem for $K=2$ users, $N=2$ files. We characterize the exact memory-rate trade-off for this case. To show the achievability, we use our first result to construct a demand-private scheme from a non-private scheme satisfying a restricted demand subset that is known from an earlier work by Tian. Further, by giving a converse based on the extra requirement of privacy, we show that the obtained achievable region is the exact memory-rate trade-off.

Submitted 18 February, 2020; v1 submitted 16 November, 2019; originally announced November 2019.

Comments: 8 pages, 2 figures

arXiv:1911.00712 [pdf, other]

How to Pre-Train Your Model? Comparison of Different Pre-Training Models for Biomedical Question Answering

Authors: Sanjay Kamath, Brigitte Grau, Yue Ma

Abstract: Using deep learning models on small scale datasets would result in overfitting. To overcome this problem, the process of pre-training a model and fine-tuning it to the small scale dataset has been used extensively in domains such as image processing. Similarly for question answering, pre-training and fine-tuning can be done in several ways. Commonly, reading comprehension models are used for pre-training, but we show that other types of pre-training can work better. We compare two pre-training models based on reading comprehension and open domain question answering models and determine the performance when fine-tuned and tested on the BIOASQ question answering dataset. We find the open domain question answering model to be a better fit for this task than the reading comprehension model.

Submitted 2 November, 2019; originally announced November 2019.

arXiv:1909.03324 [pdf, other]

Demand Private Coded Caching

Authors: Sneha Kamath

Abstract: The work by Maddah-Ali and Niesen demonstrated the benefits in reducing the transmission rate in a noiseless broadcast network by joint design of caching and delivery schemes. In their setup, each user learns the demands of all other users in the delivery phase. In this paper, we introduce the problem of demand private coded caching where we impose a privacy requirement that no user learns any information about the demands of other users. We provide an achievable scheme and compare its performance using the existing lower bounds on the achievable rates under no privacy setting. For this setting, when $N\leq K$ we show that our scheme is order optimal within a multiplicative factor of 8. Furthermore, when $N > K$ and $M\geq N/K$, our scheme is order optimal within a multiplicative factor of 4.

Submitted 7 September, 2019; originally announced September 2019.

Comments: 14 pages, 3 figures

arXiv:1807.07878 [pdf, other]

An Operational Approach to Information Leakage

Authors: Ibrahim Issa, Aaron B. Wagner, Sudeep Kamath

Abstract: Given two random variables $X$ and $Y$, an operational approach is undertaken to quantify the "leakage" of information from $X$ to $Y$. The resulting measure $\mathcal{L}(X \to Y)$ is called maximal leakage, and is defined as the multiplicative increase, upon observing $Y$, of the probability of correctly guessing a randomized function of $X$, maximized over all such randomized functions. A closed-form expression for $\mathcal{L}(X \to Y)$ is given for discrete $X$ and $Y$, and it is subsequently generalized to handle a large class of random variables. The resulting properties are shown to be consistent with an axiomatic view of a leakage measure, and the definition is shown to be robust to variations in the setup. Moreover, a variant of the Shannon cipher system is studied, in which performance of an encryption scheme is measured using maximal leakage. A single-letter characterization of the optimal limit of (normalized) maximal leakage is derived and asymptotically-optimal encryption schemes are demonstrated. Furthermore, the sample complexity of estimating maximal leakage from data is characterized up to subpolynomial factors. Finally, the guessing framework used to define maximal leakage is used to give operational interpretations of commonly used leakage measures, such as Shannon capacity, maximal correlation, and local differential privacy.

Submitted 20 July, 2018; originally announced July 2018.

Comments: Submitted to IEEE Transactions on Information Theory (appeared in part in CISS 2016, ISIT 2016 & 2017)

arXiv:1506.01105 [pdf, other]

The two-unicast problem

Authors: Sudeep Kamath, Venkat Anantharam, David Tse, Chih-Chun Wang

Abstract: We consider the communication capacity of wireline networks for a two-unicast traffic pattern. The network has two sources and two destinations with each source communicating a message to its own destination, subject to the capacity constraints on the directed edges of the network. We propose a simple outer bound for the problem that we call the Generalized Network Sharing (GNS) bound. We show this bound is the tightest edge-cut bound for two-unicast networks and is tight in several bottleneck cases, though it is not tight in general. We also show that the problem of computing the GNS bound is NP-complete. Finally, we show that despite its seeming simplicity, the two-unicast problem is as hard as the most general network coding problem. As a consequence, linear coding is insufficient to achieve capacity for general two-unicast networks, and non-Shannon inequalities are necessary for characterizing capacity of general two-unicast networks.

Submitted 2 June, 2015; originally announced June 2015.

Comments: 23 pages, 22 figures

arXiv:1505.00769 [pdf, other]

On Non-Interactive Simulation of Joint Distributions

Authors: Sudeep Kamath, Venkat Anantharam

Abstract: We consider the following non-interactive simulation problem: Alice and Bob observe sequences $X^n$ and $Y^n$ respectively where $\{(X_i, Y_i)\}_{i=1}^n$ are drawn i.i.d. from $P(x,y),$ and they output $U$ and $V$ respectively which is required to have a joint law that is close in total variation to a specified $Q(u,v).$ It is known that the maximal correlation of $U$ and $V$ must necessarily be no bigger than that of $X$ and $Y$ if this is to be possible. Our main contribution is to bring hypercontractivity to bear as a tool on this problem. In particular, we show that if $P(x,y)$ is the doubly symmetric binary source, then hypercontractivity provides stronger impossibility results than maximal correlation. Finally, we extend these tools to provide impossibility results for the $k$-agent version of this problem.

Submitted 9 April, 2016; v1 submitted 4 May, 2015; originally announced May 2015.

Comments: 25 pages, 13 figures

arXiv:1306.6839 [pdf]

W3-Scrape - A Windows based Reconnaissance Tool for Web Application Fingerprinting

Authors: Karthik R, Raghavendra Karthik, Pramod S, Sowmya Kamath

Abstract: Web application fingerprinting is a quintessential part of the information gathering phase of (ethical) hacking. It allows narrowing down the specifics instead of looking for all clues. Also, an application that has been correctly recognized can help in quickly analyzing known weaknesses and then moving ahead with remaining aspects. This step is also essential to allow a pen tester to customize the payload or exploitation techniques based on the identification, so as to increase the chances of successful intrusion. This paper presents a new tool, "W3-Scrape", for the relatively nascent field of web application fingerprinting that helps automate web application fingerprinting when performed in the current scenarios.

Submitted 24 June, 2013; originally announced June 2013.

Comments: International Conference on Emerging Trends in Electrical, Communication and Information Technologies (ICECIT 2012), 6 pages; Organised by SRIT, Ananthpur, India during Dec 21 - 23, 2012. (Publisher - Elsevier Science & Technology; ISBN 8131234118, 9788131234112)

ACM Class: D.4.6; E.3

arXiv:1304.6133 [pdf, other]

On Maximal Correlation, Hypercontractivity, and the Data Processing Inequality studied by Erkip and Cover

Authors: Venkat Anantharam, Amin Gohari, Sudeep Kamath, Chandra Nair

Abstract: In this paper we provide a new geometric characterization of the Hirschfeld-Gebelein-Rényi maximal correlation of a pair of random variables $(X,Y)$, as well as of the chordal slope of the nontrivial boundary of the hypercontractivity ribbon of $(X,Y)$ at infinity. The new characterizations lead to simple proofs for some of the known facts about these quantities. We also provide a counterexample to a data processing inequality claimed by Erkip and Cover, and find the correct tight constant for this kind of inequality.

Submitted 22 April, 2013; originally announced April 2013.

Comments: 11 pages

arXiv:1304.1677 [pdf]

Bug Classification: Feature Extraction and Comparison of Event Model using Naïve Bayes Approach

Authors: Sunil Joy Dommati, Ruchi Agrawal, Ram Mohana Reddy G., S. Sowmya Kamath

Abstract: In software industries, individuals at different levels, from customer to engineer, apply diverse mechanisms to detect to which class a particular bug should be allocated. While a simple search on the Internet sometimes helps, in many other cases a lot of effort is spent in analyzing the bug report to classify the bug. So there is a great need for a structured mining algorithm - where given a crash log, the existing bug database could be mined to find out the class to which the bug should be allocated. This would involve mining patterns and applying different classification algorithms. This paper focuses on feature extraction, noise reduction in data, and classification of network bugs using the probabilistic Naïve Bayes approach. Different event models like Bernoulli and Multinomial are applied on the extracted features. When new, unseen bugs are given as input to the algorithms, the performance comparison of the different algorithms is done on the basis of accuracy and recall parameters.

Submitted 5 April, 2013; originally announced April 2013.

Comments: 5 pages, International Conference on Recent Trends in Computer and Information Engineering (ICRTCIE'2012) April 13-15, 2012 Pattaya, http://psrcentre.org/images/extraimages/412138.pdf

arXiv:1304.1676 [pdf]

Research on Potential Semantic Web Service Discovery Mechanisms

Authors: A Anji Reddy, S Sowmya Kamath

Abstract: The field of Web services is an important paradigm in distributed application development. Currently, many businesses are seeking to convert their applications into web services because of their ability to promote inter-operability among applications. As the number of web services increases, the process of discovering appropriate web services for consumption from the user's perspective gains importance. In this paper, we present a study of potential ways of discovering web services and issues related to each of them. In addition, we discuss ontology concepts and related technologies, which incorporate semantic meaning and hence give domain knowledge about a web service to improve the discovery mechanism. The paper also presents an overview of related research work, identifying metrics useful in filtering web service search mechanisms.

Submitted 5 April, 2013; originally announced April 2013.

Comments: 6 pages, International Conference on Recent Trends in Computer Science and Engineering (ICRTCSE' 2012) May 3 - 4, 2012 Chennai, INDIA ISBN: 978-81-9089-807-2

arXiv:1105.6326 [pdf, other]

Two Unicast Information Flows over Linear Deterministic Networks

Authors: I-Hsiang Wang, Sudeep U. Kamath, David N. C. Tse

Abstract: We investigate the two unicast flow problem over layered linear deterministic networks with arbitrary number of nodes. When the minimum cut value between each source-destination pair is constrained to be 1, it is obvious that the triangular rate region {(R_1,R_2):R_1,R_2> 0, R_1+R_2< 1} can be achieved, and that one cannot achieve beyond the square rate region {(R_1,R_2):R_1,R_2> 0, R_1< 1,R_2< 1}. Analogous to the work by Wang and Shroff for wired networks, we provide the necessary and sufficient conditions for the capacity region to be the triangular region and the necessary and sufficient conditions for it to be the square region. Moreover, we completely characterize the capacity region and conclude that there are exactly three more possible capacity regions of this class of networks, in contrast to the result in wired networks where only the triangular and square rate regions are possible. Our achievability scheme is based on linear coding over an extension field with at most four nodes performing special linear coding operations, namely interference neutralization and zero forcing, while all other nodes perform random linear coding.

Submitted 31 May, 2011; originally announced May 2011.

Comments: Extended version of the conference paper to be presented at ISIT 2011

arXiv:0805.0337 [pdf, other]

On Distributed Function Computation in Structure-Free Random Networks

Authors: Sudeep Kamath, D. Manjunath

Abstract: We consider in-network computation of MAX in a structure-free random multihop wireless network. Nodes do not know their relative or absolute locations and use the Aloha MAC protocol. For one-shot computation, we describe a protocol in which the MAX value becomes available at the origin in $O(\sqrt{n/\log n})$ slots with high probability. This is within a constant factor of that required by the best coordinated protocol. A minimal structure (knowledge of hop-distance from the sink) is imposed on the network and with this structure, we describe a protocol for pipelined computation of MAX that achieves a rate of $\Omega(1/(\log^2 n)).$

Submitted 5 May, 2008; originally announced May 2008.

Comments: 13 pages, 1 figure. Accepted at IEEE International Symposium on Information Theory 2008

Showing 1–29 of 29 results for author: Kamath, S