Search | arXiv e-print repository

arXiv:2406.19630

Optimal Video Compression using Pixel Shift Tracking

Authors: Hitesh Saai Mananchery Panneerselvam, Smit Anand

Abstract: Video comprises approximately 85% of all internet traffic, yet video encoding/compression has historically been done with hard-coded rules, which have worked well but only up to a certain limit. In the last few years there has been a surge in video compression algorithms using ML-based models, and many of them have outperformed several legacy codecs. These models range from encoding video end to end with an ML approach to replacing some intermediate steps in legacy codecs with ML models to increase the efficiency of those steps. Optimizing video storage is an essential aspect of video processing, and we propose that one possible way to achieve it is to avoid storing redundant data at each frame. In this paper, we introduce the removal of redundancies in subsequent frames of a given video as the main approach for video compression. We call this method Redundancy Removal using Shift (R²S). This method can be used with various machine learning algorithms, making compression more accessible and adaptable. In this study, we use a computer-vision-based pixel point tracking method to identify redundant pixels and encode video for optimal storage.

Submitted 27 June, 2024; originally announced June 2024.
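A minimal sketch of the redundancy-removal idea, assuming grayscale frames stored as nested lists. It deliberately swaps the paper's pixel point tracking for plain frame differencing: a pixel counts as redundant if its value at the same position changed by no more than a hypothetical `threshold`, so camera or object motion is not handled. `encode_changes` and `decode_changes` are illustrative names, not the authors' code.

```python
from typing import List, Tuple

Frame = List[List[int]]  # one grayscale frame: rows of 8-bit intensities


def encode_changes(prev: Frame, curr: Frame, threshold: int = 0) -> List[Tuple[int, int, int]]:
    """Keep only (row, col, value) for pixels of `curr` that differ from the same
    position in `prev` by more than `threshold`; all other pixels are treated as redundant.
    Simplification (assumption): frame differencing, not the paper's point tracking."""
    changes = []
    for r, (prev_row, curr_row) in enumerate(zip(prev, curr)):
        for c, (p, v) in enumerate(zip(prev_row, curr_row)):
            if abs(v - p) > threshold:
                changes.append((r, c, v))
    return changes


def decode_changes(prev: Frame, changes: List[Tuple[int, int, int]]) -> Frame:
    """Rebuild the next frame by copying `prev` and overwriting the stored pixels."""
    frame = [row[:] for row in prev]
    for r, c, v in changes:
        frame[r][c] = v
    return frame
```

With `threshold=0` the round trip is lossless; a larger threshold stores fewer pixels at the cost of exact reconstruction.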

arXiv:2406.02577

Are PPO-ed Language Models Hackable?

Authors: Suraj Anand, David Getzen

Abstract: Numerous algorithms have been proposed to align language models to remove undesirable behaviors. However, the challenges associated with a very large state space and with creating a proper reward function often result in various jailbreaks. Our paper examines this effect of reward in the controlled setting of positive-sentiment language generation. Instead of training a reward model online from human feedback, we employ a statically learned sentiment classifier. We also consider a setting where our model's weights and activations are exposed to an end user after training. We examine a pretrained GPT-2 through the lens of mechanistic interpretability before and after proximal policy optimization (PPO) has been applied to promote positive-sentiment responses. Using these insights, we (1) attempt to "hack" the PPO-ed model into generating negative-sentiment responses and (2) add a term to the reward function to try to alter 'negative' weights.

Submitted 28 May, 2024; originally announced June 2024.

Comments: 8 pages, 4 figures

arXiv:2406.00053

Dual Process Learning: Controlling Use of In-Context vs. In-Weights Strategies with Weight Forgetting

Authors: Suraj Anand, Michael A. Lepori, Jack Merullo, Ellie Pavlick

Abstract: Language models have the ability to perform in-context learning (ICL), allowing them to flexibly adapt their behavior based on context. This contrasts with in-weights learning, where information is statically encoded in model parameters from iterated observations of the data. Despite this apparent ability to learn in context, language models are known to struggle when faced with unseen or rarely seen tokens. Hence, we study structural in-context learning, which we define as the ability of a model to execute in-context learning on arbitrary tokens, so called because the model must generalize on the basis of, e.g., sentence structure or task structure rather than semantic content encoded in token embeddings. An ideal model would be able to do both: flexibly deploy in-weights operations (to robustly accommodate ambiguous or unknown contexts using encoded semantic information) and structural in-context operations (to accommodate novel tokens). We study structural in-context algorithms in a simple part-of-speech setting using both practical and toy models. We find that active forgetting, a technique recently introduced to help models generalize to new languages, forces models to adopt structural in-context learning solutions. Finally, we introduce temporary forgetting, a straightforward extension of active forgetting that enables one to control how much a model relies on in-weights vs. in-context solutions.
Importantly, temporary forgetting allows us to induce a dual process strategy in which in-context and in-weights solutions coexist within a single model.

Submitted 1 July, 2024; v1 submitted 28 May, 2024; originally announced June 2024.

Comments: 9 pages, 5 figures
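A sketch of the reset schedule only, under two assumptions: that forgetting means re-initializing the token-embedding layer every `reset_every` optimizer steps (as in the active-forgetting work the abstract builds on), and that "temporary" means the resets stop after a hypothetical `stop_after` step. The re-initialization itself and the training loop are left out; the function names are illustrative.

```python
from typing import List, Optional


def should_reset_embeddings(step: int, reset_every: int, stop_after: Optional[int] = None) -> bool:
    """True if the token-embedding layer would be re-initialized at this optimizer step.
    stop_after=None: (assumed) active forgetting, resets every `reset_every` steps forever.
    Integer stop_after: (assumed) temporary forgetting, resets only before that step."""
    if step == 0 or step % reset_every != 0:
        return False
    return stop_after is None or step < stop_after


def reset_steps(total_steps: int, reset_every: int, stop_after: Optional[int] = None) -> List[int]:
    """List the steps in [0, total_steps) at which a reset would happen."""
    return [s for s in range(total_steps) if should_reset_embeddings(s, reset_every, stop_after)]
```

For example, `reset_steps(10, 3)` yields resets at steps 3, 6 and 9, while `reset_steps(10, 3, stop_after=7)` keeps only 3 and 6, after which the embeddings are trained normally.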

arXiv:2405.07105

Overcoming systematic softening in universal machine learning interatomic potentials by fine-tuning

Authors: Bowen Deng, Yunyeong Choi, Peichen Zhong, Janosh Riebesell, Shashwat Anand, Zhuohan Li, KyuJung Jun, Kristin A. Persson, Gerbrand Ceder

Abstract: Machine learning interatomic potentials (MLIPs) have introduced a new paradigm for atomic simulations. Recent advancements have seen the emergence of universal MLIPs (uMLIPs) that are pre-trained on diverse materials datasets, providing opportunities for both ready-to-use universal force fields and robust foundations for downstream machine learning refinements. However, their performance in extrapolating to out-of-distribution complex atomic environments remains unclear. In this study, we highlight a consistent potential energy surface (PES) softening effect in three uMLIPs: M3GNet, CHGNet, and MACE-MP-0, which is characterized by energy and force under-prediction in a series of atomic-modeling benchmarks including surfaces, defects, solid-solution energetics, phonon vibration modes, ion migration barriers, and general high-energy states. We find that the PES softening behavior originates from a systematic underprediction error of the PES curvature, which derives from the biased sampling of near-equilibrium atomic arrangements in uMLIP pre-training datasets. We demonstrate that the PES softening issue can be effectively rectified by fine-tuning with a single additional data point. Our findings suggest that a considerable fraction of uMLIP errors are highly systematic, and can therefore be efficiently corrected. This result rationalizes the data-efficient fine-tuning performance boost commonly observed with foundational MLIPs.
We argue for the importance of a comprehensive materials dataset with improved PES sampling for next-generation foundational MLIPs.

Submitted 11 May, 2024; originally announced May 2024.
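A toy illustration, not the paper's fine-tuning procedure: if a model's error is a purely systematic scaling of the PES curvature, a single reference force is enough to fit the correction, which then transfers to every other configuration. The 1-D harmonic well, `TRUE_K`, and `SOFTENING` are hypothetical; real uMLIP errors are only approximately of this form.

```python
# Toy 1-D illustration (not a uMLIP): a harmonic well E = 0.5 * k * x**2.
TRUE_K = 2.0     # hypothetical reference ("DFT") curvature
SOFTENING = 0.8  # hypothetical systematic curvature under-prediction (< 1)


def true_force(x: float) -> float:
    return -TRUE_K * x


def softened_force(x: float) -> float:
    # The model's curvature is uniformly too small, so every force is too small in magnitude.
    return -SOFTENING * TRUE_K * x


def fit_scale_from_one_point(x_ref: float) -> float:
    """Fit one multiplicative correction from a single labelled configuration (x_ref != 0)."""
    return true_force(x_ref) / softened_force(x_ref)


scale = fit_scale_from_one_point(0.5)
xs = [-1.0, 0.3, 2.0]
corrected = [scale * softened_force(x) for x in xs]  # matches true_force at every x
```

The fitted scale is 1/0.8 = 1.25, and because the error was uniform, the corrected forces agree with the reference everywhere, not only at the fitted point.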

arXiv:2402.10921

AM^2-EmoJE: Adaptive Missing-Modality Emotion Recognition in Conversation via Joint Embedding Learning

Authors: Naresh Kumar Devulapally, Sidharth Anand, Sreyasee Das Bhattacharjee, Junsong Yuan

Abstract: Human emotion can be expressed in different modes, i.e., audio, video, and text. However, the contribution of each mode in exhibiting each emotion is not uniform. Furthermore, the availability of complete mode-specific details may not always be guaranteed at test time. In this work, we propose AM^2-EmoJE, a model for Adaptive Missing-Modality Emotion Recognition in Conversation via Joint Embedding Learning, which is grounded on a two-fold contribution. First, a query-adaptive fusion module automatically learns the relative importance of its mode-specific representations in a query-specific manner. With this, the model aims to prioritize the mode-invariant spatial query details of the emotion patterns while also retaining their mode-exclusive aspects within the learned multimodal query descriptor. Second, a multimodal joint embedding learning module explicitly addresses various missing-modality scenarios at test time. With this, the model learns to emphasize the correlated patterns across modalities, which may help align the cross-attended mode-specific descriptors pairwise within a joint embedding space and thereby compensate for missing modalities during inference. By leveraging spatio-temporal details at the dialogue level, the proposed AM^2-EmoJE not only outperforms the best-performing state-of-the-art multimodal methods but also, by effectively leveraging body language in place of facial expressions, offers enhanced privacy.
With around 2-5% improvement in weighted-F1 score, the proposed multimodal joint embedding module delivers a substantial performance gain in a variety of missing-modality query scenarios at test time.

Submitted 26 January, 2024; originally announced February 2024.
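One plausible reading of the query-adaptive fusion, sketched under assumptions: each available modality embedding gets a weight from a softmax over a relevance score, and a missing modality (`None`) is simply dropped from the softmax. In the model those scores would be learned per query; here `scores` is a hypothetical fixed input, and `adaptive_fusion` is an illustrative name rather than the authors' module.

```python
import math
from typing import Dict, List, Optional


def adaptive_fusion(embeddings: Dict[str, Optional[List[float]]],
                    scores: Dict[str, float]) -> List[float]:
    """Softmax-weighted sum of the available modality embeddings.
    Modalities whose embedding is None (missing at test time) are excluded."""
    available = [m for m, e in embeddings.items() if e is not None]
    if not available:
        raise ValueError("at least one modality must be present")
    max_score = max(scores[m] for m in available)  # subtract max for numerical stability
    exp_scores = {m: math.exp(scores[m] - max_score) for m in available}
    total = sum(exp_scores.values())
    fused = [0.0] * len(embeddings[available[0]])
    for m in available:
        weight = exp_scores[m] / total
        for i, x in enumerate(embeddings[m]):
            fused[i] += weight * x
    return fused
```

Dropping a missing modality before the softmax renormalizes the remaining weights, so the fused descriptor stays on the same scale whichever modalities are present.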

arXiv:2402.07896

Suppressing Pink Elephants with Direct Principle Feedback

Authors: Louis Castricato, Nathan Lile, Suraj Anand, Hailey Schoelkopf, Siddharth Verma, Stella Biderman

Abstract: Existing methods for controlling language models, such as RLHF and Constitutional AI, involve determining which LLM behaviors are desirable and training them into a language model. However, in many cases it is desirable for LLMs to be controllable at inference time, so that they can be used in multiple contexts with diverse needs. We illustrate this with the Pink Elephant Problem: instructing an LLM to avoid discussing a certain entity (a "Pink Elephant") and instead discuss a preferred entity ("Grey Elephant"). We apply a novel simplification of Constitutional AI, Direct Principle Feedback (DPF), which skips the ranking of responses and uses DPO directly on critiques and revisions. Our results show that after DPF fine-tuning on our synthetic Pink Elephants dataset, our 13B fine-tuned LLaMA 2 model significantly outperforms Llama-2-13B-Chat and a prompted baseline, and performs as well as GPT-4 on our curated test set assessing the Pink Elephant Problem.

Submitted 13 February, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

Comments: 8 pages, 6 figures
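For one preference pair, the standard DPO objective is loss = -log σ(β·[(log π(y_w|x) - log π_ref(y_w|x)) - (log π(y_l|x) - log π_ref(y_l|x))]). The sketch computes it from summed sequence log-probabilities. Treating the revision as the preferred response y_w and the original response as the dispreferred y_l is our assumption about how DPF orients its critique-and-revision pairs, and `beta=0.1` is an illustrative default, not a value from the paper.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """-log sigmoid(beta * margin), where the margin measures how much more the policy
    (relative to the frozen reference model) prefers the chosen over the rejected response."""
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # Numerically stable softplus(-margin) = log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy equals the reference the margin is zero and the loss is log 2; raising the policy's log-probability of the revision lowers it.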

arXiv:2402.00468

RadDQN: a Deep Q Learning-based Architecture for Finding Time-efficient Minimum Radiation Exposure Pathway

Authors: Biswajit Sadhu, Trijit Sadhu, S. Anand

Abstract: Recent advancements in deep reinforcement learning (DRL) techniques have sparked multifaceted applications in the automation sector. DRL's ability to manage complex decision-making problems encourages its use in the nuclear industry for tasks such as optimizing the radiation exposure of personnel during normal operating conditions and potential accident scenarios. However, the lack of an efficient reward function and an effective exploration strategy has thwarted its implementation in developing radiation-aware autonomous unmanned aerial vehicles (UAVs) for achieving maximum radiation protection. In this article, we address these issues and introduce a deep Q-learning-based architecture (RadDQN) that operates on a radiation-aware reward function to provide a time-efficient minimum radiation-exposure pathway in a radiation zone. We propose a set of unique exploration strategies that fine-tune the extent of exploration and exploitation based on the state-wise variation in radiation exposure during training. Further, we benchmark the predicted path against a grid-based deterministic method. We demonstrate that the formulated reward function, in conjunction with an adequate exploration strategy, is effective in handling several scenarios with drastically different radiation field distributions. Compared to vanilla DQN, our model achieves faster convergence and higher training stability.

Submitted 1 February, 2024; originally announced February 2024.

Comments: 12 pages, 7 main figures, code link (GitHub)
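The abstract benchmarks RadDQN against a grid-based deterministic method without specifying it. As an assumed stand-in, the sketch runs Dijkstra's algorithm on a 4-connected grid where entering a cell costs its dose plus a constant `time_penalty`, so the optimum trades exposure against path length. The dose map, cost model, and function name are hypothetical.

```python
import heapq
from typing import List, Tuple


def min_dose_path_cost(dose: List[List[float]], start: Tuple[int, int],
                       goal: Tuple[int, int], time_penalty: float = 1.0) -> float:
    """Dijkstra over a 4-connected grid. Entering cell (r, c) costs dose[r][c] + time_penalty.
    Returns the minimum total cost from start to goal (inf if unreachable)."""
    rows, cols = len(dose), len(dose[0])
    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        cost, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            return cost
        if cost > best.get((r, c), float("inf")):
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                new_cost = cost + dose[nr][nc] + time_penalty
                if new_cost < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_cost
                    heapq.heappush(heap, (new_cost, (nr, nc)))
    return float("inf")
```

On the map `[[0, 9, 0], [0, 9, 0], [0, 0, 0]]` from (0, 0) to (0, 2), the direct route through the hot column costs 11, while the six-step detour around it costs 6, which the search returns.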

arXiv:2401.15164

AMuSE: Adaptive Multimodal Analysis for Speaker Emotion Recognition in Group Conversations

Authors: Naresh Kumar Devulapally, Sidharth Anand, Sreyasee Das Bhattacharjee, Junsong Yuan, Yu-** Chang

Abstract: Analyzing individual emotions during group conversation is crucial in developing intelligent agents capable of natural human-machine interaction. While reliable emotion recognition techniques depend on different modalities (text, audio, video), the inherent heterogeneity between these modalities and the dynamic cross-modal interactions influenced by an individual's unique behavioral patterns make emotion recognition very challenging. This difficulty is compounded in group settings, where an emotion and its temporal evolution are influenced not only by the individual but also by external context, such as audience reaction and the course of the ongoing conversation. To meet this challenge, we propose a Multimodal Attention Network (MAN) that captures cross-modal interactions at various levels of spatial abstraction by jointly learning a set of interacting mode-specific Peripheral and Central networks. The proposed MAN injects cross-modal attention via its Peripheral key-value pairs within each layer of a mode-specific Central query network. The resulting cross-attended mode-specific descriptors are then combined using an Adaptive Fusion technique that enables the model to integrate the discriminative and complementary mode-specific data patterns within an instance-specific multimodal descriptor. Given a dialogue represented by a sequence of utterances, the proposed AMuSE model condenses both spatial and temporal features into two dense descriptors: speaker-level and utterance-level.
This not only delivers better classification performance (3-5% improvement in weighted-F1 and 5-7% improvement in accuracy) on large-scale public datasets but also helps users understand the reasoning behind each emotion prediction made by the model via its Multimodal Explainability Visualization module.

Submitted 26 January, 2024; originally announced January 2024.
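To make "Peripheral key-value pairs injected into a Central query network" concrete, here is plain scaled dot-product cross-attention in which queries come from one modality and keys/values from another. It omits the learned projections, multiple heads, and per-layer stacking that a real MAN layer would have; the names and shapes are illustrative assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: each query (e.g. from the Central text stream)
    attends over keys/values (e.g. from a Peripheral audio stream)."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(scores)  # subtract max before exp for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```

If all keys are identical, every query receives the plain average of the values; a query aligned with one key instead concentrates on that key's value row.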

arXiv:2312.00069

SICKLE: A Multi-Sensor Satellite Imagery Dataset Annotated with Multiple Key Cropping Parameters

Authors: Depanshu Sani, Sandeep Mahato, Sourabh Saini, Harsh Kumar Agarwal, Charu Chandra Devshali, Saket Anand, Gaurav Arora, Thiagarajan Jayaraman

Abstract: The availability of well-curated datasets has driven the success of Machine Learning (ML) models. Despite greater access to earth observation data in agriculture, there is a scarcity of curated and labelled datasets, which limits their potential use in training ML models for remote sensing (RS) in agriculture. To this end, we introduce a first-of-its-kind dataset called SICKLE, which consists of a time series of multi-resolution imagery from 3 distinct satellites: Landsat-8, Sentinel-1 and Sentinel-2. Our dataset covers multi-spectral, thermal and microwave sensor data from the January 2018 to March 2021 period. We construct each temporal sequence by considering the cropping practices followed by farmers primarily engaged in paddy cultivation in the Cauvery Delta region of Tamil Nadu, India, and annotate the corresponding imagery with key cropping parameters at multiple resolutions (i.e., 3m, 10m and 30m). Our dataset comprises 2,370 season-wise samples from 388 unique plots, with an average size of 0.38 acres, for classifying 21 crop types across 4 districts in the Delta, which amounts to approximately 209,000 satellite images. Of the 2,370 samples, 351 paddy samples from 145 plots are annotated with multiple crop parameters, such as the variety of paddy, its growing season and productivity in terms of per-acre yields. Ours is also one of the first studies to consider the growing-season activities pertinent to crop phenology (spanning sowing, transplanting and harvesting dates) as parameters of interest.
We benchmark SICKLE on three tasks: crop type, crop phenology (sowing, transplanting, harvesting), and yield prediction.

Submitted 29 November, 2023; originally announced December 2023.

Comments: Accepted as an oral presentation at WACV 2024

arXiv:2311.04588

Army of Thieves: Enhancing Black-Box Model Extraction via Ensemble based sample selection

Authors: Akshit **dal, Vikram Goyal, Saket Anand, Chetan Arora

Abstract: Machine Learning (ML) models become vulnerable to Model Stealing Attacks (MSA) when they are deployed as a service. In such attacks, the deployed model is queried repeatedly to build a labelled dataset. This dataset allows the attacker to train a thief model that mimics the original model. To maximize query efficiency, the attacker has to select the most informative subset of data points from the pool of available data. Existing attack strategies use approaches like Active Learning and Semi-Supervised Learning to minimize costs. However, in the black-box setting, these approaches may select sub-optimal samples because they train only one thief model. Depending on the thief model's capacity and the data it was pretrained on, the model might even select noisy samples that harm the learning process. In this work, we explore the use of an ensemble of deep learning models as our thief model. We call our attack Army of Thieves (AOT) because we train multiple models with varying complexities to leverage the wisdom of the crowd. Based on the ensemble's collective decision, uncertain samples are selected for querying, while the most confident samples are directly included in the training data. Our approach is the first to use an ensemble of thief models to perform model extraction. We outperform the base approaches of existing state-of-the-art methods by at least 3% and achieve 21% higher adversarial sample transferability than previous work for models trained on the CIFAR-10 dataset.

Submitted 8 November, 2023; originally announced November 2023.

Comments: 10 pages, 5 figures, paper accepted to WACV 2024
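A hedged sketch of ensemble-based sample selection: average the class distributions predicted by several thief models, query the victim on the `query_budget` samples with the highest entropy of that average, and pseudo-label the remaining samples whose top averaged probability clears `confident_threshold`. The entropy criterion, the threshold, and the function name are assumptions; the abstract says only that selection is based on the ensemble's collective decision.

```python
import math
from typing import List, Tuple


def split_by_ensemble(probs_per_model: List[List[List[float]]], query_budget: int,
                      confident_threshold: float = 0.9) -> Tuple[List[int], List[Tuple[int, int]]]:
    """probs_per_model[m][i] is thief model m's class distribution for unlabeled sample i.
    Returns (indices to send to the victim model, (index, pseudo_label) pairs to add directly)."""
    n_models = len(probs_per_model)
    n_samples = len(probs_per_model[0])
    n_classes = len(probs_per_model[0][0])
    # Average the ensemble's predictions for each sample.
    mean = [[sum(probs_per_model[m][i][c] for m in range(n_models)) / n_models
             for c in range(n_classes)] for i in range(n_samples)]
    # Entropy of the averaged distribution serves as the uncertainty score.
    entropy = [-sum(p * math.log(p) for p in dist if p > 0) for dist in mean]
    to_query = sorted(range(n_samples), key=lambda i: entropy[i], reverse=True)[:query_budget]
    queried = set(to_query)
    confident = []
    for i, dist in enumerate(mean):
        if i in queried:
            continue
        label = max(range(n_classes), key=lambda c: dist[c])
        if dist[label] >= confident_threshold:
            confident.append((i, label))
    return to_query, confident
```

Averaging before scoring means a sample on which the models disagree gets a flat mean distribution and high entropy even if each model alone is confident.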

arXiv:2309.13716

MOSAIC: Multi-Object Segmented Arbitrary Stylization Using CLIP

Authors: Prajwal Ganugula, Y S S S Santosh Kumar, N K Sagar Reddy, Prabhath Chellingi, Avinash Thakur, Neeraj Kasera, C Shyam Anand

Abstract: Style transfer driven by text prompts has paved a new path for creatively stylizing images without collecting an actual style image. Despite its promising results, text-driven stylization gives the user no control over the stylization. A user who wants to create an artistic image needs fine control over the stylization of the various entities in the content image individually, which current state-of-the-art approaches do not address. Diffusion-based style transfer methods suffer from the same issue, because their regional stylization control over the stylized output is ineffective. To address this problem, we propose a new method, Multi-Object Segmented Arbitrary Stylization Using CLIP (MOSAIC), that can apply styles to different objects in the image based on the context extracted from the input prompt. Text-based segmentation and stylization modules, both based on the vision transformer architecture, are used to segment and stylize the objects. Our method can extend to arbitrary objects and styles and produces higher-quality images than current state-of-the-art methods. To our knowledge, this is the first attempt to perform text-guided arbitrary object-wise stylization. We demonstrate the effectiveness of our approach through qualitative and quantitative analysis, showing that it can generate visually appealing stylized images with enhanced control over stylization and the ability to generalize to unseen object classes.

Submitted 24 September, 2023; originally announced September 2023.

Comments: Camera ready, New Ideas in Vision Transformers workshop, ICCV 2023

arXiv:2309.05668

Studying the impacts of pre-training using ChatGPT-generated text on downstream tasks

Authors: Sarthak Anand

Abstract: Recent years have seen significant advances in language models, particularly with the emergence of Large Language Models (LLMs) trained on vast amounts of data extracted from internet archives. These LLMs, such as ChatGPT, have become widely accessible, allowing users to generate text for various purposes including articles, essays, jokes, and poetry. Given that LLMs are trained on a diverse range of text sources, encompassing platforms like Reddit and Twitter, it is foreseeable that future training datasets will also incorporate text generated by previous iterations of the models themselves. In light of this development, our research investigates the influence of artificial text in the pre-training phase of language models. Specifically, we conducted a comparative analysis between a language model, RoBERTa, pre-trained on CNN/DailyMail news articles, and ChatGPT, which employed the same articles for its training, and evaluated their performance on three downstream tasks as well as their potential gender bias, using sentiment analysis as a metric. Through a series of experiments, we demonstrate that the use of artificial text during pre-training does not have a significant impact on either the performance of the models in downstream tasks or their gender bias.
In conclusion, our findings suggest that including text generated by LLMs in their own pre-training process does not have a substantial effect on the models' subsequent performance in downstream tasks or on their potential gender bias.

Submitted 2 September, 2023; originally announced September 2023.

Comments: Master's thesis

arXiv:2309.03294

MALITE: Lightweight Malware Detection and Classification for Constrained Devices

Authors: Sidharth Anand, Barsha Mitra, Soumyadeep Dey, Abhinav Rao, Rupsa Dhar, Jaideep Vaidya

Abstract: Today, malware is one of the primary cyberthreats to organizations. Malware has pervaded almost every type of computing device, including those with limited memory, battery and computation power, such as mobile phones, tablets and embedded devices like Internet-of-Things (IoT) devices. Consequently, the privacy and security of malware-infected systems and devices have been heavily jeopardized. In recent years, researchers have leveraged machine learning-based strategies for malware detection and classification. Malware analysis approaches can only be employed in resource-constrained environments if the methods are lightweight in nature. In this paper, we present MALITE, a lightweight malware analysis system that can classify various malware families and distinguish between benign and malicious binaries. MALITE converts a binary into a grayscale or an RGB image and employs malware analysis strategies that are computationally inexpensive and consume little memory and battery power. We have designed MALITE-MN, a lightweight neural network-based architecture, and MALITE-HRF, an ultra-lightweight random forest-based method that uses histogram features extracted by a sliding window. We evaluate the performance of both on six publicly available datasets (Malimg, Microsoft BIG, Dumpware10, MOTIF, Drebin and CICAndMal2017) and compare them to four state-of-the-art malware classification techniques.
The results show that MALITE-MN and MALITE-HRF not only accurately identify and classify malware but also consume several orders of magnitude fewer resources (in terms of both memory and computation), making them much more suitable for resource-constrained environments.

Submitted 6 September, 2023; originally announced September 2023.
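A sketch of the two preprocessing ideas named in the abstract, under assumptions: the binary is laid out row by row as 8-bit grayscale pixels of a hypothetical fixed `width`, and histogram features are computed over a sliding window of the raw byte stream with `bins` equal-width buckets. The abstract does not give the window size, stride, bin count, or whether the window runs over the bytes or the image, so all of these choices and the function names are illustrative.

```python
from typing import List


def bytes_to_grayscale(data: bytes, width: int) -> List[List[int]]:
    """Lay the binary out row by row as 8-bit pixel values; the last row is zero-padded."""
    padded = data + bytes((-len(data)) % width)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


def sliding_histograms(data: bytes, window: int, stride: int, bins: int = 16) -> List[List[int]]:
    """Count byte values in `bins` equal-width buckets for each window of the byte stream.
    `bins` must divide 256; trailing bytes not covered by a full stride are skipped."""
    if 256 % bins != 0:
        raise ValueError("bins must divide 256")
    bucket = 256 // bins
    features = []
    last_start = max(len(data) - window, 0)
    for start in range(0, last_start + 1, stride):
        hist = [0] * bins
        for value in data[start:start + window]:
            hist[value // bucket] += 1
        features.append(hist)
    return features
```

The per-window histograms could then be concatenated or pooled into a fixed-length vector for a random forest; that step is not shown.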

arXiv:2301.01667

Learning-based MPC from Big Data Using Reinforcement Learning

Authors: Shambhuraj Sawant, Akhil S Anand, Dirk Reinhardt, Sebastien Gros

Abstract: This paper presents an approach for learning Model Predictive Control (MPC) schemes directly from data using Reinforcement Learning (RL) methods. State-of-the-art learning methods use RL to improve the performance of parameterized MPC schemes. However, these learning algorithms are often gradient-based methods that require frequent evaluations of computationally expensive MPC schemes, which restricts their use on big datasets. We propose to tackle this issue by using tools from RL to learn a parameterized MPC scheme directly from data in an offline fashion. Our approach derives an MPC scheme without having to solve it over the collected dataset, thereby eliminating the computational burden that existing techniques incur on big data. We evaluate the proposed method on three simulated experiments of varying complexity.

Submitted 4 January, 2023; originally announced January 2023.

arXiv:2212.08553

Is it Required? Ranking the Skills Required for a Job-Title

Authors: Sarthak Anand, Jens-Joris Decorte, Niels Lowie

Abstract: In this paper, we describe our method for ranking the skills required for a given job title. Our analysis shows that important/relevant skills appear more frequently in similar job titles. We train a Language-agnostic BERT Sentence Encoder (LaBSE) model to predict the importance of the skills using weak supervision. We show that the model can learn the importance of skills and performs well in other languages. Furthermore, we show how the Inverse Document Frequency (IDF) factor of a skill boosts specialised skills.

Submitted 28 November, 2022; originally announced December 2022.
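A hedged illustration of the IDF idea only: score each skill by how many similar job titles list it, multiplied by a smoothed inverse document frequency over all job titles, so generic skills listed almost everywhere are pushed down. The paper's actual ranker is a LaBSE model trained with weak supervision; this counting baseline, its smoothing formula, and the function name are assumptions.

```python
import math
from collections import Counter
from typing import List


def rank_skills(similar_titles: List[List[str]], all_titles: List[List[str]]) -> List[str]:
    """Rank skills seen in `similar_titles` by
    (number of similar titles listing the skill) x (smoothed IDF over `all_titles`)."""
    n_titles = len(all_titles)
    doc_freq = Counter(skill for skills in all_titles for skill in set(skills))
    freq = Counter(skill for skills in similar_titles for skill in set(skills))

    def idf(skill: str) -> float:
        # Smoothed IDF: rare (specialised) skills get larger weights than ubiquitous ones.
        return math.log((1 + n_titles) / (1 + doc_freq[skill])) + 1

    return sorted(freq, key=lambda s: freq[s] * idf(s), reverse=True)
```

In a toy corpus where "python", "sql" and "communication" each appear in both similar titles, raw frequency ties all three; the IDF factor ranks the rarer "sql" first and the ubiquitous "communication" last.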

arXiv:2210.11546

Proof of Backhaul: Trustfree Measurement of Broadband Bandwidth

Authors: Peiyao Sheng, Nikita Yadav, Vishal Sevani, Arun Babu, SVR Anand, Himanshu Tyagi, Pramod Viswanath

Abstract: Recent years have seen the emergence of decentralized wireless networks consisting of nodes hosted by many individuals and small enterprises, reawakening the decades-old dream of open networking. These networks have been deployed in an organic, distributed manner and are driven by new economic models resting on tokenized incentives. A critical requirement for the incentives to scale is the ability to prove network performance in a decentralized trustfree manner, i.e., a Byzantine fault tolerant network telemetry system. In this paper, we present a Proof of Backhaul (PoB) protocol which measures the bandwidth of the (broadband) backhaul link of a wireless access point, termed the prover, in a decentralized and trustfree manner. In particular, our proposed protocol is the first to satisfy the following two properties: (1) Trustfree. Bandwidth measurement is secure against Byzantine attacks by collaborations of challenge servers and the prover. (2) Open. The barrier-to-entry for being a challenge server is low; there is no requirement to have a low latency and high throughput path to the measured link. At a high level, our protocol aggregates the challenge traffic from multiple challenge servers and uses cryptographic primitives to ensure that a subset of challengers, or even challengers colluding with provers, cannot maliciously modify results in their favor. A formal security model allows us to establish guarantees of accurate bandwidth measurement as a function of the fraction of malicious actors.
Our evaluation shows that our PoB protocol can verify backhaul bandwidth of up to 1000 Mbps with less than 8% error using measurements lasting only 100 ms. The measurement accuracy is not affected in the presence of corrupted challengers. Importantly, the basic verification protocol lends itself to a minor modification that can measure available bandwidth even in the presence of cross-traffic.
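The arithmetic behind aggregated challenge traffic can be illustrated with a minimal sketch. It assumes, for illustration only, that the verifier divides the total bits the challengers' traffic delivered across the backhaul by the measurement window; the function names are hypothetical, and the protocol's cryptographic checks and Byzantine-resilient aggregation are not modelled.

```python
from typing import Sequence


def aggregate_bandwidth_mbps(bytes_per_challenger: Sequence[int], window_s: float) -> float:
    """Estimate link bandwidth (Mbps) from the bytes each challenger's traffic
    delivered across the measured link within one measurement window."""
    if window_s <= 0:
        raise ValueError("window must be positive")
    total_bits = 8 * sum(bytes_per_challenger)
    return total_bits / window_s / 1e6


def relative_error_pct(measured: float, reference: float) -> float:
    """Percentage error of a measurement against a reference bandwidth."""
    return abs(measured - reference) / reference * 100.0


# Ten challengers, each pushing only 100 Mbps (1.25 MB per 100 ms), jointly
# cover a 1000 Mbps link: 10 * 1.25 MB * 8 bits = 100 Mbit over 0.1 s.
estimate = aggregate_bandwidth_mbps([1_250_000] * 10, window_s=0.1)
print(f"{estimate:.1f} Mbps")  # 1000.0 Mbps
```

The example echoes the "Open" property in the abstract: no single challenger needs a path as fast as the measured link, because the traffic of several modest challengers is summed.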

Submitted 20 October, 2022; originally announced October 2022.

arXiv:2210.06749 [pdf, other]

Reducing Annotation Effort by Identifying and Labeling Contextually Diverse Classes for Semantic Segmentation Under Domain Shift

Authors: Sharat Agarwal, Saket Anand, Chetan Arora

Abstract: In Active Domain Adaptation (ADA), one uses Active Learning (AL) to select a subset of images from the target domain, which are then annotated and used for supervised domain adaptation (DA). Given the large performance gap between supervised and unsupervised DA techniques, ADA allows for an excellent trade-off between annotation cost and performance. Prior art makes use of measures of uncertainty or disagreement of models to identify 'regions' to be annotated by the human oracle. However, these regions frequently comprise pixels at object boundaries, which are hard and tedious to annotate. Hence, even if the fraction of image pixels annotated decreases, the overall annotation time and the resulting cost still remain high. In this work, we propose an ADA strategy which, given a frame, identifies a set of classes that are hardest for the model to predict accurately, thereby recommending semantically meaningful regions to be annotated in a selected frame. We show that this set of 'hard' classes is context-dependent and typically varies across frames, and that annotating these classes helps the model generalize better. We propose two ADA techniques, the Anchor-based and Augmentation-based approaches, to select complementary and diverse regions in the context of the current training set. Our approach achieves 66.6 mIoU on GTA to Cityscapes with an annotation budget of 4.7%, compared to 64.9 mIoU by MADA using 5% of annotations.
Our technique can also be used as a decorator for any existing frame-based AL technique; e.g., we report a 1.5% performance improvement for CDAL on Cityscapes using our approach.
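A minimal sketch of per-frame "hard class" selection follows. It assumes, as a stand-in for whatever hardness criterion the paper uses, that a class is hard when the model's mean softmax confidence on the pixels predicted as that class is low; the helper names and the toy frame are hypothetical, and neither the Anchor-based nor the Augmentation-based technique is reproduced.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def hardest_classes(pred: Sequence[int], conf: Sequence[float], k: int) -> List[int]:
    """Rank the classes predicted in one frame by mean confidence (lowest first)
    and return the k 'hardest' ones."""
    total: Dict[int, float] = defaultdict(float)
    count: Dict[int, int] = defaultdict(int)
    for c, p in zip(pred, conf):
        total[c] += p
        count[c] += 1
    mean_conf = {c: total[c] / count[c] for c in total}
    # Ties are broken by class id so the result is deterministic.
    return sorted(mean_conf, key=lambda c: (mean_conf[c], c))[:k]


def annotation_mask(pred: Sequence[int], classes: Sequence[int]) -> List[bool]:
    """Mark the pixels whose predicted class was selected for annotation."""
    chosen = set(classes)
    return [c in chosen for c in pred]


# A toy 5-pixel "frame": predicted class ids and max softmax confidence per pixel.
pred = [0, 0, 1, 1, 2]
conf = [0.9, 0.8, 0.4, 0.5, 0.6]
hard = hardest_classes(pred, conf, k=2)
print(hard)                         # [1, 2]
print(annotation_mask(pred, hard))  # [False, False, True, True, True]
```

Selecting whole classes rather than uncertain pixels yields regions with semantic extent, which is the contrast with boundary-heavy uncertainty regions that the abstract draws.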

Submitted 13 October, 2022; originally announced October 2022.

Comments: Accepted at WACV 2023

arXiv:2209.12238 [pdf, other]

High-Resolution Satellite Imagery for Modeling the Impact of Aridification on Crop Production

Authors: Depanshu Sani, Sandeep Mahato, Parichya Sirohi, Saket Anand, Gaurav Arora, Charu Chandra Devshali, Thiagarajan Jayaraman, Harsh Kumar Agarwal

Abstract: The availability of well-curated datasets has driven the success of Machine Learning (ML) models. Despite the increased access to earth observation data for agriculture, there is a scarcity of curated, labelled datasets, which limits their use in training ML models for remote sensing (RS) in agriculture. To this end, we introduce a first-of-its-kind dataset, SICKLE, having time-series images at different spatial resolutions from 3 different satellites, annotated with multiple key cropping parameters for paddy cultivation for the Cauvery Delta region in Tamil Nadu, India. The dataset comprises 2,398 season-wise samples from 388 unique plots distributed across 4 districts of the Delta. The dataset covers multi-spectral, thermal and microwave data from January 2018 to March 2021. The paddy samples are annotated with 4 key cropping parameters, i.e. sowing date, transplanting date, harvesting date and crop yield. This is one of the first studies to consider the growing season (using sowing and harvesting dates) as part of a dataset. We also propose a yield prediction strategy that uses time-series data generated based on the observed growing season and the standard seasonal information obtained from Tamil Nadu Agricultural University for the region. The consequent performance improvement highlights the impact of ML techniques that leverage domain knowledge consistent with standard practices followed by farmers in a specific region.
We benchmark the dataset on 3 separate tasks, namely crop type, phenology date (sowing, transplanting, harvesting) and yield prediction, and develop an end-to-end framework for predicting key crop parameters in a real-world setting.

Submitted 25 September, 2022; originally announced September 2022.

Comments: Submitted as an End of Google AI4SG Workshop report

arXiv:2209.09614 [pdf, other]

doi 10.1016/j.robot.2023.104531

Deep Model Predictive Variable Impedance Control

Authors: Akhil S Anand, Fares J. Abu-Dakka, Jan Tommy Gravdahl

Abstract: The capability to adapt compliance by varying muscle stiffness is crucial for dexterous manipulation skills in humans. Incorporating compliance in robot motor control is crucial to performing real-world force interaction tasks with human-level dexterity. This work presents a Deep Model Predictive Variable Impedance Controller for compliant robotic manipulation which combines Variable Impedance Control with Model Predictive Control (MPC). A generalized Cartesian impedance model of a robot manipulator is learned using an exploration strategy maximizing the information gain. This model is used within an MPC framework to adapt the impedance parameters of a low-level variable impedance controller to achieve the desired compliance behavior for different manipulation tasks without any retraining or finetuning. The deep Model Predictive Variable Impedance Control approach is evaluated using a Franka Emika Panda robotic manipulator operating on different manipulation tasks in simulations and real experiments. The proposed approach was compared with model-free and model-based reinforcement learning approaches in variable impedance control for transferability between tasks and performance.

Submitted 20 September, 2022; originally announced September 2022.

Comments: Preprint submitted to the journal Robotics and Autonomous Systems

arXiv:2209.06354 [pdf, other]

Tuple Packing: Efficient Batching of Small Graphs in Graph Neural Networks

Authors: Mario Michael Krell, Manuel Lopez, Sreenidhi Anand, Hatem Helal, Andrew William Fitzgibbon

Abstract: When processing a batch of graphs in machine learning models such as Graph Neural Networks (GNN), it is common to combine several small graphs into one overall graph to accelerate processing and remove or reduce the overhead of padding. This is supported, for example, in the PyG library. However, the sizes of small graphs can vary substantially with respect to the number of nodes and edges, and hence the size of the combined graph can still vary considerably, especially for small batch sizes. Therefore, the costs of excessive padding and wasted compute are still incurred when working with static shapes, which are preferred for maximum acceleration. This paper proposes a new hardware-agnostic approach, tuple packing, for generating batches that cause minimal overhead. The algorithm extends recently introduced sequence packing approaches to work on the 2D tuples of (|nodes|, |edges|). A monotone heuristic is applied to the 2D histogram of tuple values to define a priority for packing histogram bins together, with the objective of reaching a limit on the number of nodes as well as the number of edges. Experiments verify the effectiveness of the algorithm on multiple datasets.
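To make the two-limit constraint concrete, here is a simplified sketch in Python. It is not the paper's algorithm: instead of the monotone heuristic over a 2D histogram of (|nodes|, |edges|) bins, it uses a plain first-fit-decreasing greedy over individual graphs, ranking each graph by how close it comes to either limit. The function name and the limits in the example are hypothetical.

```python
from typing import List, Tuple

# (num_nodes, num_edges) for one small graph
GraphSize = Tuple[int, int]


def pack_graphs(sizes: List[GraphSize], max_nodes: int, max_edges: int) -> List[List[int]]:
    """Greedy first-fit-decreasing packing of graphs into packs that respect
    BOTH a node limit and an edge limit (a simplified stand-in for tuple packing)."""
    # Priority: how close a graph comes to either limit; pack the "largest" first.
    def priority(i: int) -> float:
        n, e = sizes[i]
        return max(n / max_nodes, e / max_edges)

    order = sorted(range(len(sizes)), key=priority, reverse=True)
    packs: List[List[int]] = []
    loads: List[List[int]] = []  # [nodes_used, edges_used] for each pack
    for i in order:
        n, e = sizes[i]
        if n > max_nodes or e > max_edges:
            raise ValueError(f"graph {i} does not fit in a single pack")
        for pack, load in zip(packs, loads):
            if load[0] + n <= max_nodes and load[1] + e <= max_edges:
                pack.append(i)
                load[0] += n
                load[1] += e
                break
        else:  # no existing pack had room: open a new one
            packs.append([i])
            loads.append([n, e])
    return packs


sizes = [(10, 20), (5, 8), (7, 30), (3, 4)]
packs = pack_graphs(sizes, max_nodes=16, max_edges=40)
print(packs)  # [[2, 1], [0, 3]]
```

With static shapes, each pack would then be padded to (16 nodes, 40 edges), so the four example graphs occupy two padded slots instead of four. Checking both limits at once is what distinguishes this from 1D sequence packing.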

Submitted 18 September, 2022; v1 submitted 13 September, 2022; originally announced September 2022.

arXiv:2207.12646 [pdf, other]

Learning Hierarchy Aware Features for Reducing Mistake Severity

Authors: Ashima Garg, Depanshu Sani, Saket Anand

Abstract: Label hierarchies are often available a priori as part of biological taxonomy or language datasets such as WordNet. Several works exploit these to learn hierarchy aware features that encourage the classifier to make semantically meaningful mistakes while maintaining or reducing the overall error. In this paper, we propose a novel approach for learning Hierarchy Aware Features (HAF) that leverages classifiers at each level of the hierarchy that are constrained to generate predictions consistent with the label hierarchy. The classifiers are trained by minimizing a Jensen-Shannon Divergence with target soft labels obtained from the fine-grained classifiers. Additionally, we employ a simple geometric loss that constrains the feature space geometry to capture the semantic structure of the label space. HAF is a training-time approach that reduces mistake severity while maintaining top-1 error, thereby addressing the problem of cross-entropy loss treating all mistakes as equal. We evaluate HAF on three hierarchical datasets and achieve state-of-the-art results on the iNaturalist-19 and CIFAR-100 datasets. The source code is available at https://github.com/07Agarg/HAF
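The hierarchy-consistent soft target and the Jensen-Shannon Divergence can be sketched as follows. The sketch assumes that a coarse-level target is obtained by summing the fine-grained classifier's probabilities over each parent class; that construction, the helper names and the toy hierarchy are illustrative assumptions rather than the released HAF code, and the geometric loss is not shown.

```python
import math
from typing import List, Sequence


def coarse_soft_labels(fine_probs: Sequence[float], parent: Sequence[int], n_coarse: int) -> List[float]:
    """Marginalise fine-grained class probabilities onto their parent classes."""
    out = [0.0] * n_coarse
    for p, c in zip(fine_probs, parent):
        out[c] += p
    return out


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    # Terms with p_i == 0 contribute nothing (0 * log 0 := 0).
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence (natural log): symmetric and bounded by ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical hierarchy: fine classes 0 and 1 -> coarse 0; fine class 2 -> coarse 1.
target = coarse_soft_labels([0.5, 0.3, 0.2], parent=[0, 0, 1], n_coarse=2)
coarse_pred = [0.6, 0.4]
loss = js_divergence(coarse_pred, target)
```

Unlike KL divergence, the JS divergence stays finite even when one distribution assigns zero mass where the other does not, because each side is compared against the mixture `m`.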

Submitted 26 July, 2022; originally announced July 2022.

Comments: 21 pages, 7 figures, Accepted in ECCV 2022

arXiv:2207.02107 [pdf, other]

EasyABM: a lightweight and easy to use heterogeneous agent-based modelling tool written in Julia

Authors: Renu Solanki, Monisha Khanna, Shailly Anand, Anita Gulati, Prateek Kumar, Munendra Kumar, Dushyant Kumar

Abstract: Agent-based modelling is a computational approach that aims to understand the behaviour of complex systems through simplified interactions of programmable objects in computer memory called agents. Agent-based models (ABMs) are predominantly used in the fields of biology, ecology, social sciences and economics, where the systems of interest often consist of several interacting entities. In this work, we present a Julia package, EasyABM.jl, for simplifying the process of studying agent-based models. EasyABM.jl provides an intuitive and easy-to-understand functional approach for building and analysing agent-based models.

Submitted 5 July, 2022; originally announced July 2022.

Comments: 18 pages, 7 figures

arXiv:2206.04615 [pdf, other]

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza , et al. (426 additional authors not shown)

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline.
Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.

Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

arXiv:2205.03104 [pdf, other]

Crop Type Identification for Smallholding Farms: Analyzing Spatial, Temporal and Spectral Resolutions in Satellite Imagery

Authors: Depanshu Sani, Sandeep Mahato, Parichya Sirohi, Saket Anand, Gaurav Arora, Charu Chandra Devshali, T. Jayaraman

Abstract: The integration of modern Machine Learning (ML) models into remote sensing and agriculture has expanded the scope of the application of satellite images in the agriculture domain. In this paper, we present how the accuracy of crop type identification improves as we move from medium-spatiotemporal-resolution (MSTR) to high-spatiotemporal-resolution (HSTR) satellite images. We further demonstrate that high spectral resolution in satellite imagery can improve prediction performance for low spatial and temporal resolution (LSTR) images. The F1-score is increased by 7% when using multispectral data of MSTR images as compared to the best results obtained from HSTR images. Similarly, when a crop-season-based time series of multispectral data is used, we observe an increase of 1.2% in the F1-score. The outcome motivates further advancements in the field of synthetic band generation.

Submitted 6 May, 2022; originally announced May 2022.

Comments: Supported by Google under AI4SG Workshop

arXiv:2201.08020 [pdf, other]

A Deep Learning Approach To Estimation Using Measurements Received Over a Network

Authors: Shivangi Agarwal, Sanjit K. Kaul, Saket Anand, P. B. Sujit

Abstract: We propose a novel deep neural network (DNN) based approximation architecture to learn estimates of measurements. We detail an algorithm that enables training of the DNN. The DNN estimator only uses measurements, if and when they are received over a communication network. The measurements are communicated over a network as packets, at a rate unknown to the estimator. Packets may suffer drops and need retransmission. They may suffer waiting delays as they traverse a network path. Works on estimation often assume knowledge of the dynamic model of the measured system, which may not be available in practice. The DNN estimator doesn't assume knowledge of the dynamic system model or the communication network. It doesn't require a history of measurements, often used by other works. The DNN estimator results in significantly smaller average estimation error than the commonly used Time-varying Kalman Filter and the Unscented Kalman Filter, in simulations of linear and nonlinear dynamic systems. The DNN need not be trained separately for different communications network settings. It is robust to errors in estimation of network delays that occur due to imperfect time synchronization between the measurement source and the estimator. Last but not least, our simulations shed light on the rate of updates that result in low estimation error.

Submitted 12 September, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

arXiv:2112.02721 [pdf, other]

NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

Authors: Kaustubh D. Dhole, Varun Gangal, Sebastian Gehrmann, Aadesh Gupta, Zhenhao Li, Saad Mahamood, Abinaya Mahendiran, Simon Mille, Ashish Shrivastava, Samson Tan, Tongshuang Wu, Jascha Sohl-Dickstein, Jinho D. Choi, Eduard Hovy, Ondrej Dusek, Sebastian Ruder, Sajant Anand, Nagender Aneja, Rabin Banjade, Lisa Barthe, Hanna Behnke, Ian Berlot-Attwell, Connor Boyle, Caroline Brun, Marco Antonio Sobrevilla Cabezudo , et al. (101 additional authors not shown)

Abstract: Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework which supports the creation of both transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of natural language tasks. We demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze the robustness of popular natural language models. The infrastructure, datacards and robustness analysis results are available publicly on the NL-Augmenter repository (https://github.com/GEM-benchmark/NL-Augmenter).

Submitted 11 October, 2022; v1 submitted 5 December, 2021; originally announced December 2021.

Comments: 39 pages, repository at https://github.com/GEM-benchmark/NL-Augmenter

arXiv:2111.01223 [pdf, other]

A framework for causal segmentation analysis with machine learning in large-scale digital experiments

Authors: Nima S. Hejazi, Wenjing Zheng, Sathya Anand

Abstract: We present an end-to-end methodological framework for causal segment discovery that aims to uncover differential impacts of treatments across subgroups of users in large-scale digital experiments. Building on recent developments in causal inference and non/semi-parametric statistics, our approach unifies two objectives: (1) the discovery of user segments that stand to benefit from a candidate treatment based on subgroup-specific treatment effects, and (2) the evaluation of causal impacts of dynamically assigning units to a study's treatment arm based on their predicted segment-specific benefit or harm. Our proposal is model-agnostic, capable of incorporating state-of-the-art machine learning algorithms into the estimation procedure, and is applicable in randomized A/B tests and quasi-experiments. An open source R package implementation, sherlock, is introduced.
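As a deliberately simple illustration of the two objectives, the sketch below estimates a per-segment effect from randomized A/B data and then treats only segments with a positive estimated benefit. It uses a naive difference in means rather than the machine-learning and non/semi-parametric estimators the abstract describes, and it does not use or mirror the API of the sherlock package; all names and data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (segment id, treated flag 0/1, outcome) from a randomized A/B test
Record = Tuple[str, int, float]


def segment_effects(records: Iterable[Record]) -> Dict[str, float]:
    """Naive difference-in-means treatment effect per segment."""
    # Per segment: [treated sum, treated count, control sum, control count].
    acc: Dict[str, List[float]] = defaultdict(lambda: [0.0, 0, 0.0, 0])
    for seg, treated, y in records:
        a = acc[seg]
        if treated:
            a[0] += y
            a[1] += 1
        else:
            a[2] += y
            a[3] += 1
    # Only segments observed in both arms get an estimate.
    return {seg: a[0] / a[1] - a[2] / a[3] for seg, a in acc.items() if a[1] and a[3]}


def assign_treatment(effects: Dict[str, float], threshold: float = 0.0) -> Dict[str, bool]:
    """Treat a segment only if its estimated benefit exceeds the threshold."""
    return {seg: eff > threshold for seg, eff in effects.items()}


data = [("a", 1, 3.0), ("a", 1, 5.0), ("a", 0, 2.0), ("a", 0, 2.0),
        ("b", 1, 1.0), ("b", 0, 2.0)]
effects = segment_effects(data)
print(effects)                    # {'a': 2.0, 'b': -1.0}
print(assign_treatment(effects))  # {'a': True, 'b': False}
```

Evaluating the causal impact of such a rule, the abstract's second objective, would require estimating outcomes under the assignment policy itself, which this sketch does not attempt.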

Submitted 1 November, 2021; originally announced November 2021.

Comments: Accepted by the 8th annual Conference on Digital Experimentation (CODE) at MIT

arXiv:2111.00164 [pdf, other]

HIERMATCH: Leveraging Label Hierarchies for Improving Semi-Supervised Learning

Authors: Ashima Garg, Shaurya Bagga, Yashvardhan Singh, Saket Anand

Abstract: Semi-supervised learning approaches have emerged as an active area of research to combat the challenge of obtaining large amounts of annotated data. Towards the goal of improving the performance of semi-supervised learning methods, we propose a novel framework, HIERMATCH, a semi-supervised approach that leverages hierarchical information to reduce labeling costs and performs as well as a vanilla semi-supervised learning method. Hierarchical information is often available as prior knowledge in the form of coarse labels (e.g., woodpeckers) for images with fine-grained labels (e.g., downy woodpeckers or golden-fronted woodpeckers). However, the use of coarse category labels as supervision to improve semi-supervised techniques has not been explored. In the absence of fine-grained labels, HIERMATCH exploits the label hierarchy and uses coarse class labels as a weak supervisory signal. Additionally, HIERMATCH is a generic approach to improve any semi-supervised learning framework; we demonstrate this using our results on recent state-of-the-art techniques MixMatch and FixMatch. We evaluate the efficacy of HIERMATCH on two benchmark datasets, namely CIFAR-100 and NABirds. HIERMATCH can reduce the usage of fine-grained labels by 50% on CIFAR-100 with only a marginal drop of 0.59% in top-1 accuracy as compared to MixMatch. Code: https://github.com/07Agarg/HIERMATCH

Submitted 21 December, 2021; v1 submitted 29 October, 2021; originally announced November 2021.

Comments: 11 pages, 1 figure, Accepted in WACV 2022

arXiv:2110.10389 [pdf, other]

Does Data Repair Lead to Fair Models? Curating Contextually Fair Data To Reduce Model Bias

Authors: Sharat Agarwal, Sumanyu Muku, Saket Anand, Chetan Arora

Abstract: Contextual information is a valuable cue for Deep Neural Networks (DNNs) to learn better representations and improve accuracy. However, co-occurrence bias in the training dataset may hamper a DNN model's generalizability to unseen scenarios in the real world. For example, in COCO, many object categories have a much higher co-occurrence with men compared to women, which can bias a DNN's prediction in favor of men. Recent works have focused on task-specific training strategies to handle bias in such scenarios, but fixing the available data is often ignored. In this paper, we propose a novel and more generic solution to address the contextual bias in the datasets by selecting a subset of the samples, which is fair in terms of the co-occurrence with various classes for a protected attribute. We introduce a data repair algorithm using the coefficient of variation, which can curate fair and contextually balanced data for a protected class(es). This helps in training a fair model irrespective of the task, architecture or training methodology. Our proposed solution is simple, effective, and can even be used in an active learning setting where the data labels are not present or being generated incrementally. We demonstrate the effectiveness of our algorithm for the task of object detection and multi-label image classification across different datasets.
Through a series of experiments, we validate that curating contextually fair data helps make model predictions fair by balancing the true positive rate for the protected class across groups without compromising on the model's overall performance.
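The role of the coefficient of variation (standard deviation divided by mean) can be sketched as a greedy subset selection: each step adds the image that keeps the per-class co-occurrence counts across protected groups most balanced. This is an assumption-laden illustration, not the authors' data repair algorithm; the greedy rule, helper names, group labels and toy images are all hypothetical.

```python
import statistics
from typing import Dict, List, Sequence, Set, Tuple

# (protected group of the image, object classes present in the image)
Image = Tuple[str, Set[str]]


def coefficient_of_variation(counts: Sequence[float]) -> float:
    """Population std / mean; 0 means perfectly balanced counts."""
    mean = statistics.fmean(counts)
    return statistics.pstdev(counts) / mean if mean else 0.0


def mean_cv(counts: Dict[str, Dict[str, int]], groups: Sequence[str]) -> float:
    """Average, over object classes, of the CV of co-occurrence counts across groups."""
    cvs = [coefficient_of_variation([per_group.get(g, 0) for g in groups])
           for per_group in counts.values()]
    return statistics.fmean(cvs) if cvs else 0.0


def add_image(counts: Dict[str, Dict[str, int]], image: Image) -> None:
    """Increment the (class, group) co-occurrence counts for one image."""
    group, classes = image
    for c in classes:
        per_group = counts.setdefault(c, {})
        per_group[group] = per_group.get(group, 0) + 1


def greedy_repair(images: List[Image], groups: Sequence[str], budget: int) -> List[int]:
    """Greedily pick images that keep the co-occurrence counts most balanced."""
    counts: Dict[str, Dict[str, int]] = {}
    selected: List[int] = []
    remaining = list(range(len(images)))
    while remaining and len(selected) < budget:
        best, best_score = remaining[0], float("inf")
        for i in remaining:
            trial = {c: dict(pg) for c, pg in counts.items()}  # copy, then try image i
            add_image(trial, images[i])
            score = mean_cv(trial, groups)
            if score < best_score:  # strict: ties keep the lower index
                best, best_score = i, score
        add_image(counts, images[best])
        selected.append(best)
        remaining.remove(best)
    return selected


images = [("men", {"car"}), ("men", {"car"}), ("women", {"car"})]
print(greedy_repair(images, groups=["men", "women"], budget=2))  # [0, 2]
```

Because the CV is scale-free, it compares imbalance fairly between frequent and rare classes, which a raw count difference would not.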

Submitted 20 October, 2021; originally announced October 2021.

Comments: A variant of this report is accepted in WACV 2022

arXiv:2110.00555 [pdf, other]

doi 10.1109/CDC45484.2021.9683075

Design of multiplicative watermarking against covert attacks

Authors: Alexander J. Gallo, Sribalaji C. Anand, André M. H. Teixeira, Riccardo M. G. Ferrari

Abstract: This paper addresses the design of an active cyberattack detection architecture based on multiplicative watermarking, allowing for detection of covert attacks. We propose an optimal design problem, relying on the so-called output-to-output l2-gain, which characterizes the maximum gain between the residual output of a detection scheme and some performance output. Although optimal, this control problem is non-convex. Hence, we propose an algorithm to design the watermarking filters by solving the problem suboptimally via LMIs. We show that, against covert attacks, the output-to-output l2-gain is unbounded without watermarking, and we provide a sufficient condition for boundedness in the presence of watermarks.

Submitted 1 October, 2021; originally announced October 2021.

Comments: 6 page conference paper accepted to the 60th IEEE Conference on Decision and Control

Journal ref: 2021 60th IEEE Conference on Decision and Control (CDC), profs. of

arXiv:2106.03186 [pdf, other]

Reverse Engineering the Neural Tangent Kernel

Authors: James B. Simon, Sajant Anand, Michael R. DeWeese

Abstract: The development of methods to guide the design of neural networks is an important open challenge for deep learning theory. As a paradigm for principled neural architecture design, we propose the translation of high-performing kernels, which are better-understood and amenable to first-principles design, into equivalent network architectures, which have superior efficiency, flexibility, and feature learning. To this end, we constructively prove that, with just an appropriate choice of activation function, any positive-semidefinite dot-product kernel can be realized as either the NNGP or neural tangent kernel of a fully-connected neural network with only one hidden layer. We verify our construction numerically and demonstrate its utility as a design tool for finite fully-connected networks in several experiments.

Submitted 13 August, 2022; v1 submitted 6 June, 2021; originally announced June 2021.

Comments: 15 pages, 5 figures

arXiv:2105.07966 [pdf, other]

A Game-Theoretic Analysis of Competitive Editing in Wikipedia: Contributors' Effort to Influence Articles and the Community's Attempt to Ensure Neutrality

Authors: Santhanakrishnan Anand, Ofer Arazy, Narayan B. Mandayam, Oded Nov

Abstract: Peer production, such as the collaborative authoring of Wikipedia articles, involves both cooperation and competition between contributors, and we focus on the latter. As individuals, contributors compete to align Wikipedia articles with their personal perspectives. As a community, they work collectively to ensure a neutral point of view (NPOV). We study the interplay between individuals' competition and the community's endeavor to ensure neutrality. We develop a two-level game-theoretic model, modeling the interactions of ownership-motivated individuals and neutrality-seeking communal mechanisms as a Stackelberg game. We present our model's predictions regarding the relation between contributors' effort (i.e., typical edit size) and benefits (i.e., the portion of the article they eventually "own"). We validate the model's prediction through an empirical analysis, by studying the interactions of 219,811 distinct contributors that co-produced 864 Wikipedia articles over a decade. The analysis and empirical results suggest that contributors who make large edits ("creators") eventually lose the article's ownership to those who refine the articles and typically make smaller edits ("curators"). Whereas neutrality-seeking mechanisms are essential for ensuring that ownership is not concentrated within a small number of contributors, our findings suggest that the burden of excessive governance may deter contributors from participating.

Submitted 17 May, 2021; originally announced May 2021.

Comments: 37 pages, 6 figures, 1 Table
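The Stackelberg structure named in this abstract (a leader commits first, a follower observes and best-responds) can be illustrated with a minimal backward-induction sketch. This is not the authors' model: the action names and payoff tables below are hypothetical, carry no empirical meaning, and the paper's game is formulated in terms of edit effort and article ownership rather than two discrete choices.

```python
from typing import Callable, Sequence, Tuple


def stackelberg_equilibrium(
    leader_actions: Sequence[str],
    follower_actions: Sequence[str],
    leader_payoff: Callable[[str, str], float],
    follower_payoff: Callable[[str, str], float],
) -> Tuple[str, str]:
    """Solve a finite two-stage leader-follower game by backward induction.

    Ties are broken in favor of the action listed first.
    """
    best = None  # (leader value, leader action, follower response)
    for a in leader_actions:
        # Stage 2: the follower observes the leader's commitment and best-responds.
        b = max(follower_actions, key=lambda f: follower_payoff(a, f))
        value = leader_payoff(a, b)
        # Stage 1: the leader anticipates that response and keeps the best outcome.
        if best is None or value > best[0]:
            best = (value, a, b)
    if best is None:
        raise ValueError("leader_actions must be non-empty")
    return best[1], best[2]


# Hypothetical payoffs: the community (leader) picks a governance level,
# a contributor (follower) picks an editing style.
FOLLOWER = {("light", "large_edit"): 3, ("light", "small_edit"): 1,
            ("strict", "large_edit"): 0, ("strict", "small_edit"): 2}
LEADER = {("light", "large_edit"): 1, ("light", "small_edit"): 2,
          ("strict", "large_edit"): 3, ("strict", "small_edit"): 2}

if __name__ == "__main__":
    print(stackelberg_equilibrium(["light", "strict"], ["large_edit", "small_edit"],
                                  lambda a, b: LEADER[(a, b)],
                                  lambda a, b: FOLLOWER[(a, b)]))
```

With these made-up numbers the leader commits to "strict" because it anticipates that, under "light" governance, the contributor's best response would be a large edit.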

arXiv:2101.01693 [pdf, other]

COVID-19 Tests Gone Rogue: Privacy, Efficacy, Mismanagement and Misunderstandings

Authors: Manuel Morales, Rachel Barbar, Darshan Gandhi, Sanskruti Landage, Joseph Bae, Arpita Vats, Jil Kothari, Sheshank Shankar, Rohan Sukumaran, Himi Mathur, Krutika Misra, Aishwarya Saxena, Parth Patwa, Sethuraman T. V., Maurizio Arseni, Shailesh Advani, Kasia Jakimowicz, Sunaina Anand, Priyanshi Katiyar, Ashley Mehra, Rohan Iyer, Srinidhi Murali, Aryan Mahindra, Mikhail Dmitrienko, Saurish Srivastava, et al. (5 additional authors not shown)

Abstract: COVID-19 testing, the cornerstone for effective screening and identification of COVID-19 cases, remains paramount as an intervention tool to curb the spread of COVID-19 at both local and national levels. However, the speed at which the pandemic struck and the response was rolled out, the widespread impact on healthcare infrastructure, the lack of sufficient preparation within the public health system, and the complexity of the crisis led to utter confusion among test-takers. Invasion of privacy remains a crucial concern. The user experience of test-takers remains poor. User friction affects user behavior and discourages participation in testing programs. Test efficacy has been overstated. Test results are poorly understood, resulting in inappropriate follow-up recommendations. Herein, we review the current landscape of COVID-19 testing, identify four key challenges, and discuss the consequences of failing to address these challenges. The current infrastructure around testing and information propagation is highly privacy-invasive and does not leverage scalable digital components. In this work, we discuss the challenges complicating the existing COVID-19 testing ecosystem and highlight the need to improve the testing experience for the user and reduce privacy invasions. Digital tools will play a critical role in resolving these challenges.

Submitted 7 May, 2021; v1 submitted 5 January, 2021; originally announced January 2021.

Comments: 22 pages, 2 figures

arXiv:2012.01772 [pdf, other]

Digital Landscape of COVID-19 Testing: Challenges and Opportunities

Authors: Darshan Gandhi, Rohan Sukumaran, Priyanshi Katiyar, Alex Radunsky, Sunaina Anand, Shailesh Advani, Jil Kothari, Kasia Jakimowicz, Sheshank Shankar, Sethuraman T. V., Krutika Misra, Aishwarya Saxena, Sanskruti Landage, Richa Sonker, Parth Patwa, Aryan Mahindra, Mikhail Dmitrienko, Kanishka Vaish, Ashley Mehra, Srinidhi Murali, Rohan Iyer, Joseph Bae, Vivek Sharma, Abhishek Singh, Rachel Barbar, et al. (1 additional authors not shown)

Abstract: The COVID-19 pandemic has left a devastating trail all over the world in terms of loss of life, economic decline, travel restrictions, trade deficits, a collapsing economy (including real estate), job losses, loss of health benefits, declining quality of and access to care and services, and reduced overall quality of life. Immunization with the anticipated vaccines will not, on its own, be enough to overcome the pandemic and return to normalcy. The four pillars of effective public health intervention are diagnostic testing of both asymptomatic and symptomatic individuals, contact tracing, quarantine of individuals who have symptoms or have been exposed to COVID-19, and strict hygiene standards at the individual and community level. Digital technologies currently used for COVID-19 testing include mobile apps, web dashboards, and online self-assessment tools. Herein, we look into various digital solutions adopted by communities across universities, businesses, and other organizations. We summarize the challenges experienced with these tools in terms of quality of information, privacy, and user-centric issues. Although numerous digital solutions are available or under development, they vary widely in the quality and quantity of the information they share, which can overwhelm users. Understanding the testing landscape through a digital lens gives clear insight into the multiple challenges we face, including data privacy, cost, and miscommunication.
Digitalization is poised to play a central role in navigating COVID-19 testing. Blockchain-based systems can be used to preserve privacy and to ensure that ownership of the data remains with the user. Another solution involves digital health passports with relevant and correct information. In this early draft, we summarize the challenges and propose possible solutions to address them.

Submitted 3 December, 2020; originally announced December 2020.

Comments: 28 pages, 4 figures

arXiv:2011.11228 [pdf, other]

Modeling Functional Similarity in Source Code with Graph-Based Siamese Networks

Authors: Nikita Mehrotra, Navdha Agarwal, Piyush Gupta, Saket Anand, David Lo, Rahul Purandare

Abstract: Code clones are duplicate code fragments that share identical or similar syntax or semantics. Code clone detection plays an important role in software maintenance, code refactoring, and reuse. A substantial amount of research has been conducted in the past to detect clones. A majority of these approaches use lexical and syntactic information to detect clones. However, only a few of them target semantic clones. Recently, motivated by the success of deep learning models in other fields, including natural language processing and computer vision, researchers have attempted to adopt deep learning techniques to detect code clones. These approaches use lexical information (tokens) and/or syntactic structures like abstract syntax trees (ASTs) to detect code clones. However, they do not make sufficient use of the available structural and semantic information, which limits their capabilities. This paper addresses the problem of semantic code clone detection using program dependency graphs and geometric neural networks, leveraging structured syntactic and semantic information. We have developed a prototype tool, HOLMES, based on our novel approach, and empirically evaluated it on popular code clone benchmarks. Our results show that HOLMES performs considerably better than the state-of-the-art tool TBCCD. We also evaluated HOLMES on unseen projects and performed cross-dataset experiments to assess its generalizability.
Our results affirm that HOLMES outperforms TBCCD, as most of the pairs that HOLMES detected were either undetected or sub-optimally reported by TBCCD.

Submitted 25 November, 2020; v1 submitted 23 November, 2020; originally announced November 2020.

Comments: Under review at IEEE Transactions on Software Engineering

arXiv:2009.02619 [pdf, other]

MIDAS at SemEval-2020 Task 10: Emphasis Selection using Label Distribution Learning and Contextual Embeddings

Authors: Sarthak Anand, Pradyumna Gupta, Hemant Yadav, Debanjan Mahata, Rakesh Gosangi, Haimin Zhang, Rajiv Ratn Shah

Abstract: This paper presents our submission to SemEval-2020 Task 10 on emphasis selection in written text. We approach emphasis selection as a sequence labeling task, representing the underlying text with various contextual embedding models. We also employ label distribution learning to account for annotator disagreements. We experiment with the choice of model architectures, trainability of layers, and different contextual embeddings. Our best-performing architecture is an ensemble of different models, which achieved an overall matching score of 0.783, placing us 15th out of 31 participating teams. Lastly, we analyze the results in terms of part-of-speech tags, sentence lengths, and word ordering.

Submitted 5 September, 2020; originally announced September 2020.
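Label distribution learning, as mentioned in the abstract, trains against a per-token distribution rather than a single hard label. The sketch below assumes that each token's target is the fraction of annotators who marked it as emphasized, and uses KL divergence as the training loss, a common choice for label distribution learning; the abstract does not state which loss the authors used, and the model that produces `predicted` is not shown.

```python
import math
from typing import List, Sequence


def votes_to_distribution(emphasis_votes: int, annotators: int) -> List[float]:
    """Turn annotator votes for one token into [P(emphasis), P(no emphasis)]."""
    if annotators <= 0 or not 0 <= emphasis_votes <= annotators:
        raise ValueError("need annotators > 0 and 0 <= emphasis_votes <= annotators")
    p = emphasis_votes / annotators
    return [p, 1.0 - p]


def kl_divergence(target: Sequence[float], predicted: Sequence[float],
                  eps: float = 1e-12) -> float:
    """KL(target || predicted); labels with zero target mass contribute nothing."""
    return sum(t * math.log(t / max(q, eps))
               for t, q in zip(target, predicted) if t > 0)


def sentence_ldl_loss(targets: Sequence[Sequence[float]],
                      predictions: Sequence[Sequence[float]]) -> float:
    """Mean per-token KL divergence over one sentence."""
    if not targets:
        raise ValueError("empty sentence")
    return sum(kl_divergence(t, q) for t, q in zip(targets, predictions)) / len(targets)
```

A token that 6 of 9 annotators emphasized gets the soft target [2/3, 1/3], so a model is not penalized as heavily for hedging on tokens where annotators themselves disagreed.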

arXiv:2008.05723 [pdf, other]

Contextual Diversity for Active Learning

Authors: Sharat Agarwal, Himanshu Arora, Saket Anand, Chetan Arora

Abstract: The requirement for large annotated datasets restricts the use of deep convolutional neural networks (CNNs) in many practical applications. The problem can be mitigated by active learning (AL) techniques, which, under a given annotation budget, select a subset of data that yields maximum accuracy upon fine-tuning. State-of-the-art AL approaches typically rely on measures of visual diversity or prediction uncertainty, which are unable to effectively capture variations in spatial context. On the other hand, modern CNN architectures make heavy use of spatial context to achieve highly accurate predictions. Since context is difficult to evaluate in the absence of ground-truth labels, we introduce the notion of contextual diversity, which captures the confusion associated with spatially co-occurring classes. Contextual Diversity (CD) hinges on the crucial observation that the probability vector predicted by a CNN for a region of interest typically contains information from a larger receptive field. Exploiting this observation, we use the proposed CD measure within two AL frameworks, (1) a core-set based strategy and (2) a reinforcement learning based policy, for active frame selection. Our extensive empirical evaluation establishes state-of-the-art results for active learning on benchmark datasets for semantic segmentation, object detection, and image classification. Our ablation studies show clear advantages of using contextual diversity for active learning.
The source code and additional results are available at https://github.com/sharat29ag/CDAL.

Submitted 13 August, 2020; originally announced August 2020.

Comments: A variant of this report is accepted in ECCV 2020
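The core-set strategy mentioned in this abstract is commonly implemented with the greedy k-center rule: repeatedly label the point farthest from everything already selected. The sketch below shows that generic rule with a pluggable `distance` function; in CDAL a contextual-diversity-based distance would presumably be supplied here, but the CD computation itself is not reproduced, and the seed choice and 1-D toy data are assumptions for illustration only.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def greedy_k_center(
    points: Sequence[T],
    budget: int,
    distance: Callable[[T, T], float],
    labeled: Sequence[int] = (),
) -> List[int]:
    """Pick `budget` new indices with the greedy k-center (core-set) rule.

    Each step selects the point whose distance to its nearest already
    selected point is largest.
    """
    if budget <= 0:
        return []
    selected = list(labeled)
    if not selected:
        selected.append(0)  # arbitrary seed when nothing is labeled yet
        budget -= 1
    # Distance from every point to its nearest selected point.
    min_dist = [min(distance(p, points[j]) for j in selected) for p in points]
    for _ in range(budget):
        idx = max(range(len(points)), key=lambda i: min_dist[i])
        selected.append(idx)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], distance(p, points[idx]))
    return selected[len(labeled):]
```

For example, `greedy_k_center([0.0, 1.0, 10.0, 11.0, 5.0], 3, lambda a, b: abs(a - b))` seeds at index 0, then picks the far cluster at 11.0, then the isolated point at 5.0, skipping near-duplicates.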

arXiv:2006.10679 [pdf, other]

REGroup: Rank-aggregating Ensemble of Generative Classifiers for Robust Predictions

Authors: Lokender Tiwari, Anish Madan, Saket Anand, Subhashis Banerjee

Abstract: Deep Neural Networks (DNNs) are often criticized for being susceptible to adversarial attacks. Most successful defense strategies adopt adversarial training or random input transformations that typically require retraining or fine-tuning the model to achieve reasonable performance. In this work, our investigation of the intermediate representations of a pre-trained DNN leads to an interesting discovery pointing to intrinsic robustness to adversarial attacks. We find that we can learn a generative classifier by statistically characterizing the neural response of an intermediate layer to clean training samples. The predictions of multiple such intermediate-layer-based classifiers, when aggregated, show unexpected robustness to adversarial attacks. Specifically, we devise an ensemble of these generative classifiers that rank-aggregates their predictions via a Borda count-based consensus. Our proposed approach uses a subset of the clean training data and a pre-trained model, yet is agnostic to network architectures and to the adversarial attack generation method. We present extensive experiments establishing that our defense strategy achieves state-of-the-art performance on the ImageNet validation set.

Submitted 24 November, 2021; v1 submitted 18 June, 2020; originally announced June 2020.

Comments: WACV 2022. Project page: https://lokender.github.io/REGroup.html
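The Borda count consensus named in this abstract can be sketched as follows: each per-layer classifier contributes a ranking of class labels, and points are awarded by position. This is the textbook Borda count, not REGroup's code; the class names are hypothetical, and how REGroup derives each layer's ranking (or whether it truncates rankings to the top few classes) is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def borda_aggregate(rankings: Sequence[Sequence[str]]) -> List[str]:
    """Combine several rankings (best first) into one consensus ranking.

    In a ranking of length m, the label at position i earns m - 1 - i points.
    Labels are ordered by total points, with ties broken alphabetically.
    """
    scores: Dict[str, int] = defaultdict(int)
    for ranking in rankings:
        m = len(ranking)
        for position, label in enumerate(ranking):
            scores[label] += m - 1 - position
    return sorted(scores, key=lambda label: (-scores[label], label))


if __name__ == "__main__":
    # Hypothetical rankings from three intermediate-layer classifiers.
    layer_rankings = [["cat", "dog", "fox"],
                      ["dog", "cat", "fox"],
                      ["cat", "fox", "dog"]]
    print(borda_aggregate(layer_rankings))
```

Because every layer votes on the full ordering rather than only its top class, a perturbation that flips the top prediction of one or two layers need not change the consensus.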

arXiv:2006.02413 [pdf, other]

DGSAC: Density Guided Sampling and Consensus

Authors: Lokender Tiwari, Saket Anand

Abstract: Robust multiple model fitting plays a crucial role in many computer vision applications. Unlike single model fitting, multi-model fitting poses additional challenges, the two most important being the unknown number of models and the inlier noise scale, which are generally supplied by the user from ground truth or other auxiliary information. Mode-seeking/clustering-based approaches crucially depend on the quality of the generated model hypotheses. While preference-analysis-based guided sampling approaches have shown remarkable performance, they operate within a time-budget framework in which the user supplies the time budget as a reasonable guess. In this paper, we depart from the mode-seeking and time-budget frameworks. We propose a concept called Kernel Residual Density (KRD) and apply it to various components of a multiple-model fitting pipeline. KRD acts as a key differentiator between inliers and outliers. We use KRD to guide and automatically stop the sampling process: sampling stops after generating a set of hypotheses that can explain all the data points. An explanation score is maintained for each data point and updated on the fly. We propose two model selection algorithms: an optimal one based on quadratic programming, and a greedy one. Unlike mode-seeking approaches, our model selection algorithms seek to find one representative hypothesis for each genuine structure present in the data.
We evaluate our method (dubbed DGSAC) on a wide variety of tasks, including planar segmentation, motion segmentation, vanishing point estimation, plane fitting to 3D point clouds, and line and circle fitting; the results show the effectiveness of our method and its unified nature.

Submitted 3 June, 2020; originally announced June 2020.

Comments: Working article

arXiv:2005.06037 [pdf, other]

doi 10.1016/j.promfg.2020.05.141

Computer Vision Toolkit for Non-invasive Monitoring of Factory Floor Artifacts

Authors: Aditya M. Deshpande, Anil Kumar Telikicherla, Vinay Jakkali, David A. Wickelhaus, Manish Kumar, Sam Anand

Abstract: Digitization has made smart, connected technologies an integral part of businesses, governments and communities. For manufacturing digitization, there has been active research and development with a focus on Cloud Manufacturing (CM) and the Industrial Internet of Things (IIoT). This work presents a computer vision toolkit (CV Toolkit) for non-invasive digitization of the factory floor in line with Industry 4.0 requirements for factory data collection. Technical challenges currently persist in digitizing legacy systems because their design and sensors cannot easily be changed. This novel toolkit is developed to facilitate easy integration of legacy production machinery and factory floor artifacts with the digital and smart manufacturing environment without requiring any physical changes to the machines. The system is modular and allows real-time monitoring of production machinery. Its modularity allows new software applications to be incorporated into the current CV Toolkit framework. To connect the toolkit with manufacturing floors in a simple, deployable and cost-effective manner, it is integrated with a known manufacturing data standard, MTConnect, to "translate" the digital inputs into data streams that can be read by commercial status tracking and reporting software solutions. The proposed toolkit is demonstrated using a mock-panel environment developed in-house at the University of Cincinnati to highlight its usability.

Submitted 12 May, 2020; originally announced May 2020.

Comments: Accepted for publication in 48th SME North American Manufacturing Research Conference (NAMRC48)

Journal ref: Procedia Manufacturing 48 (2020) 1020-1028

arXiv:2005.02936 [pdf, other]

GraCIAS: Grassmannian of Corrupted Images for Adversarial Security

Authors: Ankita Shukla, Pavan Turaga, Saket Anand

Abstract: Input transformation based defense strategies fall short in defending against strong adversarial attacks. Some successful defenses adopt approaches that either increase the randomness within the applied transformations or make the defense computationally intensive, making it substantially more challenging for the attacker. However, this limits the applicability of such defenses as a pre-processing step, similar to computationally heavy approaches that use retraining and network modifications to achieve robustness to perturbations. In this work, we propose a defense strategy that applies random image corruptions to the input image alone, constructs a self-correlation based subspace, and then applies a projection operation to suppress the adversarial perturbation. Due to its simplicity, the proposed defense is computationally efficient compared to the state of the art, and yet can withstand large perturbations. Further, we develop proximity relationships between the projection operator of a clean image and that of its adversarially perturbed version, via bounds relating geodesic distance on the Grassmannian to matrix Frobenius norms. We empirically show that our strategy is complementary to other weak defenses like JPEG compression and can be seamlessly integrated with them to create a stronger defense. We present extensive experiments on the ImageNet dataset across four different models, namely InceptionV3, ResNet50, VGG16 and MobileNet, with the perturbation magnitude set to ε = 16.
Unlike state-of-the-art approaches, and even without any retraining, the proposed strategy achieves an absolute improvement of ~4.5% in defense accuracy on ImageNet.

Submitted 7 May, 2020; v1 submitted 6 May, 2020; originally announced May 2020.

Comments: 16 pages
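The abstract relates projection operators to geodesic distance on the Grassmannian. A minimal sketch, assuming NumPy, computes both quantities for two k-dimensional subspaces through their principal angles θ_i: the geodesic distance is ‖θ‖₂, and the projector distance satisfies ‖P₁ − P₂‖_F = √2·‖sin θ‖₂, which yields the standard bounds (2√2/π)·d_geo ≤ ‖P₁ − P₂‖_F ≤ √2·d_geo. These are general textbook facts, not the specific bounds derived in the paper, and the construction of the self-correlation subspace from corrupted images is not reproduced.

```python
import numpy as np


def orthonormal_basis(matrix: np.ndarray, rank: int) -> np.ndarray:
    """Orthonormal basis (as columns) spanned by the top `rank` left singular vectors."""
    u, _, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :rank]


def principal_angles(u1: np.ndarray, u2: np.ndarray) -> np.ndarray:
    """Principal angles between the column spans of two orthonormal bases."""
    cosines = np.linalg.svd(u1.T @ u2, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))


def geodesic_distance(u1: np.ndarray, u2: np.ndarray) -> float:
    """Arc-length distance on the Grassmannian: the 2-norm of the principal angles."""
    return float(np.linalg.norm(principal_angles(u1, u2)))


def projection_distance(u1: np.ndarray, u2: np.ndarray) -> float:
    """Frobenius norm of the difference between the orthogonal projectors U U^T."""
    return float(np.linalg.norm(u1 @ u1.T - u2 @ u2.T, ord="fro"))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = orthonormal_basis(rng.standard_normal((10, 3)), 3)
    b = orthonormal_basis(rng.standard_normal((10, 3)), 3)
    g, f = geodesic_distance(a, b), projection_distance(a, b)
    print(f"geodesic={g:.4f}  projector Frobenius={f:.4f}")
    # Standard sandwich bound between the two distances.
    assert 2 * np.sqrt(2) / np.pi * g - 1e-9 <= f <= np.sqrt(2) * g + 1e-9
```

Working with projectors rather than bases is convenient because U Uᵀ does not depend on which orthonormal basis of the subspace is chosen.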

arXiv:2005.02905 [pdf, other]

doi 10.1007/978-3-319-71273-4_3

Automatic Detection and Recognition of Individuals in Patterned Species

Authors: Gullal Singh Cheema, Saket Anand

Abstract: Visual animal biometrics is rapidly gaining popularity as it enables a non-invasive and cost-effective approach to wildlife monitoring applications. Widespread use of camera traps has led to large volumes of collected images, making manual processing of visual content hard to manage. In this work, we develop a framework for automatic detection and recognition of individuals in different patterned species like tigers, zebras and jaguars. Most existing systems primarily rely on manual input for localizing the animal, which does not scale well to large datasets. In order to automate the detection process while retaining robustness to blur, partial occlusion, illumination and pose variations, we use the recently proposed Faster-RCNN object detection framework to efficiently detect animals in images. We further extract AlexNet features from the animal's flank and train a logistic regression (or linear SVM) classifier to recognize the individuals. We primarily test and evaluate our framework on a camera trap tiger image dataset containing images that vary in overall image quality, animal pose, scale and lighting. We also evaluate our recognition system on zebra and jaguar images to show generalization to other patterned species. Our framework gives perfect detection results on camera-trapped tiger images and similar or better individual recognition performance compared with state-of-the-art recognition techniques.

Submitted 6 May, 2020; originally announced May 2020.

Comments: 12 pages, ECML-PKDD 2017

arXiv:2004.14418 [pdf]

To Reduce Gross NPA and Classify Defaulters Using Shannon Entropy

Authors: Ambarish Moharil, Nikhil Sonavane, Chirag Kedia, Mansimran Singh Anand

Abstract: Non-Performing Assets (NPAs) have received serious attention from banks over the past few years. NPAs cause huge losses to banks, so it is extremely critical to decide which loans are capable of becoming NPAs and, thereby, which loans to grant and which to reject. In this paper, which focuses on the crux of the matter, we propose an algorithm designed to handle financial data meticulously and predict with very high accuracy whether a particular loan will be classified as an NPA in the future. Instead of the conventional, less accurate classifiers used to decide which loans may turn into NPAs, we build our own classifier model using entropy as its basis. We have created an entropy-based classifier using Shannon entropy. The classifier model categorizes our data points into two categories: accepted or rejected. We make use of local entropy and global entropy to help us determine the output. The entropy classifier model is then compared with existing classifiers used to predict NPAs to gauge its performance.

Submitted 29 April, 2020; originally announced April 2020.

Comments: 11 Pages, 7 Figures
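The abstract builds its classifier on Shannon entropy, H = −Σ_c p_c log₂ p_c, over the accepted/rejected labels, but does not define its "local" and "global" entropy. A minimal sketch of the entropy itself, plus the familiar information-gain criterion for comparing the entropy of a whole dataset with that of its subsets; reading "global" as whole-dataset entropy and "local" as entropy within a subset is our assumption, not the authors' stated method.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def shannon_entropy(labels: Sequence[Hashable]) -> float:
    """H = -sum_c p_c * log2(p_c) over the empirical label distribution, in bits."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels: Sequence[Hashable], in_left: Sequence[bool]) -> float:
    """Entropy reduction when `labels` are split by the boolean mask `in_left`."""
    n = len(labels)
    if n == 0:
        return 0.0
    left = [y for y, m in zip(labels, in_left) if m]
    right = [y for y, m in zip(labels, in_left) if not m]
    # Size-weighted entropy of the two subsets ("local" under our assumption).
    remainder = sum(len(part) / n * shannon_entropy(part) for part in (left, right))
    return shannon_entropy(labels) - remainder
```

A balanced accepted/rejected dataset has 1 bit of entropy; a feature split that separates the two classes perfectly recovers that full bit as information gain.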

arXiv:2004.11296 [pdf]

Edge Detection using Stationary Wavelet Transform, HMM, and EM algorithm

Authors: S. Anand, K. Nagajothi, K. Nithya

Abstract: Stationary Wavelet Transform (SWT) is an efficient tool for edge analysis. This paper proposes a new edge detection technique using an SWT-based Hidden Markov Model (WHMM) together with the expectation-maximization (EM) algorithm. Each SWT coefficient has a hidden state that indicates whether the coefficient fits an edge model or not. Laplacian and Gaussian models are used to determine whether the state is edge or non-edge. The model is trained with the EM algorithm, and the Viterbi algorithm is employed to recover the states. This algorithm can be applied efficiently to noisy images.

Submitted 23 April, 2020; originally announced April 2020.

Comments: 7 pages, 5 figures
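The state-recovery step described in this abstract uses the Viterbi algorithm. A minimal log-domain sketch, assuming a simple first-order chain over a 1-D sequence of coefficients with two states (non-edge, edge); the abstract does not say how the WHMM links coefficients (for example across scales), and the emission parameters below (a Laplacian of scale 1 for non-edge, a Gaussian of standard deviation 10 for edge) are hypothetical stand-ins for values EM would estimate.

```python
import math
from typing import Callable, List, Sequence

NON_EDGE, EDGE = 0, 1


def laplace_logpdf(x: float, scale: float) -> float:
    return -math.log(2.0 * scale) - abs(x) / scale


def gauss_logpdf(x: float, sigma: float) -> float:
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - x * x / (2.0 * sigma ** 2)


def edge_emission(state: int, x: float) -> float:
    """Hypothetical emissions: sharp Laplacian for non-edge, wide Gaussian for edge."""
    return laplace_logpdf(x, 1.0) if state == NON_EDGE else gauss_logpdf(x, 10.0)


def viterbi(
    coeffs: Sequence[float],
    log_start: Sequence[float],
    log_trans: Sequence[Sequence[float]],
    emission: Callable[[int, float], float],
) -> List[int]:
    """Most likely hidden-state sequence for a chain of observations (log domain)."""
    if not coeffs:
        return []
    states = range(len(log_start))
    score = [log_start[s] + emission(s, coeffs[0]) for s in states]
    backpointers = []
    for x in coeffs[1:]:
        pointers, new_score = [], []
        for t in states:
            # Best predecessor of state t at this step.
            best = max(states, key=lambda s: score[s] + log_trans[s][t])
            pointers.append(best)
            new_score.append(score[best] + log_trans[best][t] + emission(t, x))
        backpointers.append(pointers)
        score = new_score
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]
```

With equal start probabilities and a "sticky" transition matrix (0.8 to stay, 0.2 to switch), the coefficients [0.1, -0.3, 25.0, 30.0, 0.2] decode to [0, 0, 1, 1, 0]: only the two large-magnitude coefficients are labeled as edge.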

arXiv:2004.10681 [pdf, other]

Pseudo RGB-D for Self-Improving Monocular SLAM and Depth Prediction

Authors: Lokender Tiwari, Pan Ji, Quoc-Huy Tran, Bingbing Zhuang, Saket Anand, Manmohan Chandraker

Abstract: Classical monocular Simultaneous Localization And Mapping (SLAM) and the recently emerging convolutional neural networks (CNNs) for monocular depth prediction represent two largely disjoint approaches towards building a 3D map of the surrounding environment. In this paper, we demonstrate that coupling the two, leveraging the strengths of each, mitigates the other's shortcomings. Specifically, we propose a joint narrow and wide baseline based self-improving framework, where on the one hand the CNN-predicted depth is leveraged to perform pseudo RGB-D feature-based SLAM, leading to better accuracy and robustness than the monocular RGB SLAM baseline. On the other hand, the bundle-adjusted 3D scene structures and camera poses from the more principled geometric SLAM are injected back into the depth network through novel wide baseline losses proposed for improving the depth prediction network, which then continues to contribute towards better pose and 3D structure estimation in the next iteration. We emphasize that our framework only requires unlabeled monocular videos in both training and inference stages, and yet is able to outperform state-of-the-art self-supervised monocular and stereo depth prediction networks (e.g., Monodepth2) and feature-based monocular SLAM systems (i.e., ORB-SLAM). Extensive experiments on the KITTI and TUM RGB-D datasets verify the superiority of our self-improving geometry-CNN framework.

Submitted 7 August, 2020; v1 submitted 22 April, 2020; originally announced April 2020.

Comments: ECCV 2020, Project Page: https://lokender.github.io/self-improving-SLAM.html

arXiv:2004.09632 [pdf, ps, other]

Intelligent Querying for Target Tracking in Camera Networks using Deep Q-Learning with n-Step Bootstrapping

Authors: Anil Sharma, Saket Anand, Sanjit K. Kaul

Abstract: Surveillance camera networks are a useful infrastructure for various visual analytics applications, where high-level inferences and predictions can be made based on target tracking across the network. Most multi-camera tracking works focus on target re-identification and trajectory association problems to track the target. However, since camera networks can generate an enormous amount of video data, inefficient schemes for making re-identification or trajectory association queries can incur prohibitively large computational requirements. In this paper, we address the problem of intelligently scheduling re-identification queries in a multi-camera tracking setting. To this end, we formulate the target tracking problem in a camera network as a Markov decision process (MDP) and learn a reinforcement learning based policy that selects a camera for making a re-identification query. The proposed approach to camera selection does not assume knowledge of the camera network topology, but the resulting policy implicitly learns it. We also show that such a policy can be learnt directly from data. Using the NLPR MCT and the Duke MTMC multi-camera multi-target tracking benchmarks, we empirically show that the proposed approach substantially reduces the number of frames queried.

Submitted 20 April, 2020; originally announced April 2020.

Comments: Camera Selections for Target Tracking
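The "n-step bootstrapping" in the title refers to a Q-learning target that sums n observed rewards before bootstrapping from the value estimate: G = r_t + γ r_{t+1} + … + γ^{n−1} r_{t+n−1} + γ^n max_a Q(s_{t+n}, a). A minimal scalar sketch; in deep Q-learning the bootstrap value would come from a (target) network, and the reward design for camera selection is not specified here, so the numbers in the example are hypothetical.

```python
from typing import Sequence


def n_step_target(rewards: Sequence[float], bootstrap_value: float,
                  gamma: float, terminal: bool = False) -> float:
    """n-step bootstrapped target with n = len(rewards).

    bootstrap_value stands for max_a Q(s_{t+n}, a); it is dropped when the
    episode ended within the n steps.
    """
    target = sum((gamma ** k) * r for k, r in enumerate(rewards))
    if not terminal:
        target += (gamma ** len(rewards)) * bootstrap_value
    return target


def q_update(q_value: float, target: float, alpha: float) -> float:
    """Move Q(s_t, a_t) a step of size alpha toward the n-step target."""
    return q_value + alpha * (target - q_value)


if __name__ == "__main__":
    # Hypothetical 3-step rollout of camera-query rewards, gamma = 0.5.
    g = n_step_target([1.0, 0.0, 2.0], bootstrap_value=10.0, gamma=0.5)
    print(g, q_update(1.0, g, alpha=0.5))
```

Larger n propagates reward information faster across several camera hand-offs, at the cost of higher-variance targets.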

arXiv:1911.03637 [pdf, ps, other]

Boundary-type Sets of Strong Product of Directed Graphs

Authors: Prasanth G. Narasimha-Shenoi, Bijo S Anand, Mary Shalet T J

Abstract: Let $D=(V,E)$ be a strongly connected digraph and let $u,v\in V(D)$. The maximum distance $md(u,v)$ is defined as $md(u,v)=\max\{\overrightarrow{d}(u,v), \overrightarrow{d}(v,u)\}$, where $\overrightarrow{d}(u,v)$ denotes the length of a shortest directed $u-v$ path in $D$. This is a metric. The boundary, contour, eccentric and peripheral sets of a strong digraph $D$ with respect to this metric have been defined, and these metrically defined sets of a large strong digraph $D$ have been investigated in terms of the factors in its prime factor decomposition with respect to the Cartesian product. In this paper, we investigate these boundary-type sets of a strong digraph $D$ in terms of the factors in its prime factor decomposition with respect to the strong product.

Submitted 9 November, 2019; originally announced November 2019.
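The maximum distance $md(u,v)$ defined in this abstract can be computed with two breadth-first searches, one from each endpoint. A minimal sketch, assuming the digraph is given as an adjacency dictionary of out-neighbors; the 4-cycle in the example is a hypothetical test graph.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Mapping


def directed_distances(adj: Mapping[Hashable, Iterable[Hashable]],
                       source: Hashable) -> Dict[Hashable, int]:
    """BFS: length of a shortest directed path from source to each reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def max_distance(adj: Mapping[Hashable, Iterable[Hashable]],
                 u: Hashable, v: Hashable) -> int:
    """md(u, v) = max(d(u, v), d(v, u)).

    Raises KeyError if v is unreachable from u or vice versa, i.e. the
    digraph is not strongly connected.
    """
    return max(directed_distances(adj, u)[v], directed_distances(adj, v)[u])


if __name__ == "__main__":
    cycle = {0: [1], 1: [2], 2: [3], 3: [0]}  # directed 4-cycle
    print(max_distance(cycle, 0, 1), max_distance(cycle, 0, 2))
```

On the directed 4-cycle, $\overrightarrow{d}(0,1)=1$ but $\overrightarrow{d}(1,0)=3$, so $md(0,1)=3$; taking the maximum is what makes $md$ symmetric even though directed distance is not.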

arXiv:1910.08292 [pdf, other]

Diversity in Fashion Recommendation using Semantic Parsing

Authors: Sagar Verma, Sukhad Anand, Chetan Arora, Atul Rai

Abstract: Developing a recommendation system for fashion images is challenging due to the inherent ambiguity about which criterion a user is looking at. Suggesting multiple images, each similar to the query image with respect to a different feature or part, is one way to mitigate the problem. Existing works on fashion recommendation have used Siamese or triplet networks to learn features from similar pairs and similar-dissimilar triplets, respectively. However, these methods do not provide basic information such as how two clothing images are similar, or which parts present in the two images make them similar. In this paper, we propose to recommend images by explicitly learning and exploiting part-based similarity. We propose a novel approach of learning discriminative features from weakly-supervised data by using visual attention over the parts and a texture encoding network. We show that the learned features surpass the state of the art on the retrieval task on the DeepFashion dataset. We then use the proposed model to recommend fashion images having an explicit variation with respect to the similarity of any of the parts.

Submitted 18 October, 2019; originally announced October 2019.

Comments: 5 pages, ICIP2018, code: https://github.com/sagarverma/fashion_recommendation_stlstm

arXiv:1907.09554 [pdf, other]

Product of Orthogonal Spheres Parameterization for Disentangled Representation Learning

Authors: Ankita Shukla, Sarthak Bhagat, Shagun Uppal, Saket Anand, Pavan Turaga

Abstract: Learning representations that disentangle the explanatory attributes underlying the data improves interpretability and provides control over data generation. Various learning frameworks such as VAEs, GANs and auto-encoders have been used in the literature to learn such representations. Most often, the latent space is constrained to a partitioned representation or structured by a prior to impose disentangling. In this work, we advance the use of a latent representation based on a product space of orthogonal spheres (PrOSe). The PrOSe model is motivated by the reasoning that latent variables related to the physics of image formation can, under certain relaxed assumptions, lead to spherical spaces. Orthogonality between the spheres is motivated via physical independence models. Imposing the orthogonal-sphere constraint is much simpler than other, more complicated physical models; it is fairly general and flexible, and extensible beyond the factors used to motivate its development. Under the further relaxed assumption of equal-sized latent blocks per factor, the constraint can be written in closed form as an orthonormality term in the loss function. We show that our approach significantly improves the quality of disentanglement, with consistent improvements over several state-of-the-art approaches across several benchmarks and metrics.
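A closed-form orthonormality term of this kind can be sketched as follows. The sketch assumes the latent vector is split into $K$ equal-sized blocks of dimension $d$ (with $K \le d$), stacked as the rows of a $K\times d$ matrix $Z$, and penalized by $\|ZZ^\top - I_K\|_F^2$. This particular form and the function names are assumptions chosen for illustration; the paper's exact loss, weighting and normalization may differ. NumPy is used for brevity; during training the same expression would be written in an autodiff framework.

```python
import numpy as np


def split_blocks(z, num_blocks):
    """Reshape latents (batch, K*d) into K equal-sized blocks (batch, K, d)."""
    batch, dim = z.shape
    if dim % num_blocks:
        raise ValueError("latent size must be divisible by num_blocks")
    return z.reshape(batch, num_blocks, dim // num_blocks)


def orthonormality_penalty(z, num_blocks):
    """Mean over the batch of ||Z Z^T - I_K||_F^2 for each K x d block matrix Z.

    Diagonal terms push every block onto the unit sphere; off-diagonal
    terms push different blocks towards mutual orthogonality.
    """
    blocks = split_blocks(np.asarray(z, dtype=float), num_blocks)
    gram = blocks @ np.swapaxes(blocks, 1, 2)  # (batch, K, K) inner products
    residual = gram - np.eye(num_blocks)
    return float(np.mean(np.sum(residual ** 2, axis=(1, 2))))


if __name__ == "__main__":
    orthonormal = np.array([[1.0, 0.0, 0.0, 1.0]])  # blocks [1,0] and [0,1]
    parallel = np.array([[1.0, 0.0, 1.0, 0.0]])     # blocks [1,0] and [1,0]
    print(orthonormality_penalty(orthonormal, 2))  # 0.0
    print(orthonormality_penalty(parallel, 2))     # 2.0
```

Two unit-norm, mutually orthogonal blocks incur no penalty, while two identical unit blocks are penalized only through the two off-diagonal inner products, giving $1^2+1^2=2$.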

Submitted 22 July, 2019; originally announced July 2019.

Comments: Accepted at British Machine Vision Conference (BMVC) 2019

arXiv:1907.05972 [pdf, other]

Motion Sensor-based Privacy Attack on Smartphones

Authors: S Abhishek Anand, Chen Wang, Jian Liu, Nitesh Saxena, Yingying Chen

Abstract: In this paper, we build a speech privacy attack that exploits speech reverberations generated by a smartphone's built-in loudspeaker and captured via a zero-permission motion sensor (accelerometer). We design our attack, Spearphone2, and demonstrate that speech reverberations from built-in loudspeakers, at an appropriate loudness, can affect the accelerometer, leaking sensitive information about the speech. In particular, we show that by exploiting the affected accelerometer readings and carefully selecting feature sets along with off-the-shelf machine learning techniques, Spearphone can successfully perform gender classification (accuracy over 90%) and speaker identification (accuracy over 80%) for any audio/video playback on the smartphone. Our tests of the attack on a voice call and a voice assistant response were also encouraging, showcasing the impact of the proposed attack. In addition, we perform speech recognition and speech reconstruction to extract, to an extent, more information about the eavesdropped speech. Our work brings to light a fundamental design vulnerability in many currently deployed smartphones, which may put people's speech privacy at risk when they use the smartphone in loudspeaker mode during phone calls, media playback or voice assistant interactions.

Submitted 19 October, 2020; v1 submitted 12 July, 2019; originally announced July 2019.

Comments: 15 pages, 25 figures

Showing 1–50 of 79 results for author: Anand, S