Search | arXiv e-print repository

Shesha: Multi-head Microarchitectural Leakage Discovery in new-generation Intel Processors

Authors: Anirban Chakraborty, Nimish Mishra, Debdeep Mukhopadhyay

Abstract: Transient execution attacks have been one of the widely explored microarchitectural side channels since the discovery of Spectre and Meltdown. However, much of the research has been driven by manual discovery of new transient paths through well-known speculative events. Although a few attempts exist in literature on automating transient leakage discovery, such tools focus on finding variants of kn… ▽ More Transient execution attacks have been one of the widely explored microarchitectural side channels since the discovery of Spectre and Meltdown. However, much of the research has been driven by manual discovery of new transient paths through well-known speculative events. Although a few attempts exist in literature on automating transient leakage discovery, such tools focus on finding variants of known transient attacks and explore a small subset of instruction set. Further, they take a random fuzzing approach that does not scale as the complexity of search space increases. In this work, we identify that the search space of bad speculation is disjointedly fragmented into equivalence classes, and then use this observation to develop a framework named Shesha, inspired by Particle Swarm Optimization, which exhibits faster convergence rates than state-of-the-art fuzzing techniques for automatic discovery of transient execution attacks. We then use Shesha to explore the vast search space of extensions to the x86 Instruction Set Architecture (ISAs), thereby focusing on previously unexplored avenues of bad speculation. As such, we report five previously unreported transient execution paths in Instruction Set Extensions (ISEs) on new generation of Intel processors. We then perform extensive reverse engineering of each of the transient execution paths and provide root-cause analysis. Using the discovered transient execution paths, we develop attack building blocks to exhibit exploitable transient windows. Finally, we demonstrate data leakage from Fused Multiply-Add instructions through SIMD buffer and extract victim data from various cryptographic implementations. △ Less

Submitted 14 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

Comments: USENIX Security Symposium, 2024

arXiv:2404.07172 [pdf, other]

A Gauss-Newton Approach for Min-Max Optimization in Generative Adversarial Networks

Authors: Neel Mishra, Bamdev Mishra, Pratik Jawanpuria, Pawan Kumar

Abstract: A novel first-order method is proposed for training generative adversarial networks (GANs). It modifies the Gauss-Newton method to approximate the min-max Hessian and uses the Sherman-Morrison inversion formula to calculate the inverse. The method corresponds to a fixed-point method that ensures necessary contraction. To evaluate its effectiveness, numerical experiments are conducted on various da… ▽ More A novel first-order method is proposed for training generative adversarial networks (GANs). It modifies the Gauss-Newton method to approximate the min-max Hessian and uses the Sherman-Morrison inversion formula to calculate the inverse. The method corresponds to a fixed-point method that ensures necessary contraction. To evaluate its effectiveness, numerical experiments are conducted on various datasets commonly used in image generation tasks, such as MNIST, Fashion MNIST, CIFAR10, FFHQ, and LSUN. Our method is capable of generating high-fidelity images with greater diversity across multiple datasets. It also achieves the highest inception score for CIFAR10 among all compared methods, including state-of-the-art second-order methods. Additionally, its execution time is comparable to that of first-order min-max methods. △ Less

Submitted 10 April, 2024; originally announced April 2024.

Comments: accepted in IJCNN 2023, 9 pages

arXiv:2403.04114 [pdf, other]

Closing the Visual Sim-to-Real Gap with Object-Composable NeRFs

Authors: Nikhil Mishra, Maximilian Sieb, Pieter Abbeel, Xi Chen

Abstract: Deep learning methods for perception are the cornerstone of many robotic systems. Despite their potential for impressive performance, obtaining real-world training data is expensive, and can be impractically difficult for some tasks. Sim-to-real transfer with domain randomization offers a potential workaround, but often requires extensive manual tuning and results in models that are brittle to dis… ▽ More Deep learning methods for perception are the cornerstone of many robotic systems. Despite their potential for impressive performance, obtaining real-world training data is expensive, and can be impractically difficult for some tasks. Sim-to-real transfer with domain randomization offers a potential workaround, but often requires extensive manual tuning and results in models that are brittle to distribution shift between sim and real. In this work, we introduce Composable Object Volume NeRF (COV-NeRF), an object-composable NeRF model that is the centerpiece of a real-to-sim pipeline for synthesizing training data targeted to scenes and objects from the real world. COV-NeRF extracts objects from real images and composes them into new scenes, generating photorealistic renderings and many types of 2D and 3D supervision, including depth maps, segmentation masks, and meshes. We show that COV-NeRF matches the rendering quality of modern NeRF methods, and can be used to rapidly close the sim-to-real gap across a variety of perceptual modalities. △ Less

Submitted 6 March, 2024; originally announced March 2024.

Comments: ICRA 2024

arXiv:2311.11462 [pdf, other]

LLM aided semi-supervision for Extractive Dialog Summarization

Authors: Nishant Mishra, Gaurav Sahu, Iacer Calixto, Ameen Abu-Hanna, Issam H. Laradji

Abstract: Generating high-quality summaries for chat dialogs often requires large labeled datasets. We propose a method to efficiently use unlabeled data for extractive summarization of customer-agent dialogs. In our method, we frame summarization as a question-answering problem and use state-of-the-art large language models (LLMs) to generate pseudo-labels for a dialog. We then use these pseudo-labels to f… ▽ More Generating high-quality summaries for chat dialogs often requires large labeled datasets. We propose a method to efficiently use unlabeled data for extractive summarization of customer-agent dialogs. In our method, we frame summarization as a question-answering problem and use state-of-the-art large language models (LLMs) to generate pseudo-labels for a dialog. We then use these pseudo-labels to fine-tune a chat summarization model, effectively transferring knowledge from the large LLM into a smaller specialized model. We demonstrate our method on the \tweetsumm dataset, and show that using 10% of the original labelled data set we can achieve 65.9/57.0/61.0 ROUGE-1/-2/-L, whereas the current state-of-the-art trained on the entire training data set obtains 65.16/55.81/64.37 ROUGE-1/-2/-L. In other words, in the worst case (i.e., ROUGE-L) we still effectively retain 94.7% of the performance while using only 10% of the data. △ Less

Submitted 23 November, 2023; v1 submitted 19 November, 2023; originally announced November 2023.

Comments: to be published in EMNLP Findings

arXiv:2310.05172 [pdf, other]

On the Amplification of Cache Occupancy Attacks in Randomized Cache Architectures

Authors: Anirban Chakraborty, Nimish Mishra, Sayandeep Saha, Sarani Bhattacharya, Debdeep Mukhopadhyay

Abstract: In this work, we explore the applicability of cache occupancy attacks and the implications of secured cache design rationales on such attacks. In particular, we show that one of the well-known cache randomization schemes, MIRAGE, touted to be resilient against eviction-based attacks, amplifies the chances of cache occupancy attack, making it more vulnerable compared to contemporary designs. We lev… ▽ More In this work, we explore the applicability of cache occupancy attacks and the implications of secured cache design rationales on such attacks. In particular, we show that one of the well-known cache randomization schemes, MIRAGE, touted to be resilient against eviction-based attacks, amplifies the chances of cache occupancy attack, making it more vulnerable compared to contemporary designs. We leverage MIRAGE's global eviction property to demonstrate covert channel with byte-level granularity, with far less cache occupancy requirement (just $10\%$ of LLC) than other schemes. For instance, ScatterCache (a randomisation scheme with lesser security guarantees than MIRAGE) and generic set-associative caches require $40\%$ and $30\%$ cache occupancy, respectively, to exhibit covert communication. Furthermore, we extend our attack vectors to include side-channel, template-based fingerprinting of workloads in a cross-core setting. We demonstrate the potency of such fingerprinting on both inhouse LLC simulator as well as on SPEC2017 workloads on gem5. Finally, we pinpoint implementation inconsistencies in MIRAGE's publicly available gem5 artifact which motivates a re-evaluation of the performance statistics of MIRAGE with respect to ScatterCache and baseline set-associative cache. We find MIRAGE, in reality, performs worse than what is previously reported in literature, a concern that should be addressed in successor generations of secured caches. △ Less

Submitted 8 October, 2023; originally announced October 2023.

arXiv:2308.00091 [pdf, other]

Convolutional Occupancy Models for Dense Packing of Complex, Novel Objects

Authors: Nikhil Mishra, Pieter Abbeel, Xi Chen, Maximilian Sieb

Abstract: Dense packing in pick-and-place systems is an important feature in many warehouse and logistics applications. Prior work in this space has largely focused on planning algorithms in simulation, but real-world packing performance is often bottlenecked by the difficulty of perceiving 3D object geometry in highly occluded, partially observed scenes. In this work, we present a fully-convolutional shape… ▽ More Dense packing in pick-and-place systems is an important feature in many warehouse and logistics applications. Prior work in this space has largely focused on planning algorithms in simulation, but real-world packing performance is often bottlenecked by the difficulty of perceiving 3D object geometry in highly occluded, partially observed scenes. In this work, we present a fully-convolutional shape completion model, F-CON, which can be easily combined with off-the-shelf planning methods for dense packing in the real world. We also release a simulated dataset, COB-3D-v2, that can be used to train shape completion models for real-word robotics applications, and use it to demonstrate that F-CON outperforms other state-of-the-art shape completion methods. Finally, we equip a real-world pick-and-place system with F-CON, and demonstrate dense packing of complex, unseen objects in cluttered scenes. Across multiple planning methods, F-CON enables substantially better dense packing than other shape completion methods. △ Less

Submitted 31 July, 2023; originally announced August 2023.

Comments: In IROS 2023. Code and dataset are available at https://sites.google.com/view/fcon-packing/

arXiv:2307.01877 [pdf, other]

Fast Private Kernel Density Estimation via Locality Sensitive Quantization

Authors: Tal Wagner, Yonatan Naamad, Nina Mishra

Abstract: We study efficient mechanisms for differentially private kernel density estimation (DP-KDE). Prior work for the Gaussian kernel described algorithms that run in time exponential in the number of dimensions $d$. This paper breaks the exponential barrier, and shows how the KDE can privately be approximated in time linear in $d$, making it feasible for high-dimensional data. We also present improved… ▽ More We study efficient mechanisms for differentially private kernel density estimation (DP-KDE). Prior work for the Gaussian kernel described algorithms that run in time exponential in the number of dimensions $d$. This paper breaks the exponential barrier, and shows how the KDE can privately be approximated in time linear in $d$, making it feasible for high-dimensional data. We also present improved bounds for low-dimensional data. Our results are obtained through a general framework, which we term Locality Sensitive Quantization (LSQ), for constructing private KDE mechanisms where existing KDE approximation techniques can be applied. It lets us leverage several efficient non-private KDE methods -- like Random Fourier Features, the Fast Gauss Transform, and Locality Sensitive Hashing -- and ``privatize'' them in a black-box manner. Our experiments demonstrate that our resulting DP-KDE mechanisms are fast and accurate on large datasets in both high and low dimensions. △ Less

Submitted 4 July, 2023; originally announced July 2023.

Comments: ICML 2023

arXiv:2307.00897 [pdf, other]

Fixing confirmation bias in feature attribution methods via semantic match

Authors: Giovanni Cinà, Daniel Fernandez-Llaneza, Ludovico Deponte, Nishant Mishra, Tabea E. Röber, Sandro Pezzelle, Iacer Calixto, Rob Goedhart, Ş. İlker Birbil

Abstract: Feature attribution methods have become a staple method to disentangle the complex behavior of black box models. Despite their success, some scholars have argued that such methods suffer from a serious flaw: they do not allow a reliable interpretation in terms of human concepts. Simply put, visualizing an array of feature contributions is not enough for humans to conclude something about a model's… ▽ More Feature attribution methods have become a staple method to disentangle the complex behavior of black box models. Despite their success, some scholars have argued that such methods suffer from a serious flaw: they do not allow a reliable interpretation in terms of human concepts. Simply put, visualizing an array of feature contributions is not enough for humans to conclude something about a model's internal representations, and confirmation bias can trick users into false beliefs about model behavior. We argue that a structured approach is required to test whether our hypotheses on the model are confirmed by the feature attributions. This is what we call the "semantic match" between human concepts and (sub-symbolic) explanations. Building on the conceptual framework put forward in Cinà et al. [2023], we propose a structured approach to evaluate semantic match in practice. We showcase the procedure in a suite of experiments spanning tabular and image data, and show how the assessment of semantic match can give insight into both desirable (e.g., focusing on an object relevant for prediction) and undesirable model behaviors (e.g., focusing on a spurious correlation). We couple our experimental results with an analysis on the metrics to measure semantic match, and argue that this approach constitutes the first step towards resolving the issue of confirmation bias in XAI. △ Less

Submitted 26 February, 2024; v1 submitted 3 July, 2023; originally announced July 2023.

arXiv:2306.06325 [pdf, other]

Explaining a machine learning decision to physicians via counterfactuals

Authors: Supriya Nagesh, Nina Mishra, Yonatan Naamad, James M. Rehg, Mehul A. Shah, Alexei Wagner

Abstract: Machine learning models perform well on several healthcare tasks and can help reduce the burden on the healthcare system. However, the lack of explainability is a major roadblock to their adoption in hospitals. \textit{How can the decision of an ML model be explained to a physician?} The explanations considered in this paper are counterfactuals (CFs), hypothetical scenarios that would have resulte… ▽ More Machine learning models perform well on several healthcare tasks and can help reduce the burden on the healthcare system. However, the lack of explainability is a major roadblock to their adoption in hospitals. \textit{How can the decision of an ML model be explained to a physician?} The explanations considered in this paper are counterfactuals (CFs), hypothetical scenarios that would have resulted in the opposite outcome. Specifically, time-series CFs are investigated, inspired by the way physicians converse and reason out decisions `I would have given the patient a vasopressor if their blood pressure was lower and falling'. Key properties of CFs that are particularly meaningful in clinical settings are outlined: physiological plausibility, relevance to the task and sparse perturbations. Past work on CF generation does not satisfy these properties, specifically plausibility in that realistic time-series CFs are not generated. A variational autoencoder (VAE)-based approach is proposed that captures these desired properties. The method produces CFs that improve on prior approaches quantitatively (more plausible CFs as evaluated by their likelihood w.r.t original data distribution, and 100$\times$ faster at generating CFs) and qualitatively (2$\times$ more plausible and relevant) as evaluated by three physicians. △ Less

Submitted 9 June, 2023; originally announced June 2023.

arXiv:2305.01910 [pdf, other]

Distributional Instance Segmentation: Modeling Uncertainty and High Confidence Predictions with Latent-MaskRCNN

Authors: YuXuan Liu, Nikhil Mishra, Pieter Abbeel, Xi Chen

Abstract: Object recognition and instance segmentation are fundamental skills in any robotic or autonomous system. Existing state-of-the-art methods are often unable to capture meaningful uncertainty in challenging or ambiguous scenes, and as such can cause critical errors in high-performance applications. In this paper, we explore a class of distributional instance segmentation models using latent codes th… ▽ More Object recognition and instance segmentation are fundamental skills in any robotic or autonomous system. Existing state-of-the-art methods are often unable to capture meaningful uncertainty in challenging or ambiguous scenes, and as such can cause critical errors in high-performance applications. In this paper, we explore a class of distributional instance segmentation models using latent codes that can model uncertainty over plausible hypotheses of object masks. For robotic picking applications, we propose a confidence mask method to achieve the high precision necessary in industrial use cases. We show that our method can significantly reduce critical errors in robotic systems, including our newly released dataset of ambiguous scenes in a robotic application. On a real-world apparel-picking robot, our method significantly reduces double pick errors while maintaining high performance. △ Less

Submitted 3 May, 2023; originally announced May 2023.

Comments: In ICRA 2023. Code and dataset are available at https://segm.yuxuanliu.com/

arXiv:2304.10457 [pdf, other]

Angle based dynamic learning rate for gradient descent

Authors: Neel Mishra, Pawan Kumar

Abstract: In our work, we propose a novel yet simple approach to obtain an adaptive learning rate for gradient-based descent methods on classification tasks. Instead of the traditional approach of selecting adaptive learning rates via the decayed expectation of gradient-based terms, we use the angle between the current gradient and the new gradient: this new gradient is computed from the direction orthogona… ▽ More In our work, we propose a novel yet simple approach to obtain an adaptive learning rate for gradient-based descent methods on classification tasks. Instead of the traditional approach of selecting adaptive learning rates via the decayed expectation of gradient-based terms, we use the angle between the current gradient and the new gradient: this new gradient is computed from the direction orthogonal to the current gradient, which further helps us in determining a better adaptive learning rate based on angle history, thereby, leading to relatively better accuracy compared to the existing state-of-the-art optimizers. On a wide variety of benchmark datasets with prominent image classification architectures such as ResNet, DenseNet, EfficientNet, and VGG, we find that our method leads to the highest accuracy in most of the datasets. Moreover, we prove that our method is convergent. △ Less

Submitted 20 April, 2023; originally announced April 2023.

Comments: 8 pages, 7 figures

arXiv:2210.07424 [pdf, other]

Autoregressive Uncertainty Modeling for 3D Bounding Box Prediction

Authors: YuXuan Liu, Nikhil Mishra, Maximilian Sieb, Yide Shentu, Pieter Abbeel, Xi Chen

Abstract: 3D bounding boxes are a widespread intermediate representation in many computer vision applications. However, predicting them is a challenging task, largely due to partial observability, which motivates the need for a strong sense of uncertainty. While many recent methods have explored better architectures for consuming sparse and unstructured point cloud data, we hypothesize that there is room fo… ▽ More 3D bounding boxes are a widespread intermediate representation in many computer vision applications. However, predicting them is a challenging task, largely due to partial observability, which motivates the need for a strong sense of uncertainty. While many recent methods have explored better architectures for consuming sparse and unstructured point cloud data, we hypothesize that there is room for improvement in the modeling of the output distribution and explore how this can be achieved using an autoregressive prediction head. Additionally, we release a simulated dataset, COB-3D, which highlights new types of ambiguity that arise in real-world robotics applications, where 3D bounding box prediction has largely been underexplored. We propose methods for leveraging our autoregressive model to make high confidence predictions and meaningful uncertainty measures, achieving strong results on SUN-RGBD, Scannet, KITTI, and our new dataset. △ Less

Submitted 13 October, 2022; originally announced October 2022.

Comments: In ECCV 2022. Code and dataset are available at https://bbox.yuxuanliu.com

arXiv:2108.12001 [pdf, other]

Understanding the Logit Distributions of Adversarially-Trained Deep Neural Networks

Authors: Landan Seguin, Anthony Ndirango, Neeli Mishra, SueYeon Chung, Tyler Lee

Abstract: Adversarial defenses train deep neural networks to be invariant to the input perturbations from adversarial attacks. Almost all defense strategies achieve this invariance through adversarial training i.e. training on inputs with adversarial perturbations. Although adversarial training is successful at mitigating adversarial attacks, the behavioral differences between adversarially-trained (AT) mod… ▽ More Adversarial defenses train deep neural networks to be invariant to the input perturbations from adversarial attacks. Almost all defense strategies achieve this invariance through adversarial training i.e. training on inputs with adversarial perturbations. Although adversarial training is successful at mitigating adversarial attacks, the behavioral differences between adversarially-trained (AT) models and standard models are still poorly understood. Motivated by a recent study on learning robustness without input perturbations by distilling an AT model, we explore what is learned during adversarial training by analyzing the distribution of logits in AT models. We identify three logit characteristics essential to learning adversarial robustness. First, we provide a theoretical justification for the finding that adversarial training shrinks two important characteristics of the logit distribution: the max logit values and the "logit gaps" (difference between the logit max and next largest values) are on average lower for AT models. Second, we show that AT and standard models differ significantly on which samples are high or low confidence, then illustrate clear qualitative differences by visualizing samples with the largest confidence difference. Finally, we find learning information about incorrect classes to be essential to learning robustness by manipulating the non-max logit information during distillation and measuring the impact on the student's robustness. Our results indicate that learning some adversarial robustness without input perturbations requires a model to learn specific sample-wise confidences and incorrect class orderings that follow complex distributions. △ Less

Submitted 26 August, 2021; originally announced August 2021.

Comments: 29 pages (13 main, 16 supplemental), 22 figures (5 main, 17 supplemental)

arXiv:2105.01975 [pdf, other]

Does elderly enjoy playing Bingo with a robot? A case study with the humanoid robot Nadine

Authors: Nidhi Mishra, Gauri Tulsulkar, Hanhui Li, Nadia Magnenat Thalmann

Abstract: There are considerable advancements in medical health care in recent years, resulting in rising older population. As the workforce for such a population is not kee** pace, there is an urgent need to address this problem. Having robots to stimulating recreational activities for older adults can reduce the workload for caretakers and give them time to address the emotional needs of the elderly. In… ▽ More There are considerable advancements in medical health care in recent years, resulting in rising older population. As the workforce for such a population is not kee** pace, there is an urgent need to address this problem. Having robots to stimulating recreational activities for older adults can reduce the workload for caretakers and give them time to address the emotional needs of the elderly. In this paper, we investigate the effects of the humanoid social robot Nadine as an activity host for the elderly. This study aims to analyse if the elderly feels comfortable and enjoy playing game/activity with the humanoid robot Nadine. We propose to evaluate this by placing Nadine humanoid social robot in a nursing home as a caretaker where she hosts bingo game. We record sessions with and without Nadine to understand the difference and acceptance of these two scenarios. We use computer vision methods to analyse the activities of the elderly to detect emotions and their involvement in the game. We envision that such humanoid robots will make recreational activities more readily available for the elderly. Our results present positive enforcement during recreational activity, Bingo, in the presence of Nadine. △ Less

Submitted 5 May, 2021; originally announced May 2021.

arXiv:2102.01441 [pdf, other]

Face Recognition using 3D CNNs

Authors: Nayaneesh Kumar Mishra, Satish Kumar Singh

Abstract: The area of face recognition is one of the most widely researched areas in the domain of computer vision and biometric. This is because, the non-intrusive nature of face biometric makes it comparatively more suitable for application in area of surveillance at public places such as airports. The application of primitive methods in face recognition could not give very satisfactory performance. Howev… ▽ More The area of face recognition is one of the most widely researched areas in the domain of computer vision and biometric. This is because, the non-intrusive nature of face biometric makes it comparatively more suitable for application in area of surveillance at public places such as airports. The application of primitive methods in face recognition could not give very satisfactory performance. However, with the advent of machine and deep learning methods and their application in face recognition, several major breakthroughs were obtained. The use of 2D Convolution Neural networks(2D CNN) in face recognition crossed the human face recognition accuracy and reached to 99%. Still, robust face recognition in the presence of real world conditions such as variation in resolution, illumination and pose is a major challenge for researchers in face recognition. In this work, we used video as input to the 3D CNN architectures for capturing both spatial and time domain information from the video for face recognition in real world environment. For the purpose of experimentation, we have developed our own video dataset called CVBL video dataset. The use of 3D CNN for face recognition in videos shows promising results with DenseNets performing the best with an accuracy of 97% on CVBL dataset. △ Less

Submitted 2 February, 2021; originally announced February 2021.

Comments: 15 pages, 4 figures, Data Science Book

arXiv:2102.01404 [pdf, other]

Face Recognition Using $Sf_{3}CNN$ With Higher Feature Discrimination

Authors: Nayaneesh Kumar Mishra, Satish Kumar Singh

Abstract: With the advent of 2-dimensional Convolution Neural Networks (2D CNNs), the face recognition accuracy has reached above 99%. However, face recognition is still a challenge in real world conditions. A video, instead of an image, as an input can be more useful to solve the challenges of face recognition in real world conditions. This is because a video provides more features than an image. However,… ▽ More With the advent of 2-dimensional Convolution Neural Networks (2D CNNs), the face recognition accuracy has reached above 99%. However, face recognition is still a challenge in real world conditions. A video, instead of an image, as an input can be more useful to solve the challenges of face recognition in real world conditions. This is because a video provides more features than an image. However, 2D CNNs cannot take advantage of the temporal features present in the video. We therefore, propose a framework called $Sf_{3}CNN$ for face recognition in videos. The $Sf_{3}CNN$ framework uses 3-dimensional Residual Network (3D Resnet) and A-Softmax loss for face recognition in videos. The use of 3D ResNet helps to capture both spatial and temporal features into one compact feature map. However, the 3D CNN features must be highly discriminative for efficient face recognition. The use of A-Softmax loss helps to extract highly discriminative features from the video for face recognition. $Sf_{3}CNN$ framework gives an increased accuracy of 99.10% on CVBL video database in comparison to the previous 97% on the same database using 3D ResNets. △ Less

Submitted 2 February, 2021; originally announced February 2021.

Comments: 6 pages, 3 figures, 1 table, 5th IAPR International Conference on Computer Vision & Image Processing 2020

arXiv:1909.11898 [pdf, other]

Fine-tune Bert for DocRED with Two-step Process

Authors: Hong Wang, Christfried Focke, Rob Sylvester, Nilesh Mishra, William Wang

Abstract: Modelling relations between multiple entities has attracted increasing attention recently, and a new dataset called DocRED has been collected in order to accelerate the research on the document-level relation extraction. Current baselines for this task uses BiLSTM to encode the whole document and are trained from scratch. We argue that such simple baselines are not strong enough to model to comple… ▽ More Modelling relations between multiple entities has attracted increasing attention recently, and a new dataset called DocRED has been collected in order to accelerate the research on the document-level relation extraction. Current baselines for this task uses BiLSTM to encode the whole document and are trained from scratch. We argue that such simple baselines are not strong enough to model to complex interaction between entities. In this paper, we further apply a pre-trained language model (BERT) to provide a stronger baseline for this task. We also find that solving this task in phases can further improve the performance. The first step is to predict whether or not two entities have a relation, the second step is to predict the specific relation. △ Less

Submitted 26 September, 2019; originally announced September 2019.

arXiv:1905.08937 [pdf, other]

Can a Humanoid Robot be part of the Organizational Workforce? A User Study Leveraging Sentiment Analysis

Authors: Nidhi Mishra, Manoj Ramanathan, Ranjan Satapathy, Erik Cambria, Nadia Magnenat-Thalmann

Abstract: Hiring robots for the workplaces is a challenging task as robots have to cater to customer demands, follow organizational protocols and behave with social etiquette. In this study, we propose to have a humanoid social robot, Nadine, as a customer service agent in an open social work environment. The objective of this study is to analyze the effects of humanoid robots on customers at work environme… ▽ More Hiring robots for the workplaces is a challenging task as robots have to cater to customer demands, follow organizational protocols and behave with social etiquette. In this study, we propose to have a humanoid social robot, Nadine, as a customer service agent in an open social work environment. The objective of this study is to analyze the effects of humanoid robots on customers at work environment, and see if it can handle social scenarios. We propose to evaluate these objectives through two modes, namely, survey questionnaire and customer feedback. We also propose a novel approach to analyze customer feedback data (text) using sentic computing methods. Specifically, we employ aspect extraction and sentiment analysis to analyze the data. From our framework, we detect sentiment associated to the aspects that mainly concerned the customers during their interaction. This allows us to understand customers expectations and current limitations of robots as employees. △ Less

Submitted 30 May, 2019; v1 submitted 21 May, 2019; originally announced May 2019.

Comments: Submitted to IEEE RO-MAN2019

arXiv:1712.09763 [pdf, other]

PixelSNAIL: An Improved Autoregressive Generative Model

Authors: Xi Chen, Nikhil Mishra, Mostafa Rohaninejad, Pieter Abbeel

Abstract: Autoregressive generative models consistently achieve the best results in density estimation tasks involving high dimensional data, such as images or audio. They pose density estimation as a sequence modeling task, where a recurrent neural network (RNN) models the conditional distribution over the next element conditioned on all previous elements. In this paradigm, the bottleneck is the extent to… ▽ More Autoregressive generative models consistently achieve the best results in density estimation tasks involving high dimensional data, such as images or audio. They pose density estimation as a sequence modeling task, where a recurrent neural network (RNN) models the conditional distribution over the next element conditioned on all previous elements. In this paradigm, the bottleneck is the extent to which the RNN can model long-range dependencies, and the most successful approaches rely on causal convolutions, which offer better access to earlier parts of the sequence than conventional RNNs. Taking inspiration from recent work in meta reinforcement learning, where dealing with long-range dependencies is also essential, we introduce a new generative model architecture that combines causal convolutions with self attention. In this note, we describe the resulting model and present state-of-the-art log-likelihood results on CIFAR-10 (2.85 bits per dim) and $32 \times 32$ ImageNet (3.80 bits per dim). Our implementation is available at https://github.com/neocxi/pixelsnail-public △ Less

Submitted 28 December, 2017; originally announced December 2017.

arXiv:1710.05006 [pdf, other]

Skin Lesion Analysis Toward Melanoma Detection: A Challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), Hosted by the International Skin Imaging Collaboration (ISIC)

Authors: Noel C. F. Codella, David Gutman, M. Emre Celebi, Brian Helba, Michael A. Marchetti, Stephen W. Dusza, Aadi Kalloo, Konstantinos Liopyris, Nabin Mishra, Harald Kittler, Allan Halpern

Abstract: This article describes the design, implementation, and results of the latest installment of the dermoscopic image analysis benchmark challenge. The goal is to support research and development of algorithms for automated diagnosis of melanoma, the most lethal skin cancer. The challenge was divided into 3 tasks: lesion segmentation, feature detection, and disease classification. Participation involv… ▽ More This article describes the design, implementation, and results of the latest installment of the dermoscopic image analysis benchmark challenge. The goal is to support research and development of algorithms for automated diagnosis of melanoma, the most lethal skin cancer. The challenge was divided into 3 tasks: lesion segmentation, feature detection, and disease classification. Participation involved 593 registrations, 81 pre-submissions, 46 finalized submissions (including a 4-page manuscript), and approximately 50 attendees, making this the largest standardized and comparative study in this field to date. While the official challenge duration and ranking of participants has concluded, the dataset snapshots remain available for further research and development. △ Less

Submitted 8 January, 2018; v1 submitted 13 October, 2017; originally announced October 2017.

arXiv:1709.05490 [pdf]

doi 10.1063/1.4984158

Performance analysis of dual-hop optical wireless communication systems over k-distribution turbulence channel with pointing error

Authors: Neha Mishra, D. Sriram Kumar, Pranav Kumar Jha

Abstract: In this paper, we investigate the performance of the dual-hop free space optical (FSO) communication systems under the effect of strong atmospheric turbulence together with misalignment effects (pointing error). We consider a relay assisted link using decode and forward (DF) relaying protocol between source and destination with the assumption that Channel State Information is available at both tra… ▽ More In this paper, we investigate the performance of the dual-hop free space optical (FSO) communication systems under the effect of strong atmospheric turbulence together with misalignment effects (pointing error). We consider a relay assisted link using decode and forward (DF) relaying protocol between source and destination with the assumption that Channel State Information is available at both transmitting and receiving terminals. The atmospheric turbulence channels are modeled by k-distribution with pointing error impairment. The exact closed form expression is derived for outage probability and bit error rate and illustrated through numerical plots. Further BER results are compared for the different modulation schemes. △ Less

Submitted 16 September, 2017; originally announced September 2017.

MSC Class: 41-XX Approximations and expansions

Journal ref: AIP Conference Proceedings 1849, 020011 (2017)

arXiv:1709.05489 [pdf]

doi 10.1063/1.4984154

Challenges and potentials for visible light communications: State of the art

Authors: Pranav Kumar Jha, Neha Mishra, D. Sriram Kumar

Abstract: Visible Light Communication is the emerging field in the area of Indoor Optical Wireless Communication which uses white light LEDs for transmitting data and light simultaneously. LEDs can be modulated at very high speeds which increases its efficiency and enabling it for the dual purposes of data communication and illumination simultaneously. Radio Frequency have some limitations which is not at p… ▽ More Visible Light Communication is the emerging field in the area of Indoor Optical Wireless Communication which uses white light LEDs for transmitting data and light simultaneously. LEDs can be modulated at very high speeds which increases its efficiency and enabling it for the dual purposes of data communication and illumination simultaneously. Radio Frequency have some limitations which is not at par with the current demand of bandwidth but using visible light, it is possible to achieve higher data rates per user. In this paper, we discuss some challenges, potentials and possible future applications for this new technology. Basically, visible light communication is for indoor application capable of multiuser access. We also design a very basic illumination pattern inside a room using uniform power distribution. △ Less

Submitted 16 September, 2017; originally announced September 2017.

Journal ref: AIP Conference Proceedings 1849, 020007 (2017)

arXiv:1707.03141 [pdf, other]

A Simple Neural Attentive Meta-Learner

Authors: Nikhil Mishra, Mostafa Rohaninejad, Xi Chen, Pieter Abbeel

Abstract: Deep neural networks excel in regimes with large amounts of data, but tend to struggle when data is scarce or when they need to adapt quickly to changes in the task. In response, recent work in meta-learning proposes training a meta-learner on a distribution of similar tasks, in the hopes of generalization to novel but related tasks by learning a high-level strategy that captures the essence of th… ▽ More Deep neural networks excel in regimes with large amounts of data, but tend to struggle when data is scarce or when they need to adapt quickly to changes in the task. In response, recent work in meta-learning proposes training a meta-learner on a distribution of similar tasks, in the hopes of generalization to novel but related tasks by learning a high-level strategy that captures the essence of the problem it is asked to solve. However, many recent meta-learning approaches are extensively hand-designed, either using architectures specialized to a particular application, or hard-coding algorithmic components that constrain how the meta-learner solves the task. We propose a class of simple and generic meta-learner architectures that use a novel combination of temporal convolutions and soft attention; the former to aggregate information from past experience and the latter to pinpoint specific pieces of information. In the most extensive set of meta-learning experiments to date, we evaluate the resulting Simple Neural AttentIve Learner (or SNAIL) on several heavily-benchmarked tasks. On all tasks, in both supervised and reinforcement learning, SNAIL attains state-of-the-art performance by significant margins. △ Less

Submitted 24 February, 2018; v1 submitted 11 July, 2017; originally announced July 2017.

Comments: iclr 2018 version

arXiv:1703.04070 [pdf, other]

Prediction and Control with Temporal Segment Models

Authors: Nikhil Mishra, Pieter Abbeel, Igor Mordatch

Abstract: We introduce a method for learning the dynamics of complex nonlinear systems based on deep generative models over temporal segments of states and actions. Unlike dynamics models that operate over individual discrete timesteps, we learn the distribution over future state trajectories conditioned on past state, past action, and planned future action trajectories, as well as a latent prior over actio… ▽ More We introduce a method for learning the dynamics of complex nonlinear systems based on deep generative models over temporal segments of states and actions. Unlike dynamics models that operate over individual discrete timesteps, we learn the distribution over future state trajectories conditioned on past state, past action, and planned future action trajectories, as well as a latent prior over action trajectories. Our approach is based on convolutional autoregressive models and variational autoencoders. It makes stable and accurate predictions over long horizons for complex, stochastic systems, effectively expressing uncertainty and modeling the effects of collisions, sensory noise, and action delays. The learned dynamics model and action prior can be used for end-to-end, fully differentiable trajectory optimization and model-based policy optimization, which we use to evaluate the performance and sample-efficiency of our method. △ Less

Submitted 13 July, 2017; v1 submitted 11 March, 2017; originally announced March 2017.

Comments: camera-ready version, ICML 2017

arXiv:1605.01397 [pdf, other]

Skin Lesion Analysis toward Melanoma Detection: A Challenge at the International Symposium on Biomedical Imaging (ISBI) 2016, hosted by the International Skin Imaging Collaboration (ISIC)

Authors: David Gutman, Noel C. F. Codella, Emre Celebi, Brian Helba, Michael Marchetti, Nabin Mishra, Allan Halpern

Abstract: In this article, we describe the design and implementation of a publicly accessible dermatology image analysis benchmark challenge. The goal of the challenge is to sup- port research and development of algorithms for automated diagnosis of melanoma, a lethal form of skin cancer, from dermoscopic images. The challenge was divided into sub-challenges for each task involved in image analysis, includi… ▽ More In this article, we describe the design and implementation of a publicly accessible dermatology image analysis benchmark challenge. The goal of the challenge is to sup- port research and development of algorithms for automated diagnosis of melanoma, a lethal form of skin cancer, from dermoscopic images. The challenge was divided into sub-challenges for each task involved in image analysis, including lesion segmentation, dermoscopic feature detection within a lesion, and classification of melanoma. Training data included 900 images. A separate test dataset of 379 images was provided to measure resultant performance of systems developed with the training data. Ground truth for both training and test sets was generated by a panel of dermoscopic experts. In total, there were 79 submissions from a group of 38 participants, making this the largest standardized and comparative study for melanoma diagnosis in dermoscopic images to date. While the official challenge duration and ranking of participants has concluded, the datasets remain available for further research and development. △ Less

Submitted 4 May, 2016; originally announced May 2016.

arXiv:1601.07843 [pdf]

An Overview of Melanoma Detection in Dermoscopy Images Using Image Processing and Machine Learning

Authors: Nabin K. Mishra, M. Emre Celebi

Abstract: The incidence of malignant melanoma continues to increase worldwide. This cancer can strike at any age; it is one of the leading causes of loss of life in young persons. Since this cancer is visible on the skin, it is potentially detectable at a very early stage when it is curable. New developments have converged to make fully automatic early melanoma detection a real possibility. First, the adven… ▽ More The incidence of malignant melanoma continues to increase worldwide. This cancer can strike at any age; it is one of the leading causes of loss of life in young persons. Since this cancer is visible on the skin, it is potentially detectable at a very early stage when it is curable. New developments have converged to make fully automatic early melanoma detection a real possibility. First, the advent of dermoscopy has enabled a dramatic boost in clinical diagnostic ability to the point that melanoma can be detected in the clinic at the very earliest stages. The global adoption of this technology has allowed accumulation of large collections of dermoscopy images of melanomas and benign lesions validated by histopathology. The development of advanced technologies in the areas of image processing and machine learning have given us the ability to allow distinction of malignant melanoma from the many benign mimics that require no biopsy. These new technologies should allow not only earlier detection of melanoma, but also reduction of the large number of needless and costly biopsy procedures. Although some of the new systems reported for these technologies have shown promise in preliminary trials, widespread implementation must await further technical progress in accuracy and reproducibility. In this paper, we provide an overview of computerized detection of melanoma in dermoscopy images. First, we discuss the various aspects of lesion segmentation. Then, we provide a brief overview of clinical feature segmentation. Finally, we discuss the classification stage where machine learning algorithms are applied to the attributes generated from the segmented features to predict the existence of melanoma. △ Less

Submitted 28 January, 2016; originally announced January 2016.

Comments: 15 pages, 3 figures

ACM Class: I.4; I.5.4; J.3

arXiv:1403.5049

Hybrid Virtual network Embedding Algorithm with K-core Decomposition using Path Splitting

Authors: Neha Mishra

Abstract: Network virtualization is an efficient approach of solving the ossification problem of the Internet. It has become a promising way of supporting lots of heterogeneous network onto substrate physical network. A major challenge in network virtualization is how to map multiple virtual networks onto specific nodes and links in the shared substrate network, known as virtual network embedding problem. D… ▽ More Network virtualization is an efficient approach of solving the ossification problem of the Internet. It has become a promising way of supporting lots of heterogeneous network onto substrate physical network. A major challenge in network virtualization is how to map multiple virtual networks onto specific nodes and links in the shared substrate network, known as virtual network embedding problem. Due to its NPhardness, many heuristic approaches have been proposed. In this thesis, we propose hybrid VN embedding algorithms that map multiple VN requests with node and link constraints with K-core decomposition using path splitting. Based on network pruning, a virtual network is decomposed to core network and edge network. The map** process is divided into two phases: core network map** and edge network map**. Path splitting enables better resource utilization by allowing the substrate to accept more VN requests. It splits the available bandwidth of the path into small bandwidth to satisfy the resource constraints. The proposed algorithm improves the performance of the algorithm by splitting the path into small bandwidth. Due to path splitting proposed algorithm will accept many virtual network requests that will increase the revenue and acceptance ratio of the algorithm. △ Less

Submitted 13 November, 2014; v1 submitted 20 March, 2014; originally announced March 2014.

Comments: This paper has been withdrawn by the author due to a crucial sign error in equation 1

arXiv:1305.2959 [pdf]

Automatic Speech Recognition Using Template Model for Man-Machine Interface

Authors: Neema Mishra, Urmila Shrawankar, V M Thakare

Abstract: Speech is a natural form of communication for human beings, and computers with the ability to understand speech and speak with a human voice are expected to contribute to the development of more natural man-machine interfaces. Computers with this kind of ability are gradually becoming a reality, through the evolution of speech recognition technologies. Speech is being an important mode of interact… ▽ More Speech is a natural form of communication for human beings, and computers with the ability to understand speech and speak with a human voice are expected to contribute to the development of more natural man-machine interfaces. Computers with this kind of ability are gradually becoming a reality, through the evolution of speech recognition technologies. Speech is being an important mode of interaction with computers. In this paper Feature extraction is implemented using well-known Mel-Frequency Cepstral Coefficients (MFCC).Pattern matching is done using Dynamic time war** (DTW) algorithm. △ Less

Submitted 9 May, 2013; originally announced May 2013.

Comments: Pages: 05 Figures : 01 Tables : 03 Proceedings of the International Conference ICAET 2010, Chennai, India. arXiv admin note: text overlap with arXiv:1305.2847

arXiv:1305.2847 [pdf]

An Overview of Hindi Speech Recognition

Authors: Neema Mishra, Urmila Shrawankar, V M Thakare

Abstract: In this age of information technology, information access in a convenient manner has gained importance. Since speech is a primary mode of communication among human beings, it is natural for people to expect to be able to carry out spoken dialogue with computer. Speech recognition system permits ordinary people to speak to the computer to retrieve information. It is desirable to have a human comput… ▽ More In this age of information technology, information access in a convenient manner has gained importance. Since speech is a primary mode of communication among human beings, it is natural for people to expect to be able to carry out spoken dialogue with computer. Speech recognition system permits ordinary people to speak to the computer to retrieve information. It is desirable to have a human computer dialogue in local language. Hindi being the most widely spoken Language in India is the natural primary human language candidate for human machine interaction. There are five pairs of vowels in Hindi languages; one member is longer than the other one. This paper describes an overview of speech recognition system that includes how speech is produced and the properties and characteristics of Hindi Phoneme. △ Less

Submitted 9 May, 2013; originally announced May 2013.

Comments: Pages: 05 Figures : 04 Tables : 03 Proceedings of the International Conference ICCSCT 2010, Tirunelveli, India

arXiv:1206.6440 [pdf]

Predicting Preference Flips in Commerce Search

Authors: Or Sheffet, Nina Mishra, Samuel Ieong

Abstract: Traditional approaches to ranking in web search follow the paradigm of rank-by-score: a learned function gives each query-URL combination an absolute score and URLs are ranked according to this score. This paradigm ensures that if the score of one URL is better than another then one will always be ranked higher than the other. Scoring contradicts prior work in behavioral economics that showed that… ▽ More Traditional approaches to ranking in web search follow the paradigm of rank-by-score: a learned function gives each query-URL combination an absolute score and URLs are ranked according to this score. This paradigm ensures that if the score of one URL is better than another then one will always be ranked higher than the other. Scoring contradicts prior work in behavioral economics that showed that users' preferences between two items depend not only on the items but also on the presented alternatives. Thus, for the same query, users' preference between items A and B depends on the presence/absence of item C. We propose a new model of ranking, the Random Shopper Model, that allows and explains such behavior. In this model, each feature is viewed as a Markov chain over the items to be ranked, and the goal is to find a weighting of the features that best reflects their importance. We show that our model can be learned under the empirical risk minimization framework, and give an efficient learning algorithm. Experiments on commerce search logs demonstrate that our algorithm outperforms scoring-based approaches including regression and listwise ranking. △ Less

Submitted 27 June, 2012; originally announced June 2012.

Comments: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)

arXiv:1204.2606 [pdf, ps, other]

doi 10.29012/jpc.v5i1.625

Privacy via the Johnson-Lindenstrauss Transform

Authors: Krishnaram Kenthapadi, Aleksandra Korolova, Ilya Mironov, Nina Mishra

Abstract: Suppose that party A collects private information about its users, where each user's data is represented as a bit vector. Suppose that party B has a proprietary data mining algorithm that requires estimating the distance between users, such as clustering or nearest neighbors. We ask if it is possible for party A to publish some information about each user so that B can estimate the distance betwee… ▽ More Suppose that party A collects private information about its users, where each user's data is represented as a bit vector. Suppose that party B has a proprietary data mining algorithm that requires estimating the distance between users, such as clustering or nearest neighbors. We ask if it is possible for party A to publish some information about each user so that B can estimate the distance between users without being able to infer any private bit of a user. Our method involves projecting each user's representation into a random, lower-dimensional space via a sparse Johnson-Lindenstrauss transform and then adding Gaussian noise to each entry of the lower-dimensional representation. We show that the method preserves differential privacy---where the more privacy is desired, the larger the variance of the Gaussian noise. Further, we show how to approximate the true distances between users via only the lower-dimensional, perturbed data. Finally, we consider other perturbation methods such as randomized response and draw comparisons to sketch-based methods. While the goal of releasing user-specific data to third parties is more broad than preserving distances, this work shows that distance computations with privacy is an achievable goal. △ Less

Submitted 11 April, 2012; originally announced April 2012.

Comments: 24 pages

ACM Class: K.4.1; F.2; H.3.5; G.3; I.5.3; H.3.3; H.2.8; E.1; G.1.3

Journal ref: Journal of Privacy and Confidentiality, Volume 5, Issue 1, Pages 39-71, 2013

arXiv:1007.3799 [pdf, ps, other]

Adapting to the Shifting Intent of Search Queries

Authors: Umar Syed, Aleksandrs Slivkins, Nina Mishra

Abstract: Search engines today present results that are often oblivious to abrupt shifts in intent. For example, the query `independence day' usually refers to a US holiday, but the intent of this query abruptly changed during the release of a major film by that name. While no studies exactly quantify the magnitude of intent-shifting traffic, studies suggest that news events, seasonal topics, pop culture, e… ▽ More Search engines today present results that are often oblivious to abrupt shifts in intent. For example, the query `independence day' usually refers to a US holiday, but the intent of this query abruptly changed during the release of a major film by that name. While no studies exactly quantify the magnitude of intent-shifting traffic, studies suggest that news events, seasonal topics, pop culture, etc account for 50% of all search queries. This paper shows that the signals a search engine receives can be used to both determine that a shift in intent has happened, as well as find a result that is now more relevant. We present a meta-algorithm that marries a classifier with a bandit algorithm to achieve regret that depends logarithmically on the number of query impressions, under certain assumptions. We provide strong evidence that this regret is close to the best achievable. Finally, via a series of experiments, we demonstrate that our algorithm outperforms prior approaches, particularly as the amount of intent-shifting traffic increases. △ Less

Submitted 22 July, 2010; originally announced July 2010.

Comments: This is the full version of the paper in NIPS'09

arXiv:1003.5623 [pdf]

Spoken Language Identification Using Hybrid Feature Extraction Methods

Authors: Pawan Kumar, Astik Biswas, A . N. Mishra, Mahesh Chandra

Abstract: This paper introduces and motivates the use of hybrid robust feature extraction technique for spoken language identification (LID) system. The speech recognizers use a parametric form of a signal to get the most important distinguishable features of speech signal for recognition task. In this paper Mel-frequency cepstral coefficients (MFCC), Perceptual linear prediction coefficients (PLP) along wi… ▽ More This paper introduces and motivates the use of hybrid robust feature extraction technique for spoken language identification (LID) system. The speech recognizers use a parametric form of a signal to get the most important distinguishable features of speech signal for recognition task. In this paper Mel-frequency cepstral coefficients (MFCC), Perceptual linear prediction coefficients (PLP) along with two hybrid features are used for language Identification. Two hybrid features, Bark Frequency Cepstral Coefficients (BFCC) and Revised Perceptual Linear Prediction Coefficients (RPLP) were obtained from combination of MFCC and PLP. Two different classifiers, Vector Quantization (VQ) with Dynamic Time War** (DTW) and Gaussian Mixture Model (GMM) were used for classification. The experiment shows better identification rate using hybrid feature extraction techniques compared to conventional feature extraction methods.BFCC has shown better performance than MFCC with both classifiers. RPLP along with GMM has shown best identification performance among all feature extraction techniques. △ Less

Submitted 29 March, 2010; originally announced March 2010.

Journal ref: Journal of Telecommunications, Volume 1, Issue 2, pp11-15, March 2010

Showing 1–33 of 33 results for author: Mishra, N