-
End-to-End Learning of Flowchart Grounded Task-Oriented Dialogs
Authors:
Dinesh Raghu,
Shantanu Agarwal,
Sachindra Joshi,
Mausam
Abstract:
We propose a novel problem within end-to-end learning of task-oriented dialogs (TOD), in which the dialog system mimics a troubleshooting agent who helps a user by diagnosing their problem (e.g., car not starting). Such dialogs are grounded in domain-specific flowcharts, which the agent is supposed to follow during the conversation. Our task exposes novel technical challenges for neural TOD, such as grounding an utterance to the flowchart without explicit annotation, referring to additional manual pages when the user asks a clarification question, and the ability to follow unseen flowcharts at test time. We release a dataset (FloDial) consisting of 2,738 dialogs grounded on 12 different troubleshooting flowcharts. We also design a neural model, FloNet, which uses a retrieval-augmented generation architecture to train the dialog agent. Our experiments find that FloNet can do zero-shot transfer to unseen flowcharts, and sets a strong baseline for future research.
Submitted 7 December, 2021; v1 submitted 15 September, 2021;
originally announced September 2021.
-
From polarization multipoles to higher-order coherences
Authors:
Aaron Z. Goldberg,
Andrei B. Klimov,
Hubert de Guise,
Gerd Leuchs,
Girish S. Agarwal,
Luis L. Sánchez-Soto
Abstract:
We demonstrate that the multipoles associated with the density matrix are truly observable quantities that can be unambiguously determined from intensity moments. Given their correct transformation properties, these multipoles are the natural variables to deal with a number of problems in the quantum domain. In the case of polarization, the moments are measured after the light has passed through two quarter-wave plates, one half-wave plate, and a polarizing beam splitter for specific values of the angles of the waveplates. For more general two-mode problems, equivalent measurements can be performed.
Submitted 16 December, 2021; v1 submitted 9 September, 2021;
originally announced September 2021.
-
On-demand Parity-Time symmetry in a lone oscillator through complex, synthetic gauge fields
Authors:
Mario A. Quiroz-Juárez,
Kaustubh S. Agarwal,
Zachary A. Cochran,
José L. Aragón,
Yogesh N. Joglekar,
Roberto de J. León-Montiel
Abstract:
What is the fate of an oscillator when its inductance and capacitance are varied while its frequency is kept constant? Inspired by this question, we propose a protocol to implement parity-time (PT) symmetry in a lone oscillator. Different forms of constrained variations lead to static, periodic, or arbitrary balanced gain and loss profiles, that can be interpreted as purely imaginary gauge fields. With a state-of-the-art, dynamically tunable LC oscillator comprising synthetic circuit elements, we demonstrate static and Floquet PT breaking transitions, including those at vanishingly small gain and loss, by tracking the circuit energy. Concurrently, we derive and observe conserved quantities in this open, balanced gain-loss system, both in the static and Floquet cases. Lastly, by measuring the circuit energy, we unveil a giant dynamical asymmetry along exceptional point (EP) contours that emerge symmetrically from the Hermitian degeneracies at Floquet resonances. Distinct from material or parametric gain and loss mechanisms, our protocol enables on-demand parity-time symmetry in a minimal classical system -- a single oscillator -- and may be ported to other realizations including metamaterials and optomechanical systems.
Submitted 26 October, 2021; v1 submitted 8 September, 2021;
originally announced September 2021.
-
On the Accuracy of Analog Neural Network Inference Accelerators
Authors:
T. Patrick Xiao,
Ben Feinberg,
Christopher H. Bennett,
Venkatraman Prabhakar,
Prashant Saxena,
Vineet Agrawal,
Sapan Agarwal,
Matthew J. Marinella
Abstract:
Specialized accelerators have recently garnered attention as a method to reduce the power consumption of neural network inference. A promising category of accelerators utilizes nonvolatile memory arrays to both store weights and perform $\textit{in situ}$ analog computation inside the array. While prior work has explored the design space of analog accelerators to optimize performance and energy efficiency, there is seldom a rigorous evaluation of the accuracy of these accelerators. This work shows how architectural design decisions, particularly in mapping neural network parameters to analog memory cells, influence inference accuracy. When evaluated using ResNet50 on ImageNet, the resilience of the system to analog non-idealities - cell programming errors, analog-to-digital converter resolution, and array parasitic resistances - all improve when analog quantities in the hardware are made proportional to the weights in the network. Moreover, contrary to the assumptions of prior work, nearly equivalent resilience to cell imprecision can be achieved by fully storing weights as analog quantities, rather than spreading weight bits across multiple devices, often referred to as bit slicing. By exploiting proportionality, analog system designers have the freedom to match the precision of the hardware to the needs of the algorithm, rather than attempting to guarantee the same level of precision in the intermediate results as an equivalent digital accelerator. This ultimately results in an analog accelerator that is more accurate, more robust to analog errors, and more energy-efficient.
Submitted 3 February, 2022; v1 submitted 2 September, 2021;
originally announced September 2021.
-
Synthesizing five-body interaction in a superconducting quantum circuit
Authors:
Ke Zhang,
Hekang Li,
Pengfei Zhang,
Jiale Yuan,
Jinyan Chen,
Wenhui Ren,
Zhen Wang,
Chao Song,
Da-Wei Wang,
H. Wang,
Shiyao Zhu,
Girish S. Agarwal,
Marlan O. Scully
Abstract:
Synthesizing many-body interaction Hamiltonians is a central task in quantum simulation. However, it is challenging to synthesize interactions including more than two spins. Borrowing tools from quantum optics, we synthesize five-body spin-exchange interaction in a superconducting quantum circuit by simultaneously exciting four independent qubits with time-energy correlated photon quadruples generated from a qudit. During the dynamic evolution of the five-body interaction, a Greenberger-Horne-Zeilinger state is generated in a single step with fidelity estimated to be $0.685$. We compare the influence of noise on the three-, four- and five-body interaction as a step toward answering the question on the quantum origin of chiral molecules. We also demonstrate a many-body Mach-Zehnder interferometer which potentially has a Heisenberg-limit sensitivity. This study paves a way for quantum simulation involving many-body interactions and highly excited states of quantum circuits.
Submitted 1 September, 2021;
originally announced September 2021.
-
Impact of Attention on Adversarial Robustness of Image Classification Models
Authors:
Prachi Agrawal,
Narinder Singh Punn,
Sanjay Kumar Sonbhadra,
Sonali Agarwal
Abstract:
Adversarial attacks against deep learning models have gained significant attention, and recent works have proposed explanations for the existence of adversarial examples and techniques to defend the models against these attacks. Attention in computer vision has been used to incorporate focused learning of important features and has led to improved accuracy. Recently, models with attention mechanisms have been proposed to enhance adversarial robustness. Following this context, this work aims at a general understanding of the impact of attention on adversarial robustness. This work presents a comparative study of the adversarial robustness of non-attention and attention based image classification models trained on the CIFAR-10, CIFAR-100 and Fashion MNIST datasets under popular white box and black box attacks. The experimental results show that the robustness of attention based models may depend on the datasets used, i.e., the number of classes involved in the classification. In contrast to datasets with fewer classes, attention based models are observed to show better robustness towards classification.
Submitted 2 September, 2021;
originally announced September 2021.
-
Quantum Fisher Information Bounds on Precision Limits of Circular Dichroism
Authors:
Jiaxuan Wang,
Girish S. Agarwal
Abstract:
Circular dichroism (CD) is a widely used technique for investigating optically chiral molecules, especially biomolecules. It is thus of great importance that these parameters be estimated precisely so that molecules with desired functionalities can be designed. In order to surpass the limits of classical measurements, we need to probe the system with quantum light. We develop the quantum Fisher information matrix (QFIM) for precision estimates of the circular dichroism and the optical rotatory dispersion for a variety of input quantum states of light. The Cramér-Rao bounds for all four chirality parameters are obtained from the QFIM for (a) single photon input states with a specific linear polarization and (b) NOON states having two photons that are both either left polarized or right polarized. The QFIM bounds, using quantum light, are compared with bounds obtained for classical light beams, i.e., beams in coherent states. Quite generally, both the single photon state and the NOON state exhibit superior precision in the estimation of absorption and phase shift in relation to a coherent source of comparable intensity, especially in the weak absorption regime. In particular, the NOON state naturally offers the best precision among the three. We compare QFIM bounds with the error sensitivity bounds, as the latter are relatively easier to measure whereas the QFIM bounds require full state tomography. We also outline an empirical scheme for estimating the measurement sensitivities by projective measurements with single-photon detectors.
Submitted 31 August, 2021;
originally announced August 2021.
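As background for the abstract above, the textbook form of the quantum Cramér-Rao bound can be written as follows. This is the general statement for unbiased estimators, not the paper's specific result for the four chirality parameters; the symbols $\hat{\theta}$ (estimator), $\nu$ (number of independent repetitions) and $F_Q$ (quantum Fisher information, or the QFIM in the multi-parameter case) are notation assumed here for illustration.

```latex
% Single parameter: variance of any unbiased estimator after \nu repetitions
\mathrm{Var}(\hat{\theta}) \;\ge\; \frac{1}{\nu\, F_Q(\theta)}

% Several parameters \boldsymbol{\theta} = (\theta_1, \dots, \theta_n):
% matrix inequality in terms of the inverse QFIM
\mathrm{Cov}(\hat{\boldsymbol{\theta}}) \;\succeq\; \frac{1}{\nu}\, F_Q^{-1}(\boldsymbol{\theta})
```

A larger Fisher information therefore means a tighter lower bound on the estimation error, which is the sense in which the abstract compares single photon, NOON and coherent input states.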
-
Deep learning for surrogate modelling of 2D mantle convection
Authors:
Siddhant Agarwal,
Nicola Tosi,
Pan Kessel,
Doris Breuer,
Grégoire Montavon
Abstract:
Traditionally, 1D models based on scaling laws have been used to parameterize convective heat transfer in rocks in the interior of terrestrial planets like Earth, Mars, Mercury and Venus to tackle the computational bottleneck of high-fidelity forward runs in 2D or 3D. However, these are limited in the amount of physics they can model (e.g. depth dependent material properties) and predict only mean quantities such as the mean mantle temperature. We recently showed that feedforward neural networks (FNN) trained using a large number of 2D simulations can overcome this limitation and reliably predict the evolution of the entire 1D laterally-averaged temperature profile in time for complex models. We now extend that approach to predict the full 2D temperature field, which contains more information in the form of convection structures such as hot plumes and cold downwellings. Using a dataset of 10,525 two-dimensional simulations of the thermal evolution of the mantle of a Mars-like planet, we show that deep learning techniques can produce reliable parameterized surrogates (i.e. surrogates that predict state variables such as temperature based only on parameters) of the underlying partial differential equations. We first use convolutional autoencoders to compress the temperature fields by a factor of 142 and then use FNN and long short-term memory networks (LSTM) to predict the compressed fields. On average, the FNN predictions are 99.30% and the LSTM predictions are 99.22% accurate with respect to unseen simulations. Proper orthogonal decomposition (POD) of the LSTM and FNN predictions shows that despite a lower mean absolute relative accuracy, LSTMs capture the flow dynamics better than FNNs. When summed, the POD coefficients from FNN predictions and from LSTM predictions amount to 96.51% and 97.66% relative to the coefficients of the original simulations, respectively.
Submitted 5 November, 2021; v1 submitted 23 August, 2021;
originally announced August 2021.
-
The Dynamical Ensemble of the Posner Molecule is not Symmetric
Authors:
Shivang Agarwal,
Clarice D. Aiello,
Daniel R. Kattnig,
Amartya S. Banerjee
Abstract:
The Posner molecule, $\text{Ca}_9(\text{PO}_4)_6$, has long been recognized to have biochemical relevance in various physiological processes. It has found recent attention for its possible role as a biological quantum information processor, whereby the molecule purportedly maintains long-lived nuclear spin coherences among its ${^{31}\text{P}}$ nuclei (presumed to be symmetrically arranged), allowing it to function as a room temperature qubit. The structure of the molecule has been of much dispute in the literature, although the $\text{S}_6$ point group symmetry has often been assumed and exploited in calculations. Using a variety of simulation techniques (including ab initio molecular dynamics and structural relaxation), rigorous data analysis tools and by exploring thousands of individual configurations, we establish that the molecule predominantly assumes low symmetry structures ($\text{C}_\text{s}$ and $\text{C}_\text{i}$) at room temperature, as opposed to the higher symmetry configurations explored previously. Our findings have important implications on the viability of this molecule as a qubit.
Submitted 26 October, 2021; v1 submitted 19 August, 2021;
originally announced August 2021.
-
Design and Analysis of Modular Pipe Climber-III with a Multi-Output Differential Mechanism
Authors:
Vishnu Kumar,
Saharsh Agarwal,
Rama Vadapalli,
Nagamanikandan Govindan,
Madhava Krishna
Abstract:
This paper presents the design of an in-pipe climbing robot that operates using a novel 'Three-output open differential' (3-OOD) mechanism to traverse complex networks of pipes. Conventional wheeled/tracked in-pipe climbing robots are prone to slip and drag while traversing pipe bends. The 3-OOD mechanism helps in achieving the novel result of eliminating slip and drag in the robot tracks during motion. The proposed differential realizes the functional abilities of the traditional two-output differential, which is achieved for the first time for a differential with three outputs. The 3-OOD mechanism mechanically modulates the track speeds of the robot based on the forces exerted on each track inside the pipe network, eliminating the need for any active control. Simulation of the robot traversing the pipe network in different orientations and in pipe bends without slip shows the proposed design's effectiveness.
Submitted 8 January, 2022; v1 submitted 11 July, 2021;
originally announced August 2021.
-
VerSaChI: Finding Statistically Significant Subgraph Matches using Chebyshev's Inequality
Authors:
Shubhangi Agarwal,
Sourav Dutta,
Arnab Bhattacharya
Abstract:
Approximate subgraph matching, which is an important primitive for many applications like question answering, community detection, and motif discovery, often involves large labeled graphs such as knowledge graphs, social networks, and protein sequences. Effective methods for extracting matching subgraphs, in terms of label and structural similarities to a query, should exhibit accuracy, computational efficiency, and robustness to noise. In this paper, we propose VerSaChI for finding the top-k most similar subgraphs based on 2-hop label and structural overlap similarity with the query. The similarity is characterized using Chebyshev's inequality to compute the chi-square statistical significance for measuring the degree of matching of the subgraphs. Experiments on real-life graph datasets showcase significant improvements in terms of accuracy compared to state-of-the-art methods, as well as robustness to noise.
Submitted 18 August, 2021;
originally announced August 2021.
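The abstract above names two standard statistical ingredients, Chebyshev's inequality and the chi-square statistic. The sketch below shows only their generic textbook forms; it is not the VerSaChI algorithm, and the function names and the example counts are hypothetical, chosen purely for illustration.

```python
def chi_square(observed, expected):
    """Pearson chi-square statistic: sum over categories of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chebyshev_tail_bound(k):
    """Chebyshev's inequality: P(|X - mean| >= k * std) <= 1 / k^2.

    Holds for any distribution with finite variance; capped at 1
    because a probability cannot exceed 1.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return min(1.0, 1.0 / (k * k))


# Hypothetical counts of match categories (made-up numbers, not from the paper).
observed = [30, 14, 6]
expected = [20, 20, 10]
statistic = chi_square(observed, expected)
bound = chebyshev_tail_bound(2)
```

A larger chi-square value indicates that the observed counts deviate more strongly from what would be expected by chance, which is the usual sense of "statistical significance" in this setting.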
-
White blood cell subtype detection and classification
Authors:
Nalla Praveen,
Narinder Singh Punn,
Sanjay Kumar Sonbhadra,
Sonali Agarwal,
M. Syafrullah,
Krisna Adiyarta
Abstract:
Machine learning has endless applications in the health care industry. White blood cell classification is one of the interesting and promising areas of research. The classification of white blood cells plays an important part in medical diagnosis. In practice, white blood cell classification is performed by a haematologist, who takes a small smear of blood and examines it carefully under the microscope. The current procedures to identify the white blood cell subtype are time-consuming and error-prone. Computer-aided detection and diagnosis of white blood cells tends to avoid human error and reduce the time taken to classify the cells. In recent years, several deep learning approaches have been developed for white blood cell classification that are able to identify white blood cells but are unable to localize their positions in the blood cell image. Following this, the present research proposes to utilize the YOLOv3 object detection technique to localize and classify white blood cells with bounding boxes. With exhaustive experimental analysis, the proposed work is found to detect white blood cells with 99.2% accuracy and classify them with 90% accuracy.
Submitted 21 October, 2021; v1 submitted 10 August, 2021;
originally announced August 2021.
-
Evaluating CLIP: Towards Characterization of Broader Capabilities and Downstream Implications
Authors:
Sandhini Agarwal,
Gretchen Krueger,
Jack Clark,
Alec Radford,
Jong Wook Kim,
Miles Brundage
Abstract:
Recently, there have been breakthroughs in computer vision ("CV") models that are more generalizable with the advent of models such as CLIP and ALIGN. In this paper, we analyze CLIP and highlight some of the challenges such models pose. CLIP reduces the need for task specific training data, potentially opening up many niche tasks to automation. CLIP also allows its users to flexibly specify image classification classes in natural language, which we find can shift how biases manifest. Additionally, through some preliminary probes we find that CLIP can inherit biases found in prior computer vision systems. Given the wide and unpredictable domain of uses for such models, this raises questions regarding what sufficiently safe behaviour for such systems may look like. These results add evidence to the growing body of work calling for a change in the notion of a 'better' model--to move beyond simply looking at higher accuracy at task-oriented capability evaluations, and towards a broader 'better' that takes into account deployment-critical features such as different use contexts, and people who interact with the model when thinking about model deployment.
Submitted 5 August, 2021;
originally announced August 2021.
-
RCA-IUnet: A residual cross-spatial attention guided inception U-Net model for tumor segmentation in breast ultrasound imaging
Authors:
Narinder Singh Punn,
Sonali Agarwal
Abstract:
Advances in deep learning have contributed immensely to biomedical image analysis applications. With breast cancer being one of the most common and deadliest diseases among women, early detection is the key means to improve survivability. Medical imaging like ultrasound presents an excellent visual representation of the functioning of the organs; however, analysing such scans is challenging and time consuming for any radiologist, which delays the diagnosis process. Although various deep learning based approaches have been proposed and have achieved promising results, the present article introduces an efficient residual cross-spatial attention guided inception U-Net (RCA-IUnet) model with minimal training parameters for tumor segmentation using breast ultrasound imaging to further improve the segmentation performance for varying tumor sizes. The RCA-IUnet model follows the U-Net topology with residual inception depth-wise separable convolution and hybrid pooling (max pooling and spectral pooling) layers. In addition, cross-spatial attention filters are added to suppress the irrelevant features and focus on the target structure. The segmentation performance of the proposed model is validated on two publicly available datasets using standard segmentation evaluation metrics, where it outperformed the other state-of-the-art segmentation models.
Submitted 2 January, 2022; v1 submitted 5 August, 2021;
originally announced August 2021.
-
DECAF: Deep Extreme Classification with Label Features
Authors:
Anshul Mittal,
Kunal Dahiya,
Sheshansh Agrawal,
Deepak Saini,
Sumeet Agarwal,
Purushottam Kar,
Manik Varma
Abstract:
Extreme multi-label classification (XML) involves tagging a data point with its most relevant subset of labels from an extremely large label set, with several applications such as product-to-product recommendation with millions of products. Although leading XML algorithms scale to millions of labels, they largely ignore label metadata such as textual descriptions of the labels. On the other hand, classical techniques that can utilize label metadata via representation learning using deep networks struggle in extreme settings. This paper develops the DECAF algorithm that addresses these challenges by learning models enriched by label metadata that jointly learn model parameters and feature representations using deep networks and offer accurate classification at the scale of millions of labels. DECAF makes specific contributions to model architecture design, initialization, and training, enabling it to offer up to 2-6% more accurate prediction than leading extreme classifiers on publicly available benchmark product-to-product recommendation datasets, such as LF-AmazonTitles-1.3M. At the same time, DECAF was found to be up to 22x faster at inference than leading deep extreme classifiers, which makes it suitable for real-time applications that require predictions within a few milliseconds. The code for DECAF is available at https://github.com/Extreme-classification/DECAF.
Submitted 1 August, 2021;
originally announced August 2021.
-
ECLARE: Extreme Classification with Label Graph Correlations
Authors:
Anshul Mittal,
Noveen Sachdeva,
Sheshansh Agrawal,
Sumeet Agarwal,
Purushottam Kar,
Manik Varma
Abstract:
Deep extreme classification (XC) seeks to train deep architectures that can tag a data point with its most relevant subset of labels from an extremely large label set. The core utility of XC comes from predicting labels that are rarely seen during training. Such rare labels hold the key to personalized recommendations that can delight and surprise a user. However, the large number of rare labels and small amount of training data per rare label offer significant statistical and computational challenges. State-of-the-art deep XC methods attempt to remedy this by incorporating textual descriptions of labels but do not adequately address the problem. This paper presents ECLARE, a scalable deep learning architecture that incorporates not only label text, but also label correlations, to offer accurate real-time predictions within a few milliseconds. Core contributions of ECLARE include a frugal architecture and scalable techniques to train deep models along with label correlation graphs at the scale of millions of labels. In particular, ECLARE offers predictions that are 2 to 14% more accurate on both publicly available benchmark datasets as well as proprietary datasets for a related products recommendation task sourced from the Bing search engine. Code for ECLARE is available at https://github.com/Extreme-classification/ECLARE.
Submitted 31 July, 2021;
originally announced August 2021.
-
MAG-Net: Multi-task attention guided network for brain tumor segmentation and classification
Authors:
Sachin Gupta,
Narinder Singh Punn,
Sanjay Kumar Sonbhadra,
Sonali Agarwal
Abstract:
Brain tumor is the most common and deadliest disease that can be found in all age groups. Generally, the MRI modality is adopted by radiologists for identifying and diagnosing tumors. Correct identification of tumor regions and their type can aid tumor diagnosis and follow-up treatment planning. However, analysing such scans is a complex and time-consuming task for any radiologist. Motivated by deep learning based computer-aided diagnosis systems, this paper proposes a multi-task attention guided encoder-decoder network (MAG-Net) to classify and segment brain tumor regions using MRI images. MAG-Net is trained and evaluated on the Figshare dataset, which includes coronal, axial, and sagittal views with 3 types of tumors: meningioma, glioma, and pituitary tumor. With exhaustive experimental trials, the model achieved promising results compared to existing state-of-the-art models, while having the fewest training parameters among them.
Submitted 6 December, 2021; v1 submitted 26 July, 2021;
originally announced July 2021.
-
A Penalized Shared-parameter Algorithm for Estimating Optimal Dynamic Treatment Regimens
Authors:
Palash Ghosh,
Trikay Nalamada,
Shruti Agarwal,
Maria Jahja,
Bibhas Chakraborty
Abstract:
A dynamic treatment regimen (DTR) is a set of decision rules to personalize treatments for an individual using their medical history. The Q-learning based Q-shared algorithm has been used to develop DTRs that involve decision rules shared across multiple stages of intervention. We show that the existing Q-shared algorithm can suffer from non-convergence due to the use of linear models in the Q-learning setup, and identify the condition in which Q-shared fails. Leveraging properties from expansion-constrained ordinary least-squares, we give a penalized Q-shared algorithm that not only converges in settings that violate the condition, but can outperform the original Q-shared algorithm even when the condition is satisfied. We give evidence for the proposed method in a real-world application and several synthetic simulations.
Submitted 26 May, 2022; v1 submitted 13 July, 2021;
originally announced July 2021.
-
Recommending best course of treatment based on similarities of prognostic markers
Authors:
Sudhanshu,
Narinder Singh Punn,
Sanjay Kumar Sonbhadra,
Sonali Agarwal
Abstract:
With technological advancement spanning every field, a huge influx of information is inevitable. Among the opportunities these advancements have brought is the chance to propose efficient solutions for data retrieval: from an enormous pile of data, retrieval methods should allow users to fetch relevant and recent data over time. In entertainment and e-commerce, recommender systems already serve this purpose. Employing the same systems in the medical domain could prove useful in a variety of ways. In this context, the goal of this paper is to propose a collaborative filtering based recommender system for the healthcare sector that recommends remedies based on the symptoms experienced by patients. Furthermore, a new dataset consisting of remedies for various diseases is developed to address the limited availability of data. The proposed recommender system accepts the prognostic markers of a patient as input and generates the best remedy course. Across several experimental trials, the proposed model achieved promising results in recommending possible remedies for given prognostic markers.
Submitted 19 July, 2021; v1 submitted 15 July, 2021;
originally announced July 2021.
-
A Linear Dynamical Perspective on Epidemiology: Interplay Between Early COVID-19 Outbreak and Human Mobility
Authors:
Shakib Mustavee,
Shaurya Agarwal,
Chinwendu Enyioha,
Suddhasattwa Das
Abstract:
This paper investigates the impact of human activity and mobility (HAM) in the spreading dynamics of an epidemic. Specifically, it explores the interconnections between HAM and its effect on the early spread of the COVID-19 virus. During the early stages of the pandemic, effective reproduction numbers exhibited a high correlation with human mobility patterns, leading to a hypothesis that the HAM system can be studied as a coupled system with disease spread dynamics. This study applies the generalized Koopman framework with control inputs to determine the nonlinear disease spread dynamics and the input-output characteristics as a locally linear controlled dynamical system. The approach solely relies on the snapshots of spatiotemporal data and does not require any knowledge of the system's physical laws. We exploit the Koopman operator framework by utilizing the Hankel Dynamic Mode Decomposition with Control (HDMDc) algorithm to obtain a linear disease spread model incorporating human mobility as a control input. The study demonstrated that the proposed methodology could capture the impact of local mobility on the early dynamics of the ongoing global pandemic. The obtained locally linear model can accurately forecast the number of new infections for various prediction windows ranging from two to four weeks. The study corroborates a leader-follower relationship between mobility and disease spread dynamics. In addition, the effect of delay embedding in the HDMDc algorithm is also investigated and reported. A case study was performed using COVID infection data from Florida, US, and HAM data extracted from Google community mobility data report.
Submitted 4 August, 2021; v1 submitted 13 July, 2021;
originally announced July 2021.
-
Exploring DMD-type Algorithms for Modeling Signalised Intersections
Authors:
Kazi Redwan Shabab,
Shakib Mustavee,
Shaurya Agarwal,
Mohamed H. Zaki,
Sajal Das
Abstract:
This paper explores a novel data-driven approach based on recent developments in Koopman operator theory and dynamic mode decomposition (DMD) for modeling signalized intersections. Vehicular flow and queue formation on signalized intersections have complex nonlinear dynamics, making system identification, modeling, and controller design tasks challenging. We employ a Koopman theoretic approach to transform the original nonlinear dynamics into locally linear infinite-dimensional dynamics. The data-driven approach relies entirely on spatio-temporal snapshots of the traffic data. We investigate several key aspects of the approach and provide insights into the usage of DMD-type algorithms for application in adaptive signalized intersections. To demonstrate the utility of the obtained linearized dynamics, we perform prediction of the queue lengths at the intersection; and compare the results with the state-of-the-art long short term memory (LSTM) method. The case study involves the morning peak vehicle movements and queue lengths at two Orlando area signalized intersections. It is observed that DMD-based algorithms are able to capture complex dynamics with a linear approximation to a reasonable extent.
Submitted 13 July, 2021;
originally announced July 2021.
-
Modality specific U-Net variants for biomedical image segmentation: A survey
Authors:
Narinder Singh Punn,
Sonali Agarwal
Abstract:
With the advent of deep learning approaches such as deep convolutional neural networks, residual neural networks, and adversarial networks, U-Net architectures have become the most widely used in biomedical image segmentation to automate the identification and detection of target regions or sub-regions. In recent studies, U-Net based approaches have shown state-of-the-art performance in different applications for the development of computer-aided diagnosis systems for early diagnosis and treatment of diseases such as brain tumor, lung cancer, Alzheimer's disease, and breast cancer, using various modalities. This article presents the success of these approaches by describing the U-Net framework, followed by a comprehensive analysis of the U-Net variants through 1) inter-modality and 2) intra-modality categorization to establish better insights into the associated challenges and solutions. In addition, this article highlights the contribution of U-Net based frameworks in the ongoing pandemic of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Finally, the strengths and similarities of these U-Net variants are analysed along with the challenges involved in biomedical image segmentation to uncover promising future research directions in this area.
Submitted 27 January, 2022; v1 submitted 9 July, 2021;
originally announced July 2021.
-
Impossibility results for fair representations
Authors:
Tosca Lechner,
Shai Ben-David,
Sushant Agarwal,
Nivasini Ananthakrishnan
Abstract:
With the growing awareness of fairness in machine learning and the realization of the central role that data representation plays in data processing tasks, there is an obvious interest in notions of fair data representations. The goal of such representations is that a model trained on data under the representation (e.g., a classifier) will be guaranteed to respect some fairness constraints.
Such representations are useful when they can be fixed for training models on various different tasks and also when they serve as data filtering between the raw data (known to the representation designer) and potentially malicious agents that use the data under the representation to learn predictive models and make decisions.
A long list of recent research papers strive to provide tools for achieving these goals.
However, we prove that this is basically a futile effort. Roughly stated, we prove that no representation can guarantee the fairness of classifiers for different tasks trained using it; even the basic goal of achieving label-independent Demographic Parity fairness fails once the marginal data distribution shifts. More refined notions of fairness, like Odds Equality, cannot be guaranteed by a representation that does not take into account the task-specific labeling rule with respect to which such fairness will be evaluated (even if the marginal data distribution is known a priori). Furthermore, except for trivial cases, no representation can guarantee Odds Equality fairness for any two different tasks, while allowing accurate label predictions for both.
While some of our conclusions are intuitive, we formulate (and prove) crisp statements of such impossibilities, often contrasting impressions conveyed by many recent works on fair representations.
Submitted 7 July, 2021;
originally announced July 2021.
-
Hate speech detection using static BERT embeddings
Authors:
Gaurav Rajput,
Narinder Singh Punn,
Sanjay Kumar Sonbhadra,
Sonali Agarwal
Abstract:
With the increasing popularity of social media platforms, hate speech is emerging as a major concern: abusive speech that targets specific group characteristics, such as gender, religion, or ethnicity, to spread violence. Earlier, people used to deliver hate speech verbally, but now, with the expansion of technology, some people deliberately use social media platforms to spread hate by posting, sharing, commenting, etc. Whether it is the Christchurch mosque shootings or hate crimes against Asians in the West, it has been observed that the convicts are heavily influenced by hate text present online. Even though AI systems are in place to flag such text, one of the key challenges is to reduce the false positive rate (marking non-hate as hate), so that these systems can detect hate speech without undermining freedom of expression. In this paper, we use the ETHOS hate speech detection dataset and analyze the performance of a hate speech detection classifier by replacing or integrating the word embeddings (fastText (FT), GloVe (GV), or FT + GV) with static BERT embeddings (BE). In extensive experimental trials, the neural network performed better with static BE than with FT, GV, or FT + GV as word embeddings. In comparison to fine-tuned BERT, one metric that significantly improved is specificity.
Submitted 29 June, 2021;
originally announced June 2021.
-
Entangled Photons Enabled Time- and Frequency-Resolved Coherent Raman Spectroscopy in Condensed Phase Molecules
Authors:
Zhedong Zhang,
Tao Peng,
Xiaoyu Nie,
Girish S. Agarwal,
Marlan O. Scully
Abstract:
We develop an ultrafast frequency-resolved Raman spectroscopy with entangled photons for polyatomic molecules in condensed phases, to probe the electronic and vibrational coherences. Using quantum correlation between the photons, the signal shows the capability of both temporal and spectral resolutions that are not accessible by either classical pulses or the fields without entanglement. We develop a microscopic theory for this Raman spectroscopy, revealing the electronic coherence dynamics which often shows a rapid decay within $\sim$50 fs. The heterodyne-detected Raman signal is further developed to capture the phases of electronic coherence and emission in real-time domain.
Submitted 22 June, 2021; v1 submitted 21 June, 2021;
originally announced June 2021.
-
Automated triaging of head MRI examinations using convolutional neural networks
Authors:
David A. Wood,
Sina Kafiabadi,
Ayisha Al Busaidi,
Emily Guilhem,
Antanas Montvila,
Siddharth Agarwal,
Jeremy Lynch,
Matthew Townend,
Gareth Barker,
Sebastien Ourselin,
James H. Cole,
Thomas C. Booth
Abstract:
The growing demand for head magnetic resonance imaging (MRI) examinations, along with a global shortage of radiologists, has led to an increase in the time taken to report head MRI scans around the world. For many neurological conditions, this delay can result in increased morbidity and mortality. An automated triaging tool could reduce reporting times for abnormal examinations by identifying abnormalities at the time of imaging and prioritizing the reporting of these scans. In this work, we present a convolutional neural network for detecting clinically-relevant abnormalities in $\text{T}_2$-weighted head MRI scans. Using a validated neuroradiology report classifier, we generated a labelled dataset of 43,754 scans from two large UK hospitals for model training, and demonstrate accurate classification (area under the receiver operating characteristic curve (AUC) = 0.943) on a test set of 800 scans labelled by a team of neuroradiologists. Importantly, when trained on scans from only a single hospital the model generalized to scans from the other hospital ($\Delta$AUC $\leq$ 0.02). A simulation study demonstrated that our model would reduce the mean reporting time for abnormal examinations from 28 days to 14 days and from 9 days to 5 days at the two hospitals, demonstrating feasibility for use in a clinical triage environment.
Submitted 28 June, 2022; v1 submitted 15 June, 2021;
originally announced June 2021.
-
Force-Sensing Tensegrity for Investigating Physical Human-Robot Interaction in Compliant Robotic Systems
Authors:
Andrew R. Barkan,
Akhil Padmanabha,
Sala R. Tiemann,
Albert Lee,
Matthew P. Kanter,
Yash S. Agarwal,
Alice M. Agogino
Abstract:
Advancements in the domain of physical human-robot interaction (pHRI) have tremendously improved the ability of humans and robots to communicate, collaborate, and coexist. In particular, compliant robotic systems offer many characteristics that can be leveraged towards enabling physical interactions that more efficiently and intuitively communicate intent, making compliant systems potentially useful in more physically demanding subsets of human-robot collaborative scenarios. Tensegrity robots are an example of compliant systems that are well-suited to physical interactions while still retaining useful rigid properties that make them practical for a variety of applications. In this paper, we present the design and preliminary testing of a 6-bar spherical tensegrity with force-sensing capabilities. Using this prototype, we demonstrate the ability of its force-sensor array to detect a variety of physical interaction types that might arise in a human context. We then train and test a series of classifiers using data from unique and representative interactions in order to demonstrate the feasibility of using this physical modality of sensing to reliably communicate goals and intents from a human operator in a human-robot collaborative setting.
Submitted 14 June, 2021;
originally announced June 2021.
-
Deciphering capacitance frequency technique for performance limiting defect state parameters in energy harvesting perovskites
Authors:
Vikas Nandal,
Sumanshu Agarwal,
Pradeep R. Nair
Abstract:
With emerging thin film PIN based optoelectronic devices, a significant research thrust is focused on the passivation of trap states for performance enhancement. Among various methods, the capacitance frequency technique (CFT) is often employed to quantify trap state parameters; however, the trapped charge induced electrostatic effect on the same is not yet established for such devices. Herein, we present a theoretical methodology to incorporate such effects in the CF characteristics of well-established carrier selective perovskite-based PIN devices. We show that the electrostatic effect of trapped charges leads to non-linear energy bands in the perovskite layer, which results in the underestimation of trap density from existing models of CFT. Consequently, a parabolic band approximation with effective length (PBAEL) model is developed which accurately predicts the trap density for shallow or deep states from CFT analysis. In addition, we demonstrate that the attempt to escape frequency, crucial for trapped charge dynamics with continuum energy bands, can be well extracted by eliminating non-linear effects at reduced perovskite thickness. We believe that our work provides a unified theoretical platform for CFT to extract trap state parameters for a broad class of organic and hybrid materials-based thin film devices for energy conversion applications such as solar cells, LEDs, etc.
Submitted 14 June, 2021; v1 submitted 13 June, 2021;
originally announced June 2021.
-
Machine learning equipped web based disease prediction and recommender system
Authors:
Harish Rajora,
Narinder Singh Punn,
Sanjay Kumar Sonbhadra,
Sonali Agarwal
Abstract:
Worldwide, several cases go undiagnosed due to poor healthcare support in remote areas. In this context, a centralized system is needed for effective monitoring and analysis of medical records. A web-based patient diagnostic system is a central platform to store the medical history and predict the possible disease based on the current symptoms experienced by a patient, to ensure faster and more accurate diagnosis. Early disease prediction can help users determine the severity of the disease and take quick action. The proposed web-based disease prediction system utilizes machine learning based classification techniques on a data set acquired from the National Centre of Disease Control (NCDC). $K$-nearest neighbor (K-NN), random forest, and naive Bayes classification approaches are utilized, and an ensemble voting algorithm is also proposed where each classifier is assigned weights dynamically based on the prediction confidence. The proposed system is also equipped with a recommendation scheme to recommend the type of tests based on the existing symptoms of the patient, so that necessary precautions can be taken. A centralized database ensures that the medical data is preserved and there is transparency in the system. Tampering with the system is prevented by granting no update rights once the diagnosis is created.
Submitted 4 July, 2021; v1 submitted 5 June, 2021;
originally announced June 2021.
-
BERT-Based Sentiment Analysis: A Software Engineering Perspective
Authors:
Himanshu Batra,
Narinder Singh Punn,
Sanjay Kumar Sonbhadra,
Sonali Agarwal
Abstract:
Sentiment analysis can provide a suitable lead for the tools used in software engineering along with the API recommendation systems and relevant libraries to be used. In this context, existing tools like SentiCR, SentiStrength-SE, etc. exhibited low F1-scores that defeat the purpose of deploying such strategies, so there is ample scope for performance improvement. Recent advancements show that transformer based pre-trained models (e.g., BERT, RoBERTa, ALBERT, etc.) have displayed better results in the text classification task. Following this context, the present research explores different BERT-based models to analyze the sentences in GitHub comments, Jira comments, and Stack Overflow posts. The paper presents three different strategies to analyse BERT based models for sentiment analysis: in the first strategy, the BERT based pre-trained models are fine-tuned; in the second strategy, an ensemble model is developed from BERT variants; and in the third strategy, a compressed model (Distil BERT) is used. The experimental results show that the BERT based ensemble approach and the compressed BERT model attain improvements of 6-12% over prevailing tools for the F1 measure on all three datasets.
Submitted 2 July, 2021; v1 submitted 4 June, 2021;
originally announced June 2021.
-
Conserved quantities, exceptional points, and antilinear symmetries in non-Hermitian systems
Authors:
Frantisek Ruzicka,
Kaustubh S. Agarwal,
Yogesh N. Joglekar
Abstract:
Over the past two decades, open systems that are described by a non-Hermitian Hamiltonian have become a subject of intense research. These systems encompass classical wave systems with balanced gain and loss, semiclassical models with mode selective losses, and minimal quantum systems, and the meteoric research on them has mainly focused on the wide range of novel functionalities they demonstrate. Here, we address the following questions: Does anything remain constant in the dynamics of such open systems? What are the consequences of such conserved quantities? Through a spectral-decomposition method and an explicit, recursive procedure, we obtain all conserved observables for general $\mathcal{PT}$-symmetric systems. We then generalize the analysis to Hamiltonians with other antilinear symmetries, and discuss the consequences of conservation laws for open systems. We illustrate our findings with several physically motivated examples.
Submitted 22 April, 2021;
originally announced April 2021.
-
Minimal Data Fidelity for Detection of Stellar Features or Companions
Authors:
Sahil Agarwal,
John S. Wettlaufer
Abstract:
Technological advances in instrumentation have led to an exponential increase in exoplanet detection and scrutiny of stellar features such as spots and faculae. While the spots and faculae enable us to understand the stellar dynamics, exoplanets provide us with a glimpse into stellar evolution. While the ubiquity of noise (e.g., telluric, instrumental, or photonic) is unavoidable, combining this with increased spectrographic resolution compounds technological challenges. To account for these noise sources and resolution issues, we use a temporal multifractal framework to study data from the SOAP 2.0 tool, which simulates a stellar spectrum in the presence of a spot, a facula or a planet. Given these controlled simulations, we vary the resolution as well as the signal-to-noise (S/N) ratio to obtain a lower limit on the resolution and S/N required to robustly detect features. We show that a spot and a facula with a 1% coverage of the stellar disk can be robustly detected for a S/N (per pixel) of 35 and 60 respectively, for any spectral resolution above 20,000, while a planet with a radial velocity (RV) of 10 m/s can be detected for a S/N (per pixel) of 600. Rather than viewing noise as an impediment, our approach uses noise as a source of information.
Submitted 21 April, 2021;
originally announced April 2021.
-
Alexa Conversations: An Extensible Data-driven Approach for Building Task-oriented Dialogue Systems
Authors:
Anish Acharya,
Suranjit Adhikari,
Sanchit Agarwal,
Vincent Auvray,
Nehal Belgamwar,
Arijit Biswas,
Shubhra Chandra,
Tagyoung Chung,
Maryam Fazel-Zarandi,
Raefer Gabriel,
Shuyang Gao,
Rahul Goel,
Dilek Hakkani-Tur,
Jan Jezabek,
Abhay Jha,
Jiun-Yu Kao,
Prakash Krishnan,
Peter Ku,
Anuj Goyal,
Chien-Wei Lin,
Qing Liu,
Arindam Mandal,
Angeliki Metallinou,
Vishal Naik,
Yi Pan
, et al. (6 additional authors not shown)
Abstract:
Traditional goal-oriented dialogue systems rely on various components such as natural language understanding, dialogue state tracking, policy learning and response generation. Training each component requires annotations which are hard to obtain for every new domain, limiting scalability of such systems. Similarly, rule-based dialogue systems require extensive writing and maintenance of rules and do not scale either. End-to-End dialogue systems, on the other hand, do not require module-specific annotations but need a large amount of data for training. To overcome these problems, in this demo, we present Alexa Conversations, a new approach for building goal-oriented dialogue systems that is scalable, extensible as well as data efficient. The components of this system are trained in a data-driven manner, but instead of collecting annotated conversations for training, we generate them using a novel dialogue simulator based on a few seed dialogues and specifications of APIs and entities provided by the developer. Our approach provides out-of-the-box support for natural conversational phenomena like entity sharing across turns or users changing their mind during conversation without requiring developers to provide any such dialogue flows. We exemplify our approach using a simple pizza ordering task and showcase its value in reducing the developer burden for creating a robust experience. Finally, we evaluate our system using a typical movie ticket booking task and show that the dialogue simulator is an essential component of the system that leads to over $50\%$ improvement in turn-level action signature prediction accuracy.
Submitted 19 April, 2021;
originally announced April 2021.
-
Flatland Competition 2020: MAPF and MARL for Efficient Train Coordination on a Grid World
Authors:
Florian Laurent,
Manuel Schneider,
Christian Scheller,
Jeremy Watson,
Jiaoyang Li,
Zhe Chen,
Yi Zheng,
Shao-Hung Chan,
Konstantin Makhnev,
Oleg Svidchenko,
Vladimir Egorov,
Dmitry Ivanov,
Aleksei Shpilman,
Evgenija Spirovska,
Oliver Tanevski,
Aleksandar Nikov,
Ramon Grunder,
David Galevski,
Jakov Mitrovski,
Guillaume Sartoretti,
Zhiyao Luo,
Mehul Damani,
Nilabha Bhattacharya,
Shivam Agarwal,
Adrian Egli
, et al. (2 additional authors not shown)
Abstract:
The Flatland competition aimed at finding novel approaches to solve the vehicle re-scheduling problem (VRSP). The VRSP is concerned with scheduling trips in traffic networks and the re-scheduling of vehicles when disruptions occur, for example the breakdown of a vehicle. While solving the VRSP in various settings has been an active area in operations research (OR) for decades, the ever-growing complexity of modern railway networks makes dynamic real-time scheduling of traffic virtually impossible. Recently, multi-agent reinforcement learning (MARL) has successfully tackled challenging tasks where many agents need to be coordinated, such as multiplayer video games. However, the coordination of hundreds of agents in a real-life setting like a railway network remains challenging and the Flatland environment used for the competition models these real-world properties in a simplified manner. Submissions had to bring as many trains (agents) to their target stations in as little time as possible. While the best submissions were in the OR category, participants found many promising MARL approaches. Using both centralized and decentralized learning based approaches, top submissions used graph representations of the environment to construct tree-based observations. Further, different coordination mechanisms were implemented, such as communication and prioritization between agents. This paper presents the competition setup, four outstanding solutions to the competition, and a cross-comparison between them.
Submitted 30 March, 2021;
originally announced March 2021.
-
Ultralow threshold bistability and generation of long-lived mode in a dissipatively coupled nonlinear system: application to magnonics
Authors:
Jayakrishnan M. P. Nair,
Debsuvra Mukhopadhyay,
Girish S. Agarwal
Abstract:
The prospect of a system possessing two or more stable states for a given excitation condition is of topical interest with applications in information processing networks. In this work, we establish the remote transfer of bistability from a nonlinear resource in a dissipatively coupled two-mode system. As a clear advantage over coherently coupled settings, the dissipative nature of interaction is found to support a lower pumping threshold for bistable signals. For comparable parameters, the bistability threshold for dissipatively coupled systems is lower by a factor of about five. The resulting hysteresis can be studied spectroscopically by applying a probe field through the waveguide and examining the polariton character of the transmitted field. Our model is generic, apropos of an extensive set of quantum systems, and we demonstrate our results in the context of magnonics where experimental interest has flourished of late. As a consequence of dissipative coupling and the nonlinearity, a long-lived mode emerges, which is responsible for heightened transmission levels and pronounced sensitivity in signal propagation through the fiber.
Submitted 23 March, 2021;
originally announced March 2021.
-
Analysis of intensity correlation enhanced plasmonic structured illumination microscopy
Authors:
Anton Classen,
Xinghua Liu,
Aleksei M. Zheltikov,
Girish S. Agarwal
Abstract:
We propose to enhance the performance of localized plasmon structured illumination microscopy (LP-SIM) via intensity correlations. LP-SIM uses sub-wavelength illumination patterns to encode high spatial frequency information. It can enhance the resolution up to three-fold before gaps in the OTF support arise. For blinking fluorophores or for quantum antibunching an intensity correlation analysis induces higher harmonics of the illumination pattern and enlarges the effective OTF. This enables ultrahigh resolutions without gaps in the OTF support, and thus a fully deterministic imaging scheme. We present simulations that include shot and external noise and demonstrate the resolution power under realistic photon budgets. The technique has potential in light microscopy where low-intensity illumination is paramount while aiming for high spatial but moderate temporal resolutions.
Submitted 18 March, 2021;
originally announced March 2021.
-
A Systematic Review of Reproducibility Research in Natural Language Processing
Authors:
Anya Belz,
Shubham Agarwal,
Anastasia Shimorina,
Ehud Reiter
Abstract:
Against the background of what has been termed a reproducibility crisis in science, the NLP field is becoming increasingly interested in, and conscientious about, the reproducibility of its results. The past few years have seen an impressive range of new initiatives, events and active research in the area. However, the field is far from reaching a consensus about how reproducibility should be defined, measured and addressed, with diversity of views currently increasing rather than converging. With this focused contribution, we aim to provide a wide-angle, and as near as possible complete, snapshot of current work on reproducibility in NLP, delineating differences and similarities, and providing pointers to common denominators.
Submitted 21 March, 2021; v1 submitted 14 March, 2021;
originally announced March 2021.
-
$\mathcal{PT}$-symmetry breaking in a Kitaev chain with one pair of gain-loss potentials
Authors:
Kaustubh S. Agarwal,
Yogesh N. Joglekar
Abstract:
Parity-time ($\mathcal{PT}$) symmetric systems are classical, gain-loss systems whose dynamics are governed by non-Hermitian Hamiltonians with exceptional-point (EP) degeneracies. The eigenvalues of a $\mathcal{PT}$-symmetric Hamiltonian change from real to complex conjugates at a critical value of gain-loss strength that is called the $\mathcal{PT}$ breaking threshold. Here, we obtain the $\mathcal{PT}$-threshold for a one-dimensional, finite Kitaev chain -- a prototype for a p-wave superconductor -- in the presence of a single pair of gain and loss potentials as a function of the superconducting order parameter, on-site potential, and the distance between the gain and loss sites. In addition to a robust, non-local threshold, we find a rich phase diagram for the threshold that can be qualitatively understood in terms of the band-structure of the Hermitian Kitaev model. In particular, for an even chain with zero on-site potential, we find a re-entrant $\mathcal{PT}$-symmetric phase bounded by second-order EP contours. Our numerical results are supplemented by analytical calculations for small system sizes.
Submitted 11 March, 2021;
originally announced March 2021.
-
Pufferfish: Communication-efficient Models At No Extra Cost
Authors:
Hongyi Wang,
Saurabh Agarwal,
Dimitris Papailiopoulos
Abstract:
To mitigate communication overheads in distributed model training, several studies propose the use of compressed stochastic gradients, usually achieved by sparsification or quantization. Such techniques achieve high compression ratios, but in many cases incur either significant computational overheads or some accuracy loss. In this work, we present Pufferfish, a communication and computation efficient distributed training framework that incorporates the gradient compression into the model training process via training low-rank, pre-factorized deep networks. Pufferfish not only reduces communication, but also completely bypasses any computation overheads related to compression, and achieves the same accuracy as state-of-the-art, off-the-shelf deep models. Pufferfish can be directly integrated into current deep learning frameworks with minimum implementation modification. Our extensive experiments over real distributed setups, across a variety of large-scale machine learning tasks, indicate that Pufferfish achieves up to 1.64x end-to-end speedup over the latest distributed training API in PyTorch without accuracy loss. Compared to the Lottery Ticket Hypothesis models, Pufferfish leads to equally accurate, small-parameter models while avoiding the burden of "winning the lottery". Pufferfish also leads to more accurate and smaller models than SOTA structured model pruning methods.
Submitted 5 March, 2021;
originally announced March 2021.
-
EaZy Learning: An Adaptive Variant of Ensemble Learning for Fingerprint Liveness Detection
Authors:
Shivang Agarwal,
C. Ravindranath Chowdary,
Vivek Sourabh
Abstract:
In the field of biometrics, fingerprint recognition systems are vulnerable to presentation attacks made by artificially generated spoof fingerprints. Therefore, it is essential to perform liveness detection of a fingerprint before authenticating it. Fingerprint liveness detection mechanisms perform well under the within-dataset environment but fail miserably under cross-sensor (when tested on a fingerprint acquired by a new sensor) and cross-dataset (when trained on one dataset and tested on another) settings. To enhance the generalization abilities, robustness and the interoperability of the fingerprint spoof detectors, the learning models need to be adaptive towards the data. We propose a generic model, EaZy learning, which can be considered as an adaptive midway between eager and lazy learning. We show the usefulness of this adaptivity under cross-sensor and cross-dataset environments. EaZy learning examines the properties intrinsic to the dataset while generating a pool of hypotheses. EaZy learning is similar to ensemble learning as it generates an ensemble of base classifiers and integrates them to make a prediction. Still, it differs in the way it generates the base classifiers. EaZy learning develops an ensemble of entirely disjoint base classifiers, which has a beneficial influence on the diversity of the underlying ensemble. Also, it integrates the predictions made by these base classifiers based on their performance on the validation data. Experiments conducted on the standard high-dimensional datasets LivDet 2011, LivDet 2013 and LivDet 2015 prove the efficacy of the model under cross-dataset and cross-sensor environments.
Submitted 3 March, 2021;
originally announced March 2021.
-
On the Utility of Gradient Compression in Distributed Training Systems
Authors:
Saurabh Agarwal,
Hongyi Wang,
Shivaram Venkataraman,
Dimitris Papailiopoulos
Abstract:
A rich body of prior work has highlighted the existence of communication bottlenecks in synchronous data-parallel training. To alleviate these bottlenecks, a long line of recent work proposes gradient and model compression methods. In this work, we evaluate the efficacy of gradient compression methods and compare their scalability with optimized implementations of synchronous data-parallel SGD across more than 200 different setups. Surprisingly, we observe that only in 6 cases out of more than 200, gradient compression methods provide speedup over optimized synchronous data-parallel training in the typical data-center setting. We conduct an extensive investigation to identify the root causes of this phenomenon, and offer a performance model that can be used to identify the benefits of gradient compression for a variety of system setups. Based on our analysis, we propose a list of desirable properties that gradient compression methods should satisfy, in order for them to provide a meaningful end-to-end speedup.
Submitted 29 June, 2021; v1 submitted 28 February, 2021;
originally announced March 2021.
-
Learning Transferable Visual Models From Natural Language Supervision
Authors:
Alec Radford,
Jong Wook Kim,
Chris Hallacy,
Aditya Ramesh,
Gabriel Goh,
Sandhini Agarwal,
Girish Sastry,
Amanda Askell,
Pamela Mishkin,
Jack Clark,
Gretchen Krueger,
Ilya Sutskever
Abstract:
State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concept. Learning directly from raw text about images is a promising alternative which leverages a much broader source of supervision. We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet. After pre-training, natural language is used to reference learned visual concepts (or describe new ones) enabling zero-shot transfer of the model to downstream tasks. We study the performance of this approach by benchmarking on over 30 different existing computer vision datasets, spanning tasks such as OCR, action recognition in videos, geo-localization, and many types of fine-grained object classification. The model transfers non-trivially to most tasks and is often competitive with a fully supervised baseline without the need for any dataset specific training. For instance, we match the accuracy of the original ResNet-50 on ImageNet zero-shot without needing to use any of the 1.28 million training examples it was trained on. We release our code and pre-trained model weights at https://github.com/OpenAI/CLIP.
Submitted 26 February, 2021;
originally announced March 2021.
-
Towards the Unification and Robustness of Perturbation and Gradient Based Explanations
Authors:
Sushant Agarwal,
Shahin Jabbari,
Chirag Agarwal,
Sohini Upadhyay,
Zhiwei Steven Wu,
Himabindu Lakkaraju
Abstract:
As machine learning black boxes are increasingly being deployed in critical domains such as healthcare and criminal justice, there has been a growing emphasis on developing techniques for explaining these black boxes in a post hoc manner. In this work, we analyze two popular post hoc interpretation techniques: SmoothGrad which is a gradient based method, and a variant of LIME which is a perturbation based method. More specifically, we derive explicit closed form expressions for the explanations output by these two methods and show that they both converge to the same explanation in expectation, i.e., when the number of perturbed samples used by these methods is large. We then leverage this connection to establish other desirable properties, such as robustness, for these techniques. We also derive finite sample complexity bounds for the number of perturbations required for these methods to converge to their expected explanation. Finally, we empirically validate our theory using extensive experimentation on both synthetic and real world datasets.
Submitted 19 July, 2021; v1 submitted 21 February, 2021;
originally announced February 2021.
-
End-to-end lyrics Recognition with Voice to Singing Style Transfer
Authors:
Sakya Basak,
Shrutina Agarwal,
Sriram Ganapathy,
Naoya Takahashi
Abstract:
Automatic transcription of monophonic/polyphonic music is a challenging task due to the lack of availability of large amounts of transcribed data. In this paper, we propose a data augmentation method that converts natural speech to singing voice based on a vocoder-based speech synthesizer. This approach, called voice to singing (V2S), performs the voice style conversion by modulating the F0 contour of the natural speech with that of a singing voice. The V2S model-based style transfer can generate good quality singing voice, thereby enabling the conversion of large corpora of natural speech to singing voice that is useful in building an E2E lyrics transcription system. In our experiments on monophonic singing voice data, the V2S style transfer provides a significant gain (relative improvement of 21%) for the E2E lyrics transcription system. We also discuss additional components like transfer learning and lyrics-based language modeling to improve the performance of the lyrics transcription system.
Submitted 16 February, 2021;
originally announced February 2021.
-
Federated Evaluation and Tuning for On-Device Personalization: System Design & Applications
Authors:
Matthias Paulik,
Matt Seigel,
Henry Mason,
Dominic Telaar,
Joris Kluivers,
Rogier van Dalen,
Chi Wai Lau,
Luke Carlson,
Filip Granqvist,
Chris Vandevelde,
Sudeep Agarwal,
Julien Freudiger,
Andrew Byde,
Abhishek Bhowmick,
Gaurav Kapoor,
Si Beaumont,
Áine Cahill,
Dominic Hughes,
Omid Javidbakht,
Fei Dong,
Rehan Rishi,
Stanley Hung
Abstract:
We describe the design of our federated task processing system. Originally, the system was created to support two specific federated tasks: evaluation and tuning of on-device ML systems, primarily for the purpose of personalizing these systems. In recent years, support for an additional federated task has been added: federated learning (FL) of deep neural networks. To our knowledge, only one other system has been described in literature that supports FL at scale. We include comparisons to that system to help discuss design decisions and attached trade-offs. Finally, we describe two specific large scale personalization use cases in detail to showcase the applicability of federated tuning to on-device personalization and to highlight application specific solutions.
Submitted 16 February, 2021;
originally announced February 2021.
-
Predicting the Characteristics of Defect Transitions on Curved Surfaces
Authors:
Siddhansh Agarwal,
Sascha Hilgenfeldt
Abstract:
The energetically optimal position of lattice defects on intrinsically curved surfaces is a complex function of shape parameters. For open surfaces, a simple condition predicts the critical size for which a central disclination yields lower energy than a boundary disclination. In practice, this transition is modified by activation energies or more favorable intermediate defect positions. Here it is shown that these transition characteristics (continuous or discontinuous, first or second order) can also be inferred from analytical, general criteria evaluated from the surface shape. A universal scale of activation energy is found, and the criterion is generalized to predict transition order as symmetries such as that of the shape are broken. The results give practical insight into structural transitions to disorder in many cellular materials of technological and biological importance.
Submitted 14 February, 2021;
originally announced February 2021.
-
Roadmap on Integrated Quantum Photonics
Authors:
Galan Moody,
Volker J. Sorger,
Daniel J. Blumenthal,
Paul W. Juodawlkis,
William Loh,
Cheryl Sorace-Agaskar,
Alex E. Jones,
Krishna C. Balram,
Jonathan C. F. Matthews,
Anthony Laing,
Marcelo Davanco,
Lin Chang,
John E. Bowers,
Niels Quack,
Christophe Galland,
Igor Aharonovich,
Martin A. Wolff,
Carsten Schuck,
Neil Sinclair,
Marko Lončar,
Tin Komljenovic,
David Weld,
Shayan Mookherjea,
Sonia Buckley,
Marina Radulaski
, et al. (30 additional authors not shown)
Abstract:
Integrated photonics is at the heart of many classical technologies, from optical communications to biosensors, LIDAR, and data center fiber interconnects. There is strong evidence that these integrated technologies will play a key role in quantum systems as they grow from few-qubit prototypes to tens of thousands of qubits. The underlying laser and optical quantum technologies, with the required functionality and performance, can only be realized through the integration of these components onto quantum photonic integrated circuits (QPICs) with accompanying electronics. In the last decade, remarkable advances in quantum photonic integration and a dramatic reduction in optical losses have enabled benchtop experiments to be scaled down to prototype chips with improvements in efficiency, robustness, and key performance metrics. The reduction in size, weight, power, and improvement in stability that will be enabled by QPICs will play a key role in increasing the degree of complexity and scale in quantum demonstrations. In the next decade, with sustained research, development, and investment in the quantum photonic ecosystem (i.e. PIC-based platforms, devices and circuits, fabrication and integration processes, packaging, and testing and benchmarking), we will witness the transition from single- and few-function prototypes to the large-scale integration of multi-functional and reconfigurable QPICs that will define how information is processed, stored, transmitted, and utilized for quantum computing, communications, metrology, and sensing. This roadmap highlights the current progress in the field of integrated quantum photonics, future challenges, and advances in science and technology needed to meet these challenges.
Submitted 22 September, 2021; v1 submitted 5 February, 2021;
originally announced February 2021.
-
A Credibility Approach on Fuzzy Slacks Based Measure (SBM) DEA Model
Authors:
Deepak Mahla,
Shivi Agarwal
Abstract:
Data Envelopment Analysis (DEA) is a multi-criteria technique based on linear programming to deal with many real-life problems, mostly in nonprofit organizations. The slacks-based measure (SBM) model is one of the DEA models used to assess the relative efficiencies of decision-making units (DMUs). The SBM DEA model directly uses input slacks and output slacks to determine the relative efficiency of DMUs. In order to deal with qualitative or uncertain data, a fuzzy SBM DEA model is used to assess the performance of DMUs in this study. A credibility measure approach is used to transform the fuzzy SBM DEA model into a crisp linear programming model at different credibility levels. The results obtained from the fuzzy DEA model reflect the real-world situation more reasonably than those of the conventional DEA model. Finally, data on Indian oil refineries are collected, and the efficiency behavior of the companies is obtained by applying the proposed model as a numerical illustration.
Submitted 3 February, 2021;
originally announced February 2021.
-
AutoFreeze: Automatically Freezing Model Blocks to Accelerate Fine-tuning
Authors:
Yuhan Liu,
Saurabh Agarwal,
Shivaram Venkataraman
Abstract:
With the rapid adoption of machine learning (ML), a number of domains now use the approach of fine-tuning models which were pre-trained on a large corpus of data. However, our experiments show that fine-tuning models like BERT can take many hours even when using modern accelerators like GPUs. While prior work proposes limiting the number of layers that are fine-tuned, e.g., freezing all layers but the last layer, we find that such static approaches lead to reduced accuracy. We propose AutoFreeze, a system that uses an adaptive approach to choose which layers are trained, and show how this can accelerate model fine-tuning while preserving accuracy. We also develop mechanisms to enable efficient caching of intermediate activations which can reduce the forward computation time when performing fine-tuning. We extend AutoFreeze to perform distributed fine-tuning and design two execution modes that minimize cost and running time respectively. Our evaluation on ten NLP tasks shows that AutoFreeze, with caching enabled, can improve fine-tuning on a single GPU by up to 2.55x. On a 64 GPU cluster, for fine-tuning on the AG's news dataset, AutoFreeze is able to achieve up to 4.38x speedup when optimizing for end-to-end training time and 5.03x reduction in total cost when optimizing for efficiency, without affecting model accuracy.
Submitted 3 April, 2021; v1 submitted 2 February, 2021;
originally announced February 2021.
-
S++: A Fast and Deployable Secure-Computation Framework for Privacy-Preserving Neural Network Training
Authors:
Prashanthi Ramachandran,
Shivam Agarwal,
Arup Mondal,
Aastha Shah,
Debayan Gupta
Abstract:
We introduce S++, a simple, robust, and deployable framework for training a neural network (NN) using private data from multiple sources, using secret-shared secure function evaluation. In short, consider a virtual third party to whom every data-holder sends their inputs, and which computes the neural network: in our case, this virtual third party is actually a set of servers which individually learn nothing, even with a malicious (but non-colluding) adversary.
Previous work in this area has been limited to just one specific activation function: ReLU, rendering the approach impractical for many use-cases. For the first time, we provide fast and verifiable protocols for all common activation functions and optimize them for running in a secret-shared manner. The ability to quickly, verifiably, and robustly compute exponentiation, softmax, sigmoid, etc., allows us to use previously written NNs without modification, vastly reducing developer effort and complexity of code. In recent times, ReLU has been found to converge much faster and be more computationally efficient as compared to non-linear functions like sigmoid or tanh. However, we argue that it would be remiss not to extend the mechanism to non-linear functions such as the logistic sigmoid, tanh, and softmax that are fundamental due to their ability to express outputs as probabilities and their universal approximation property. Their contribution in RNNs and a few recent advancements also makes them more relevant.
Submitted 28 January, 2021;
originally announced January 2021.