-
Investigating the Robustness of LLMs on Math Word Problems
Authors:
Ujjwala Anantheswaran,
Himanshu Gupta,
Kevin Scaria,
Shreyas Verma,
Chitta Baral,
Swaroop Mishra
Abstract:
Large Language Models (LLMs) excel at various tasks, including solving math word problems (MWPs), but struggle with real-world problems containing irrelevant information. To address this, we propose a prompting framework that generates adversarial variants of MWPs by adding irrelevant variables. We introduce a dataset, ProbleMATHIC, containing both adversarial and non-adversarial MWPs. Our experiments reveal that LLMs are susceptible to distraction by numerical noise, resulting in an average relative performance drop of ~26% on adversarial MWPs. To mitigate this, we fine-tune LLMs (Llama-2, Mistral) on the adversarial samples from our dataset. Fine-tuning on adversarial training instances improves performance on adversarial MWPs by ~8%, indicating increased robustness to noise and better ability to identify relevant data for reasoning. Finally, to assess the generalizability of our prompting framework, we introduce GSM-8K-Adv, an adversarial variant of the GSM-8K benchmark. LLMs continue to struggle when faced with adversarial information, reducing performance by up to ~6%.
Submitted 30 May, 2024;
originally announced June 2024.
-
NAC-QFL: Noise Aware Clustered Quantum Federated Learning
Authors:
Himanshu Sahu,
Hari Prabhat Gupta
Abstract:
Recent advancements in quantum computing, alongside successful deployments of quantum communication, hold promises for revolutionizing mobile networks. While Quantum Machine Learning (QML) presents opportunities, it contends with challenges like noise in quantum devices and scalability. Furthermore, the high cost of quantum communication constrains the practical application of QML in real-world scenarios. This paper introduces a noise-aware clustered quantum federated learning system that addresses noise mitigation, limited quantum device capacity, and high quantum communication costs in distributed QML. It employs noise modelling and clustering to select devices with minimal noise and distribute QML tasks efficiently. Using circuit partitioning to deploy smaller models on low-noise devices and aggregating similar devices, the system enhances distributed QML performance and reduces communication costs. Leveraging circuit cutting, QML techniques are more effective for smaller circuit sizes and fidelity. We conduct experimental evaluations to assess the performance of the proposed system. Additionally, we introduce a noisy dataset for QML to demonstrate the impact of noise on proposed accuracy.
Submitted 20 June, 2024;
originally announced June 2024.
-
iREL at SemEval-2024 Task 9: Improving Conventional Prompting Methods for Brain Teasers
Authors:
Harshit Gupta,
Manav Chaudhary,
Tathagata Raha,
Shivansh Subramanian,
Vasudeva Varma
Abstract:
This paper describes our approach for SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense. The BRAINTEASER task comprises multiple-choice Question Answering designed to evaluate the models' lateral thinking capabilities. It consists of Sentence Puzzle and Word Puzzle subtasks that require models to defy default common-sense associations and exhibit unconventional thinking. We propose a unique strategy to improve the performance of pre-trained language models, notably the Gemini 1.0 Pro Model, in both subtasks. We employ static and dynamic few-shot prompting techniques and introduce a model-generated reasoning strategy that utilizes the LLM's reasoning capabilities to improve performance. Our approach demonstrated significant improvements, showing that it performed better than the baseline models by a considerable margin but fell short of performing as well as the human annotators, thus highlighting the efficacy of the proposed strategies.
Submitted 25 May, 2024;
originally announced May 2024.
-
BrainStorm @ iREL at SMM4H 2024: Leveraging Translation and Topical Embeddings for Annotation Detection in Tweets
Authors:
Manav Chaudhary,
Harshit Gupta,
Vasudeva Varma
Abstract:
The proliferation of LLMs in various NLP tasks has sparked debates regarding their reliability, particularly in annotation tasks where biases and hallucinations may arise. In this shared task, we address the challenge of distinguishing annotations made by LLMs from those made by human domain experts in the context of COVID-19 symptom detection from tweets in Latin American Spanish. This paper presents BrainStorm @ iREL's approach to the SMM4H 2024 Shared Task. Leveraging the inherent topical information in tweets, we propose a novel approach to identify and classify annotations, aiming to enhance the trustworthiness of annotated data.
Submitted 18 May, 2024;
originally announced May 2024.
-
Optimized Generation of Entanglement by Real-Time Ordering of Swapping Operations
Authors:
Ranjani G Sundaram,
Himanshu Gupta
Abstract:
Long-distance quantum communication in quantum networks faces significant challenges due to the constraints imposed by the no-cloning theorem. Most existing quantum communication protocols rely on the a priori distribution of entanglement pairs (EPs), a process known to incur considerable latency due to its stochastic nature. In this work, we consider the problem of minimizing the latency of establishing an EP across a pair of nodes in a quantum network. While prior research has primarily focused on minimizing the expected generation latency by selecting static entanglement routes and/or swapping trees in advance, our approach considers a real-time adaptive strategy -- wherein the order of entanglement-swapping operations (hence, the swapping tree used) is progressively determined at runtime based on the runtime success/failure of the stochastic events. In this context, we present a greedy algorithm that iteratively determines the best route and/or entanglement-swapping operation to perform at each stage based on the current network. We evaluate our schemes on randomly generated networks and observe a reduction in latency of up to 40% from the optimal offline approach.
Submitted 13 May, 2024;
originally announced May 2024.
-
Distributed Quantum Computation with Minimum Circuit Execution Time over Quantum Networks
Authors:
Ranjani G Sundaram,
Himanshu Gupta,
C. R. Ramakrishnan
Abstract:
Present quantum computers are constrained by limited qubit capacity and restricted physical connectivity, leading to challenges in large-scale quantum computations. Distributing quantum computations across a network of quantum computers is a promising way to circumvent these challenges and facilitate large quantum computations. However, distributed quantum computations require entanglements (to execute remote gates) which can incur significant generation latency and, thus, lead to decoherence of qubits. In this work, we consider the problem of distributing quantum circuits across a quantum network to minimize the execution time. The problem entails mapping the circuit qubits to network memories, including within each computer since limited connectivity within computers can affect the circuit execution time. We provide two-step solutions for the above problem: In the first step, we allocate qubits to memories to minimize the estimated execution time; for this step, we design an efficient algorithm based on an approximation algorithm for the max-quadratic-assignment problem. In the second step, we determine an efficient execution scheme, including generating required entanglements with minimum latency under the network resource and decoherence constraints; for this step, we develop two algorithms with appropriate performance guarantees under certain settings or assumptions. We consider multiple protocols for executing remote gates, viz., telegates and cat-entanglements. With extensive simulations over NetSquid, a quantum network simulator, we demonstrate the effectiveness of our developed techniques and show that they outperform a scheme based on prior work by up to 95%.
Submitted 13 May, 2024;
originally announced May 2024.
-
Sufficient conditions for total positivity, compounds, and Dodgson condensation
Authors:
Shaun Fallat,
Himanshu Gupta,
Charles R. Johnson
Abstract:
An $n$-by-$n$ matrix is called totally positive ($TP$) if all its minors are positive and $TP_k$ if all of its $k$-by-$k$ submatrices are $TP$. For an arbitrary totally positive matrix or $TP_k$ matrix, we investigate if the $r$th compound ($1<r<n$) is in turn $TP$ or $TP_k$, and demonstrate a strong negative resolution in general. Focus is then shifted to Dodgson's algorithm for calculating the determinant of a generic matrix, and we analyze whether the associated condensed matrices are possibly totally positive or $TP_k$. We also show that all condensed matrices associated with a $TP$ Hankel matrix are $TP$.
Submitted 9 May, 2024;
originally announced May 2024.
-
Optimized Distribution of Entanglement Graph States in Quantum Networks
Authors:
Xiaojie Fan,
Caitao Zhan,
Himanshu Gupta,
C. R. Ramakrishnan
Abstract:
Building large-scale quantum computers, essential to demonstrating quantum advantage, is a key challenge. Quantum Networks (QNs) can help address this challenge by enabling the construction of large, robust, and more capable quantum computing platforms by connecting smaller quantum computers. Moreover, unlike classical systems, QNs can enable fully secured long-distance communication. Thus, quantum networks lie at the heart of the success of future quantum information technologies. In quantum networks, multipartite entangled states distributed over the network help implement and support many quantum network applications for communications, sensing, and computing. Our work focuses on developing optimal techniques to generate and distribute multipartite entanglement states efficiently. Prior works on generating general multipartite entanglement states have focused on the objective of minimizing the number of maximally entangled pairs (EPs) while ignoring the heterogeneity of the network nodes and links as well as the stochastic nature of underlying processes. In this work, we develop a hypergraph-based linear programming framework that delivers optimal (under certain assumptions) generation schemes for general multipartite entanglement represented by graph states, under the network resources, decoherence, and fidelity constraints, while considering the stochasticity of the underlying processes. We illustrate our technique by developing generation schemes for the special cases of path and tree graph states, and discuss optimized generation schemes for more general classes of graph states. Using extensive simulations over a quantum network simulator (NetSquid), we demonstrate the effectiveness of our developed techniques and show that they outperform prior known schemes by up to orders of magnitude.
Submitted 30 April, 2024;
originally announced May 2024.
-
Analysis of Distributed Optimization Algorithms on a Real Processing-In-Memory System
Authors:
Steve Rhyner,
Haocong Luo,
Juan Gómez-Luna,
Mohammad Sadrosadati,
Jiawei Jiang,
Ataberk Olgun,
Harshita Gupta,
Ce Zhang,
Onur Mutlu
Abstract:
Machine Learning (ML) training on large-scale datasets is a very expensive and time-consuming workload. Processor-centric architectures (e.g., CPU, GPU) commonly used for modern ML training workloads are limited by the data movement bottleneck, i.e., due to repeatedly accessing the training dataset. As a result, processor-centric systems suffer from performance degradation and high energy consumption. Processing-In-Memory (PIM) is a promising solution to alleviate the data movement bottleneck by placing the computation mechanisms inside or near memory.
Our goal is to understand the capabilities and characteristics of popular distributed optimization algorithms on real-world PIM architectures to accelerate data-intensive ML training workloads. To this end, we 1) implement several representative centralized distributed optimization algorithms on UPMEM's real-world general-purpose PIM system, 2) rigorously evaluate these algorithms for ML training on large-scale datasets in terms of performance, accuracy, and scalability, 3) compare to conventional CPU and GPU baselines, and 4) discuss implications for future PIM hardware and the need to shift to an algorithm-hardware codesign perspective to accommodate decentralized distributed optimization algorithms.
Our results demonstrate three major findings: 1) modern general-purpose PIM architectures can be a viable alternative to state-of-the-art CPUs and GPUs for many memory-bound ML training workloads, when operations and datatypes are natively supported by PIM hardware, 2) it is important to carefully choose the optimization algorithm that best fits PIM, and 3) contrary to popular belief, contemporary PIM architectures do not scale approximately linearly with the number of nodes for many data-intensive ML training workloads. To facilitate future research, we aim to open-source our complete codebase.
Submitted 10 April, 2024;
originally announced April 2024.
-
Positivity preservers over finite fields
Authors:
Dominique Guillot,
Himanshu Gupta,
Prateek Kumar Vishwakarma
Abstract:
We resolve an algebraic version of Schoenberg's celebrated theorem [Duke Math. J., 1942] characterizing entrywise matrix transforms that preserve positive definiteness. Compared to the classical real and complex settings, we consider matrices with entries in a finite field and obtain a complete characterization of such preservers for matrices of a fixed dimension. When the dimension of the matrices is at least $3$, we prove that, surprisingly, the positivity preservers are precisely the positive multiples of the field's automorphisms. Our work makes crucial use of the well-known character-sum bound due to Weil, and of a result of Carlitz [Proc. Amer. Math. Soc., 1960] that provides a characterization of the automorphisms of Paley graphs.
Submitted 25 April, 2024; v1 submitted 29 March, 2024;
originally announced April 2024.
-
Transient Waiting Time Distributions in Small Call Centres with Skills-Based Routing
Authors:
Mark Fackrell,
Hritika Gupta,
Peter G. Taylor
Abstract:
Many call centres are subject to service level agreements that stipulate that they must achieve targets in terms of the proportion of calls that are answered within a specified time. In order to manage a centre so that targets like these are met, we need to have a method of calculating the waiting time distributions experienced by customers. In this paper, we provide such a method for small call centres that employ skills-based routing. We first build the methodology for the single-skill case and then extend it to a multi-skill case. We model the call centre system as a continuous-time Markov chain and then make use of the Laplace transform to calculate the relevant quantities. We later demonstrate the use of this method to find the optimal routing policy in a certain class of policies.
Submitted 24 March, 2024;
originally announced March 2024.
-
A Study on Stock Forecasting Using Deep Learning and Statistical Models
Authors:
Himanshu Gupta,
Aditya Jaiswal
Abstract:
Building a fast and accurate model for stock price forecasting has been a challenging task, and it remains an active area of research in which the best way to forecast stock prices is yet to be found. Machine learning, deep learning, and statistical analysis techniques are used here to obtain accurate results so that investors can see the future trend and maximize the return on investment in stock trading. This paper reviews several deep learning algorithms for stock price forecasting. We use a record of S&P 500 index data for training and testing. The aim of the survey is to examine various statistical techniques (Moving Averages and ARIMA) and deep learning models (LSTM, RNN, CNN, and FULL CNN) for stock price forecasting. We discuss the autoregressive integrated moving average (ARIMA) model, the recurrent neural network (RNN) model, the long short-term memory (LSTM) model, which is a type of RNN used for long-range dependencies in data, the convolutional neural network (CNN) model, and the full convolutional neural network model, in terms of error or accuracy as measured by root mean square error (RMSE), mean absolute error (MAE), and mean squared error (MSE). A model can be selected for predicting the stock price by checking for a low MAE value: the lower the MAE, the smaller the difference between the predicted and actual values, and the more accurately the model predicts the price compared with other models.
Submitted 8 February, 2024;
originally announced February 2024.
-
Face Detection: Present State and Research Directions
Authors:
Purnendu Prabhat,
Himanshu Gupta,
Ajeet Kumar Vishwakarma
Abstract:
The majority of computer vision applications that handle images featuring humans use face detection as a core component. Face detection still has issues, despite much research on the topic. Face detection's accuracy and speed might yet be increased. This review paper shows the progress made in this area as well as the substantial issues that still need to be tackled. The paper provides research directions that can be taken up as research projects in the field of face detection.
Submitted 6 February, 2024;
originally announced February 2024.
-
Towards Interpretable Physical-Conceptual Catchment-Scale Hydrological Modeling using the Mass-Conserving-Perceptron
Authors:
Yuan-Heng Wang,
Hoshin V. Gupta
Abstract:
We investigate the applicability of machine learning technologies to the development of parsimonious, interpretable, catchment-scale hydrologic models using directed-graph architectures based on the mass-conserving perceptron (MCP) as the fundamental computational unit. Here, we focus on architectural complexity (depth) at a single location, rather than universal applicability (breadth) across large samples of catchments. The goal is to discover a minimal representation (numbers of cell-states and flow paths) that represents the dominant processes that can explain the input-state-output behaviors of a given catchment, with particular emphasis given to simulating the full range (high, medium, and low) of flow dynamics. We find that a HyMod-like architecture with three cell-states and two major flow pathways achieves such a representation at our study location, but that the additional incorporation of an input-bypass mechanism significantly improves the timing and shape of the hydrograph, while the inclusion of bi-directional groundwater mass exchanges significantly enhances the simulation of baseflow. Overall, our results demonstrate the importance of using multiple diagnostic metrics for model evaluation, while highlighting the need for designing training metrics that are better suited to extracting information across the full range of flow dynamics. Further, they set the stage for interpretable regional-scale MCP-based hydrological modeling (using large sample data) by using neural architecture search to determine appropriate minimal representations for catchments in different hydroclimatic regimes.
Submitted 22 May, 2024; v1 submitted 25 January, 2024;
originally announced January 2024.
-
Position Paper: Bridging the Gap Between Machine Learning and Sensitivity Analysis
Authors:
Christian A. Scholbeck,
Julia Moosbauer,
Giuseppe Casalicchio,
Hoshin Gupta,
Bernd Bischl,
Christian Heumann
Abstract:
We argue that interpretations of machine learning (ML) models or the model-building process can be seen as a form of sensitivity analysis (SA), a general methodology used to explain complex systems in many fields such as environmental modeling, engineering, or economics. We address both researchers and practitioners, calling attention to the benefits of a unified SA-based view of explanations in ML and the necessity to fully credit related work. We bridge the gap between both fields by formally describing (a) how the ML process is a system suitable for SA, (b) how existing ML interpretation methods relate to this perspective, and (c) how other SA techniques could be applied to ML.
Submitted 20 December, 2023;
originally announced December 2023.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee
, et al. (1325 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
Submitted 17 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
Minimising Numbers of Losses and Abandonments in Small Call Centres Under a Transient Regime
Authors:
Hritika Gupta,
Peter G. Taylor
Abstract:
In this paper, we show how to calculate transient performance measures in models for small call centres that employ skills-based routing. In particular, we calculate the expected number of customer losses and call abandonments in a fixed time. We use the results to compare how call allocation policies can minimise the expected numbers of losses and abandonments, and make recommendations about which policies should be employed.
Submitted 6 December, 2023;
originally announced December 2023.
-
LongBoX: Evaluating Transformers on Long-Sequence Clinical Tasks
Authors:
Mihir Parmar,
Aakanksha Naik,
Himanshu Gupta,
Disha Agrawal,
Chitta Baral
Abstract:
Many large language models (LLMs) for medicine have largely been evaluated on short texts, and their ability to handle longer sequences such as a complete electronic health record (EHR) has not been systematically explored. Assessing these models on long sequences is crucial since prior work in the general domain has demonstrated performance degradation of LLMs on longer texts. Motivated by this, we introduce LongBoX, a collection of seven medical datasets in text-to-text format, designed to investigate model performance on long sequences. Preliminary experiments reveal that both medical LLMs (e.g., BioGPT) and strong general domain LLMs (e.g., FLAN-T5) struggle on this benchmark. We further evaluate two techniques designed for long-sequence handling: (i) local-global attention, and (ii) Fusion-in-Decoder (FiD). Our results with long-sequence handling are mixed: while scores on some datasets increase, there is substantial room for improvement. We hope that LongBoX facilitates the development of more effective long-sequence techniques for the medical domain. Data and source code are available at https://github.com/Mihir3009/LongBoX.
Submitted 15 November, 2023;
originally announced November 2023.
-
Optimizing Electric Vehicle Efficiency with Real-Time Telemetry using Machine Learning
Authors:
Aryaman Rao,
Harshit Gupta,
Parth Singh,
Shivam Mittal,
Utkrash Singh,
Dinesh Kumar Vishwakarma
Abstract:
In the contemporary world with degrading natural resources, the urgency of energy efficiency has become imperative due to the conservation and environmental safeguarding. Therefore, it's crucial to look for advanced technology to minimize energy consumption. This research focuses on the optimization of battery-electric city style vehicles through the use of a real-time in-car telemetry system that communicates between components through the robust Controller Area Network (CAN) protocol. By harnessing real-time data from various sensors embedded within vehicles, our driving assistance system provides the driver with visual and haptic actionable feedback that guides the driver on using the optimum driving style to minimize power consumed by the vehicle. To develop the pace feedback mechanism for the driver, real-time data is collected through a Shell Eco Marathon Urban Concept vehicle platform and after pre-processing, it is analyzed using the novel machine learning algorithm TEMSL, that outperforms the existing baseline approaches across various performance metrics. This innovative method after numerous experimentation has proven effective in enhancing energy efficiency, guiding the driver along the track, and reducing human errors. The driving-assistance system offers a range of utilities, from cost savings and extended vehicle lifespan to significant contributions to environmental conservation and sustainable driving practices.
Submitted 14 November, 2023;
originally announced November 2023.
-
Reconfigurable Inspection in Manufacturing: State of the Art and Taxonomy
Authors:
Harshit Gupta,
Ashok Kumar Madan
Abstract:
This article provides an overview of the evolution of the product quality and measurement inspection procedure with emphasis on the Reconfigurable Inspection System and Machine. The major components of a reconfigurable manufacturing system have been examined, and the evolution of manufacturing processes has been briefly discussed. Different Reconfigurable Inspection Machines (RIMs) and their arrangement in an assembly line as an inspection system have been carefully studied and the modern inspection system equipped in RMS has been compared to the traditional techniques commonly used in inspection of product quality. A survey of evolving inspection techniques is offered from the standpoint of technological challenges and advancement affecting manufacturing over time. As per authors' knowledge, the review on Reconfigurable Inspection in Manufacturing and taxonomy of reconfigurable inspection systems is rare. Considering the studies done in this domain, there is still resourceful taxonomy for this paradigm. Therefore, different types of inspection procedures have been discussed, their features and applications have been compared to arrive at the taxonomy of the RIS based on the understanding of the nature of a RIS after a critical review.
Submitted 11 November, 2023;
originally announced November 2023.
-
Application of Response Surface Method and Genetic Algorithm in the Design of High-Efficiency Prototype Vehicle
Authors:
Paras Singh,
Harshit Gupta,
Ojas Vinayak,
Aryan Tyagi
Abstract:
Breakthroughs in aerodynamic optimization have made it possible to develop efficient modes of transport with lesser exploitation of valuable resources. This makes it crucial for technical professionals such as engineers and scientists to understand the methodologies behind carrying out such optimizations. A common approach towards improving the aerodynamic properties of a vehicle is to alter its physical shape, which has concurrently been a very strenuous process given the time consumed to remodel the vehicle for each simulation process. This research aims to tackle this problem by using intelligent techniques to automate the step-by-step process of remodeling the car and arriving at a final optimized solution with a significantly lower drag coefficient, a quantity used to measure the amount of drag force acting on a vehicle. This is achieved by assigning particular parameters to ensure guided improvement of the airfoil in a process known as parametrization, followed by implementing a response surface methodology primarily to circumvent the strenuous task of performing a large number of CFD simulations by employing surrogate models to generate a response surface between selected independent variables. Further, evolutionary algorithms such as Genetic Algorithm have gained momentum in the optimization studies carried out during product design by selecting the optimum parameters from the available design spaces on the basis of natural evolution. The proposed method of optimization has been successfully implemented on a prototype vehicle with an improvement of 26.6% and 51.1% in the drag coefficient and drag area respectively.
Submitted 7 November, 2023;
originally announced November 2023.
-
TarGEN: Targeted Data Generation with Large Language Models
Authors:
Himanshu Gupta,
Kevin Scaria,
Ujjwala Anantheswaran,
Shreyas Verma,
Mihir Parmar,
Saurabh Arjun Sawant,
Chitta Baral,
Swaroop Mishra
Abstract:
The rapid advancement of large language models (LLMs) has sparked interest in data synthesis techniques, aiming to generate diverse and high-quality synthetic datasets. However, these synthetic datasets often suffer from a lack of diversity and added noise. In this paper, we present TarGEN, a multi-step prompting strategy for generating high-quality synthetic datasets utilizing a LLM. An advantage of TarGEN is its seedless nature; it does not require specific task instances, broadening its applicability beyond task replication. We augment TarGEN with a method known as self-correction empowering LLMs to rectify inaccurately labeled instances during dataset creation, ensuring reliable labels. To assess our technique's effectiveness, we emulate 8 tasks from the SuperGLUE benchmark and finetune various language models, including encoder-only, encoder-decoder, and decoder-only models on both synthetic and original training sets. Evaluation on the original test set reveals that models trained on datasets generated by TarGEN perform approximately 1-2% points better than those trained on original datasets (82.84% via syn. vs. 81.12% on og. using Flan-T5). When incorporating instruction tuning, the performance increases to 84.54% on synthetic data vs. 81.49% on original data by Flan-T5. A comprehensive analysis of the synthetic dataset compared to the original dataset reveals that the synthetic dataset demonstrates similar or higher levels of dataset complexity and diversity. Furthermore, the synthetic dataset displays a bias level that aligns closely with the original dataset. Finally, when pre-finetuned on our synthetic SuperGLUE dataset, T5-3B yields impressive results on the OpenLLM leaderboard, surpassing the model trained on the Self-Instruct dataset by 4.14% points. We hope that TarGEN can be helpful for quality data generation and reducing the human efforts to create complex benchmarks.
Submitted 30 October, 2023; v1 submitted 26 October, 2023;
originally announced October 2023.
-
A Mass-Conserving-Perceptron for Machine Learning-Based Modeling of Geoscientific Systems
Authors:
Yuan-Heng Wang,
Hoshin V. Gupta
Abstract:
Although decades of effort have been devoted to building Physical-Conceptual (PC) models for predicting the time-series evolution of geoscientific systems, recent work shows that Machine Learning (ML) based Gated Recurrent Neural Network technology can be used to develop models that are much more accurate. However, the difficulty of extracting physical understanding from ML-based models complicates their utility for enhancing scientific knowledge regarding system structure and function. Here, we propose a physically-interpretable Mass Conserving Perceptron (MCP) as a way to bridge the gap between PC-based and ML-based modeling approaches. The MCP exploits the inherent isomorphism between the directed graph structures underlying both PC models and GRNNs to explicitly represent the mass-conserving nature of physical processes while enabling the functional nature of such processes to be directly learned (in an interpretable manner) from available data using off-the-shelf ML technology. As a proof of concept, we investigate the functional expressivity (capacity) of the MCP, explore its ability to parsimoniously represent the rainfall-runoff (RR) dynamics of the Leaf River Basin, and demonstrate its utility for scientific hypothesis testing. To conclude, we discuss extensions of the concept to enable ML-based physical-conceptual representation of the coupled nature of mass-energy-information flows through geoscientific systems.
Submitted 12 May, 2024; v1 submitted 12 October, 2023;
originally announced October 2023.
-
Bound states in the continuum and long-range coupling of polaritons in hexagonal boron nitride nanoresonators
Authors:
Harsh Gupta,
Giacomo Venturi,
Tatiana Contino,
Eli Janzen,
James H. Edgar,
Francesco de Angelis,
Andrea Toma,
Antonio Ambrosio,
Michele Tamagnone
Abstract:
Bound states in the continuum (BICs) have garnered significant attention for their potential to create new types of nanophotonic devices. Most prior demonstrations were based on arrays of dielectric resonators, which cannot be miniaturized beyond the diffraction limit, reducing the applicability of BICs for advanced functions. Here, we demonstrate BICs and quasi-BICs based on high-quality factor phonon-polariton resonances in isotopically pure h11BN and how these states can be supported by periodic arrays of nanoresonators with sizes much smaller than the wavelength. We theoretically illustrate how BICs emerge from the band structure of the arrays and verify both numerically and experimentally the presence of these states and enhanced quality factor. Furthermore, we identify and characterize simultaneously quasi-BICs and bright states. Our method can be generalized to create a large number of optical states and to tune their coupling with the environment, paving the way to miniaturized nanophotonic devices with more advanced functions.
Submitted 21 September, 2023;
originally announced September 2023.
-
Automated Assessment of Critical View of Safety in Laparoscopic Cholecystectomy
Authors:
Yunfan Li,
Himanshu Gupta,
Haibin Ling,
IV Ramakrishnan,
Prateek Prasanna,
Georgios Georgakis,
Aaron Sasson
Abstract:
Cholecystectomy (gallbladder removal) is one of the most common procedures in the US, with more than 1.2M procedures annually. Compared with classical open cholecystectomy, laparoscopic cholecystectomy (LC) is associated with a significantly shorter recovery period, and hence is the preferred method. However, LC is also associated with an increase in bile duct injuries (BDIs), resulting in significant morbidity and mortality. The primary cause of BDIs from LCs is misidentification of the cystic duct with the bile duct. Critical view of safety (CVS) is the most effective of safety protocols, which is said to be achieved during the surgery if certain criteria are met. However, due to suboptimal understanding and implementation of CVS, the BDI rates have remained stable over the last three decades. In this paper, we develop deep-learning techniques to automate the assessment of CVS in LCs. An innovative aspect of our research lies in developing specialized learning techniques by incorporating domain knowledge to compensate for the limited training data available in practice. In particular, our CVS assessment process involves a fusion of two segmentation maps followed by an estimation of a certain region of interest based on anatomical structures close to the gallbladder, and then finally determination of each of the three CVS criteria via rule-based assessment of structural information. We achieved a gain of over 11.8% in mIoU on relevant classes with our two-stream semantic segmentation approach when compared to a single-model baseline, and 1.84% in mIoU with our proposed Sobel loss function when compared to a Transformer-based baseline model. For CVS criteria, we achieved up to 16% improvement and, for the overall CVS assessment, we achieved 5% improvement in balanced accuracy compared to DeepCVS under the same experiment settings.
Submitted 13 September, 2023;
originally announced September 2023.
-
Evaluating Homomorphic Operations on a Real-World Processing-In-Memory System
Authors:
Harshita Gupta,
Mayank Kabra,
Juan Gómez-Luna,
Konstantinos Kanellopoulos,
Onur Mutlu
Abstract:
Computing on encrypted data is a promising approach to reduce data security and privacy risks, with homomorphic encryption serving as a facilitator in achieving this goal. In this work, we accelerate homomorphic operations using the Processing-in-Memory (PIM) paradigm to mitigate the large memory capacity and frequent data movement requirements. Using a real-world PIM system, we accelerate the Brakerski-Fan-Vercauteren (BFV) scheme for homomorphic addition and multiplication. We evaluate the PIM implementations of these homomorphic operations with statistical workloads (arithmetic mean, variance, linear regression) and compare to CPU and GPU implementations. Our results demonstrate 50-100x speedup with a real PIM system (UPMEM) over the CPU and 2-15x over the GPU in vector addition. For vector multiplication, the real PIM system outperforms the CPU by 40-50x. However, it lags 10-15x behind the GPU due to the lack of native sufficiently wide multiplication support in the evaluated first-generation real PIM system. For mean, variance, and linear regression, the real PIM system performance improvements vary between 30x and 300x over the CPU and between 10x and 30x over the GPU, uncovering real PIM system trade-offs in terms of scalability of homomorphic operations for varying amounts of data. We plan to make our implementation open-source in the future.
Submitted 3 October, 2023; v1 submitted 12 September, 2023;
originally announced September 2023.
-
A Dataset of Inertial Measurement Units for Handwritten English Alphabets
Authors:
Hari Prabhat Gupta,
Rahul Mishra
Abstract:
This paper presents an end-to-end methodology for collecting datasets to recognize handwritten English alphabets by utilizing Inertial Measurement Units (IMUs) and leveraging the diversity present in the Indian writing style. The IMUs are utilized to capture the dynamic movement patterns associated with handwriting, enabling more accurate recognition of alphabets. The Indian context introduces various challenges due to the heterogeneity in writing styles across different regions and languages. By leveraging this diversity, the collected dataset and the collection system aim to achieve higher recognition accuracy. Some preliminary experimental results demonstrate the effectiveness of the dataset in accurately recognizing handwritten English alphabets in the Indian context. This research can be extended; it contributes to the field of pattern recognition and offers valuable insights for developing improved systems for handwriting recognition, particularly in diverse linguistic and cultural contexts.
Submitted 5 July, 2023;
originally announced July 2023.
-
Utility-Aware Load Shedding for Real-time Video Analytics at the Edge
Authors:
Enrique Saurez,
Harshit Gupta,
Henriette Roger,
Sukanya Bhowmik,
Umakishore Ramachandran,
Kurt Rothermel
Abstract:
Real-time video analytics typically require video frames to be processed by a query to identify objects or activities of interest while adhering to an end-to-end frame processing latency constraint. Such applications impose a continuous and heavy load on backend compute and network infrastructure because of the need to stream and process all video frames. Video data has inherent redundancy and does not always contain an object of interest for a given query. We leverage this property of video streams to propose a lightweight Load Shedder that can be deployed on edge servers or on inexpensive edge devices co-located with cameras and drop uninteresting video frames. The proposed Load Shedder uses pixel-level color-based features to calculate a utility score for each ingress video frame, which represents the frame's utility toward the query at hand. The Load Shedder uses a minimum utility threshold to select interesting frames to send for query processing. Dropping unnecessary frames enables the video analytics query in the backend to meet the end-to-end latency constraint with fewer compute and network resources. To guarantee a bounded end-to-end latency at runtime, we introduce a control loop that monitors the backend load for the given query and dynamically adjusts the utility threshold. Performance evaluations show that the proposed Load Shedder selects a large portion of frames containing each object of interest while meeting the end-to-end frame processing latency constraint. Furthermore, the Load Shedder does not impose a significant latency overhead when running on edge devices with modest compute resources.
Submitted 5 July, 2023;
originally announced July 2023.
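The threshold-plus-control-loop idea in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the names `LoadShedder` and `utility_score`, the colour-tolerance feature, the fixed adjustment step, and the use of observed latency as the load signal are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LoadShedder:
    """Hypothetical sketch: forwards only frames whose utility meets a threshold."""
    threshold: float = 0.5        # minimum utility needed to forward a frame
    target_latency: float = 0.1   # assumed end-to-end latency bound, in seconds
    step: float = 0.05            # assumed fixed adjustment step for the control loop

    def admit(self, utility: float) -> bool:
        """Return True if the frame should be sent for query processing."""
        return utility >= self.threshold

    def update(self, observed_latency: float) -> None:
        """Raise the threshold when the backend is too slow, lower it otherwise."""
        if observed_latency > self.target_latency:
            self.threshold = min(1.0, self.threshold + self.step)
        else:
            self.threshold = max(0.0, self.threshold - self.step)


def utility_score(frame, target_rgb, tolerance=30):
    """Fraction of pixels whose colour lies within `tolerance` of `target_rgb`.

    `frame` is a list of (r, g, b) tuples; this stands in for the
    pixel-level colour features mentioned in the abstract.
    """
    if not frame:
        return 0.0
    matches = sum(
        1 for pixel in frame
        if all(abs(c - t) <= tolerance for c, t in zip(pixel, target_rgb))
    )
    return matches / len(frame)
```

In this sketch, a frame that is mostly the target colour scores close to 1 and is admitted, while sustained backend latency above the bound pushes the threshold up so that fewer frames get through.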
-
Optimizing Initial State of Detector Sensors in Quantum Sensor Networks
Authors:
Caitao Zhan,
Himanshu Gupta,
Mark Hillery
Abstract:
In this paper, we consider a network of quantum sensors, where each sensor is a qubit detector that "fires," i.e., its state changes when an event occurs close by. The change in state due to the firing of a detector is given by a unitary operator which is the same for all sensors in the network. Such a network of detectors can be used to localize an event, using a protocol to determine the firing sensor which is presumably the one closest to the event. The determination of the firing sensor can be posed as a Quantum State Discrimination problem which incurs a probability of error depending on the initial state and the measurement operator used.
In this paper, we address the problem of determining the optimal initial global state of a network of detectors that incur a minimum probability of error in determining the firing sensor. For this problem, we derive necessary and sufficient conditions for the existence of an initial state that allows for perfect discrimination, i.e., zero probability of error. Using insights from this result, we derive a conjectured optimal solution for the initial state, provide a pathway to prove the conjecture, and validate the conjecture empirically using multiple search heuristics that seem to perform near-optimally.
Submitted 7 March, 2024; v1 submitted 30 June, 2023;
originally announced June 2023.
-
Neural models for Factual Inconsistency Classification with Explanations
Authors:
Tathagata Raha,
Mukund Choudhary,
Abhinav Menon,
Harshit Gupta,
KV Aditya Srivatsa,
Manish Gupta,
Vasudeva Varma
Abstract:
Factual consistency is one of the most important requirements when editing high quality documents. It is extremely important for automatic text generation systems like summarization, question answering, dialog modeling, and language modeling. Still, automated factual inconsistency detection is rather under-studied. Existing work has focused on (a) finding fake news keeping a knowledge base in context, or (b) detecting broad contradiction (as part of natural language inference literature). However, there has been no work on detecting and explaining types of factual inconsistencies in text, without any knowledge base in context. In this paper, we leverage existing work in linguistics to formally define five types of factual inconsistencies. Based on this categorization, we contribute a novel dataset, FICLE (Factual Inconsistency CLassification with Explanation), with ~8K samples where each sample consists of two sentences (claim and context) annotated with type and span of inconsistency. When the inconsistency relates to an entity type, it is labeled as well at two levels (coarse and fine-grained). Further, we leverage this dataset to train a pipeline of four neural models to predict inconsistency type with explanations, given a (claim, context) sentence pair. Explanations include inconsistent claim fact triple, inconsistent context span, inconsistent claim component, coarse and fine-grained inconsistent entity types. The proposed system first predicts inconsistent spans from claim and context; and then uses them to predict inconsistency types and inconsistent entity types (when inconsistency is due to entities). We experiment with multiple Transformer-based natural language classification as well as generative models, and find that DeBERTa performs the best. Our proposed methods provide a weighted F1 of ~87% for inconsistency type classification across the five classes.
Submitted 15 June, 2023;
originally announced June 2023.
-
Instruction Tuned Models are Quick Learners
Authors:
Himanshu Gupta,
Saurabh Arjun Sawant,
Swaroop Mishra,
Mutsumi Nakamura,
Arindam Mitra,
Santosh Mashetty,
Chitta Baral
Abstract:
Instruction tuning of language models has demonstrated the ability to enhance model generalization to unseen tasks via in-context learning using a few examples. However, typical supervised learning still requires a plethora of downstream training data for finetuning. Often in real-world situations, there is a scarcity of data available for finetuning, falling somewhere between few shot inference and fully supervised finetuning. In this work, we demonstrate the sample efficiency of instruction tuned models over various tasks by estimating the minimal downstream training data required by them to perform transfer learning and match the performance of state-of-the-art (SOTA) supervised models. We conduct experiments on 119 tasks from Super Natural Instructions (SuperNI) in both the single task learning (STL) and multi task learning (MTL) settings. Our findings reveal that, in the STL setting, instruction tuned models equipped with 25% of the downstream train data surpass the SOTA performance on the downstream tasks. In the MTL setting, an instruction tuned model trained on only 6% of downstream training data achieves SOTA, while using 100% of the training data results in a 3.69% points improvement (ROUGE-L 74.68) over the previous SOTA. We conduct an analysis on T5 vs Tk-Instruct by developing several baselines to demonstrate that instruction tuning aids in increasing both sample efficiency and transfer learning. Additionally, we observe a consistent ~4% performance increase in both settings when pre-finetuning is performed with instructions. Finally, we conduct a categorical study and find that contrary to previous results, tasks in the question rewriting and title generation categories suffer from instruction tuning.
Submitted 17 May, 2023;
originally announced June 2023.
-
Resource Aware Clustering for Tackling the Heterogeneity of Participants in Federated Learning
Authors:
Rahul Mishra,
Hari Prabhat Gupta,
Garvit Banga
Abstract:
Federated Learning is a training framework that enables multiple participants to collaboratively train a shared model while preserving data privacy and minimizing communication overhead. The heterogeneity of devices and networking resources of the participants delay the training and aggregation in federated learning. This paper proposes a federated learning approach to manoeuvre the heterogeneity among the participants using resource aware clustering. The approach begins with the server gathering information about the devices and networking resources of participants, after which resource aware clustering is performed to determine the optimal number of clusters using Dunn Indices. The mechanism of participant assignment is then introduced, and the expression of communication rounds required for model convergence in each cluster is mathematically derived. Furthermore, a master-slave technique is introduced to improve the performance of the lightweight models in the clusters using knowledge distillation. Finally, experimental evaluations are conducted to verify the feasibility and effectiveness of the approach and to compare it with state-of-the-art techniques.
Submitted 7 June, 2023;
originally announced June 2023.
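The abstract above mentions choosing the number of clusters with Dunn indices. As a hedged illustration of that standard metric only (not of the paper's clustering procedure), the sketch below computes the Dunn index of a partition as the smallest distance between points in different clusters divided by the largest within-cluster diameter. The function name `dunn_index` and the two-feature participant vectors in the usage note are hypothetical.

```python
from itertools import combinations
from math import dist


def dunn_index(clusters):
    """Dunn index = smallest inter-cluster distance / largest cluster diameter.

    `clusters` is a list of clusters, each a list of equal-length numeric tuples.
    At least two non-empty clusters are required; otherwise min() raises ValueError.
    """
    # Smallest distance between any two points that lie in different clusters.
    min_separation = min(
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    # Largest distance between any two points inside the same cluster.
    max_diameter = max(
        (dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )
    # All-singleton partitions have zero diameter; treat them as unbounded.
    return float("inf") if max_diameter == 0 else min_separation / max_diameter
```

A higher value indicates compact, well-separated clusters, so one plausible way to pick a cluster count is to compute this score for each candidate partition (for example, of hypothetical per-participant vectors such as CPU speed and bandwidth) and keep the partition with the largest score.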
-
Distributing Quantum Circuits Using Teleportations
Authors:
Ranjani G Sundaram,
Himanshu Gupta
Abstract:
Scalability is currently one of the most sought-after objectives in the field of quantum computing. Distributing a quantum circuit across a quantum network is one way to facilitate large computations using current quantum computers. In this paper, we consider the problem of distributing a quantum circuit across a network of heterogeneous quantum computers, while minimizing the number of teleportations (the communication cost) needed to implement gates spanning multiple computers. We design two algorithms for this problem. The first, called Local-Best, initially distributes the qubits across the network, then tries to teleport qubits only when necessary, with teleportations being influenced by gates in the near future. The second, called Zero-Stitching, divides the given circuit into sub-circuits such that each sub-circuit can be executed using zero teleportations and the teleportation cost incurred at the borders of the sub-circuits is minimal. We evaluate our algorithms over a wide range of randomly-generated circuits as well as known benchmarks, and compare their performance to prior work. We observe that our techniques outperform the prior approach by a significant margin (up to 50%).
Submitted 31 May, 2023;
originally announced June 2023.
-
EDM3: Event Detection as Multi-task Text Generation
Authors:
Ujjwala Anantheswaran,
Himanshu Gupta,
Mihir Parmar,
Kuntal Kumar Pal,
Chitta Baral
Abstract:
Event detection refers to identifying event occurrences in a text and comprises two subtasks: event identification and classification. We present EDM3, a novel approach for Event Detection that formulates three generative tasks: identification, classification, and combined detection. We show that EDM3 helps to learn transferable knowledge that can be leveraged to perform Event Detection and its subtasks concurrently, mitigating the error propagation inherent in pipelined approaches. Unlike previous dataset- or domain-specific approaches, EDM3 utilizes the existing knowledge of language models, allowing it to be trained over any classification schema. We evaluate EDM3 on multiple event detection datasets: RAMS, WikiEvents, MAVEN, and MLEE, showing that EDM3 outperforms 1) single-task performance by 8.4% on average and 2) multi-task performance without instructional prompts by 2.4% on average. We obtain SOTA results on RAMS (71.3% vs. 65.1% F-1) and competitive performance on other datasets. We analyze our approach to demonstrate its efficacy in low-resource and multi-sentence settings. We also show the effectiveness of this approach on non-standard event configurations such as multi-word and multi-class event triggers. Overall, our results show that EDM3 is a promising approach for Event Detection that has the potential for real-world applications.
Submitted 25 May, 2023;
originally announced May 2023.
-
A Unified Evaluation Framework for Novelty Detection and Accommodation in NLP with an Instantiation in Authorship Attribution
Authors:
Neeraj Varshney,
Himanshu Gupta,
Eric Robertson,
Bing Liu,
Chitta Baral
Abstract:
State-of-the-art natural language processing models have been shown to achieve remarkable performance in 'closed-world' settings where all the labels in the evaluation set are known at training time. However, in real-world settings, 'novel' instances that do not belong to any known class are often observed. This renders the ability to deal with novelties crucial. To initiate systematic research in this important area of 'dealing with novelties', we introduce 'NoveltyTask', a multi-stage task to evaluate a system's performance on pipelined novelty 'detection' and 'accommodation' tasks. We provide a mathematical formulation of NoveltyTask and instantiate it with the authorship attribution task that pertains to identifying the correct author of a given text. We use the Amazon reviews corpus and compile a large dataset (consisting of 250k instances across 200 authors/labels) for NoveltyTask. We conduct comprehensive experiments and explore several baseline methods for the task. Our results show that the methods achieve considerably low performance, making the task challenging and leaving sufficient room for improvement. Finally, we believe our work will encourage research in this underexplored area of dealing with novelties, an important step en route to developing robust systems.
Submitted 8 May, 2023;
originally announced May 2023.
-
Towards Realistic Generative 3D Face Models
Authors:
Aashish Rai,
Hiresh Gupta,
Ayush Pandey,
Francisco Vicente Carrasco,
Shingo Jason Takagi,
Amaury Aubel,
Daeil Kim,
Aayush Prakash,
Fernando de la Torre
Abstract:
In recent years, there has been significant progress in 2D generative face models fueled by applications such as animation, synthetic data generation, and digital avatars. However, due to the absence of 3D information, these 2D models often struggle to accurately disentangle facial attributes like pose, expression, and illumination, limiting their editing capabilities. To address this limitation, this paper proposes a 3D controllable generative face model to produce high-quality albedo and precise 3D shape leveraging existing 2D generative models. By combining 2D face generative models with semantic face manipulation, this method enables editing of detailed 3D rendered faces. The proposed framework utilizes an alternating descent optimization approach over shape and albedo. Differentiable rendering is used to train high-quality shapes and albedo without 3D supervision. Moreover, this approach outperforms the state-of-the-art (SOTA) methods in the well-known NoW benchmark for shape reconstruction. It also outperforms the SOTA reconstruction models in recovering rendered faces' identities across novel poses by an average of 10%. Additionally, the paper demonstrates direct control of expressions in 3D faces by exploiting latent space leading to text-based editing of 3D faces.
Submitted 26 October, 2023; v1 submitted 24 April, 2023;
originally announced April 2023.
-
Fog Computing & IoT: Overview, Architecture and Applications
Authors:
Harshit Gupta,
Dr. Ajay Kumar Bharti
Abstract:
Fog computing is an emerging technology in the field of network services, where data is transferred from one device to another to perform some kind of activity. Fog computing is an extension of cloud computing. It works in between the Internet of Things (IoT) and cloud data centers and reduces the communication gaps. Fog computing has made it possible to decrease latency and network congestion. Fog computing is an ongoing research trend in which the possibility of efficient network services exists. Fog computing can be described as a cloud-type platform offering similar services of data computation, data storage, and application service, but it is fundamentally different in that it is decentralized. In this paper, we present a comprehensive survey of fog computing & IoT, describe the fog computing architecture, and analyze its different benefits and applications. We also analyze the security aspects of fog computing & IoT, which are a necessary and important part of any kind of technology used in data communication systems.
Submitted 14 April, 2023;
originally announced April 2023.
-
Quantum Computing Toolkit From Nuts and Bolts to Sack of Tools
Authors:
Himanshu Sahu,
Hari Prabhat Gupta
Abstract:
Quantum computing has the potential to provide exponential performance benefits in processing over classical computing. It utilizes quantum mechanics phenomena (such as superposition, entanglement, and interference) to solve a computational problem. It can explore atypical patterns in data that classical computers cannot explore efficiently. Quantum computers are in the nascent stage of development and are noisy due to decoherence, i.e., quantum bits deteriorate with environmental interactions. It will take a long time for quantum computers to achieve fault tolerance, although quantum algorithms can be developed in advance. Heavy investment in developing quantum hardware, software development kits, and simulators has led to a multiplicity of quantum development tools. Selection of a suitable development platform requires a proper understanding of the capabilities and limitations of these tools. Although a comprehensive comparison of the different quantum development tools would be of great value, to the best of our knowledge, no such extensive study is currently available.
Submitted 6 March, 2023; v1 submitted 17 February, 2023;
originally announced February 2023.
-
InstructABSA: Instruction Learning for Aspect Based Sentiment Analysis
Authors:
Kevin Scaria,
Himanshu Gupta,
Siddharth Goyal,
Saurabh Arjun Sawant,
Swaroop Mishra,
Chitta Baral
Abstract:
We introduce InstructABSA, an instruction learning paradigm for Aspect-Based Sentiment Analysis (ABSA) subtasks. Our method introduces positive, negative, and neutral examples to each training sample, and instruction-tunes the model (Tk-Instruct) for ABSA subtasks, yielding significant performance improvements. Experimental results on the SemEval 2014, 15, and 16 datasets demonstrate that InstructABSA outperforms the previous state-of-the-art (SOTA) approaches on Term Extraction (ATE), Sentiment Classification (ATSC) and Sentiment Pair Extraction (ASPE) subtasks. In particular, InstructABSA outperforms the previous SOTA on the Rest14 ATE subtask by 5.69% points, the Rest15 ATSC subtask by 9.59% points, and the Lapt14 AOPE subtask by 3.37% points, surpassing 7x larger models. We also get competitive results on AOOE, AOPE, and AOSTE subtasks, indicating strong generalization ability to all subtasks. Exploring sample efficiency reveals that just 50% of the training data is required to get competitive results with other instruction tuning approaches. Lastly, we assess the quality of instructions and observe that InstructABSA's performance declines by ~10% when misleading examples are added.
Submitted 13 November, 2023; v1 submitted 16 February, 2023;
originally announced February 2023.
-
Astronomical Detection of the Interstellar Anion C10H- towards TMC-1 from the GOTHAM Large Program on the GBT
Authors:
Anthony Remijan,
Haley N. Scolati,
Andrew M. Burkhardt,
P. Bryan Changala,
Steven B. Charnley,
Ilsa R. Cooke,
Martin A. Cordiner,
Harshal Gupta,
Eric Herbst,
Kin Long Kelvin Lee,
Ryan Loomis,
Christopher N. Shingledecker,
Mark A. Siebert,
Ci Xue,
Michael C. McCarthy,
Brett A. McGuire
Abstract:
Using data from the GOTHAM (GBT Observations of TMC-1: Hunting for Aromatic Molecules) survey, we report the first astronomical detection of the C10H- anion. The astronomical observations also provided the necessary data to refine the spectroscopic parameters of C10H-. From the velocity stacked data and the matched filter response, C10H- is detected at >9σ confidence level at a column density of 4.04e11 cm-2. A dedicated search for the C10H radical was also conducted towards TMC-1. In this case, the stacked molecular emission of C10H was detected at a ~3.2σ confidence interval at a column density of 2.02e11 cm-2. However, since the determined confidence level is currently <5σ, we consider the identification of C10H as tentative. The full GOTHAM dataset was also used to better characterize the physical parameters including column density, excitation temperature, linewidth, and source size for the C4H, C6H and C8H radicals and their respective anions, and the measured column densities were compared to the predictions from a gas/grain chemical formation model and from a machine learning analysis. Given the measured values, the C10H-/C10H column density ratio is ~2.0 - the highest value measured between an anion and neutral species to date. Such a high ratio is at odds with current theories for interstellar anion chemistry. For the radical species, both models can reproduce the measured abundances found from the survey; however, the machine learning analysis matches the detected anion abundances much better than the gas/grain chemical model, suggesting that the current understanding of the formation chemistry of molecular anions is still highly uncertain.
Submitted 18 January, 2023;
originally announced January 2023.
-
Differentiable modeling to unify machine learning and physical models and advance Geosciences
Authors:
Chaopeng Shen,
Alison P. Appling,
Pierre Gentine,
Toshiyuki Bandai,
Hoshin Gupta,
Alexandre Tartakovsky,
Marco Baity-Jesi,
Fabrizio Fenicia,
Daniel Kifer,
Li Li,
Xiaofeng Liu,
Wei Ren,
Yi Zheng,
Ciaran J. Harman,
Martyn Clark,
Matthew Farthing,
Dapeng Feng,
Praveen Kumar,
Doaa Aboelyazeed,
Farshid Rahmani,
Hylke E. Beck,
Tadd Bindas,
Dipankar Dwivedi,
Kuai Fang,
Marvin Höge,
et al. (5 additional authors not shown)
Abstract:
Process-Based Modeling (PBM) and Machine Learning (ML) are often perceived as distinct paradigms in the geosciences. Here we present differentiable geoscientific modeling as a powerful pathway toward dissolving the perceived barrier between them and ushering in a paradigm shift. For decades, PBM offered benefits in interpretability and physical consistency but struggled to efficiently leverage large datasets. ML methods, especially deep networks, presented strong predictive skills yet lacked the ability to answer specific scientific questions. While various methods have been proposed for ML-physics integration, an important underlying theme -- differentiable modeling -- is not sufficiently recognized. Here we outline the concepts, applicability, and significance of differentiable geoscientific modeling (DG). "Differentiable" refers to accurately and efficiently calculating gradients with respect to model variables, critically enabling the learning of high-dimensional unknown relationships. DG refers to a range of methods connecting varying amounts of prior knowledge to neural networks and training them together, capturing a different scope than physics-guided machine learning and emphasizing first principles. Preliminary evidence suggests DG offers better interpretability and causality than ML, improved generalizability and extrapolation capability, and strong potential for knowledge discovery, while approaching the performance of purely data-driven ML. DG models require less training data while scaling favorably in performance and efficiency with increasing amounts of data. With DG, geoscientists may be better able to frame and investigate questions, test hypotheses, and discover unrecognized linkages.
Submitted 26 December, 2023; v1 submitted 10 January, 2023;
originally announced January 2023.
-
A Roadmap to Domain Knowledge Integration in Machine Learning
Authors:
Himel Das Gupta,
Victor S. Sheng
Abstract:
Many machine learning algorithms have been developed in recent years to enhance the performance of a model in different aspects of artificial intelligence. But the problem persists due to inadequate data and resources. Integrating knowledge in a machine learning model can help to overcome these obstacles up to a certain degree. Incorporating knowledge is a complex task though because of various forms of knowledge representation. In this paper, we will give a brief overview of these different forms of knowledge integration and their performance in certain machine learning tasks.
Submitted 12 December, 2022;
originally announced December 2022.
-
Grasping the Inconspicuous
Authors:
Hrishikesh Gupta,
Stefan Thalhammer,
Markus Leitner,
Markus Vincze
Abstract:
Transparent objects are common in day-to-day life and hence find many applications that require robot grasping. Many solutions toward object grasping exist for non-transparent objects. However, due to the unique visual properties of transparent objects, standard 3D sensors produce noisy or distorted measurements. Modern approaches tackle this problem by either refining the noisy depth measurements or using some intermediate representation of the depth. Towards this, we study deep learning 6D pose estimation from RGB images only for transparent object grasping. To train and test the suitability of RGB-based object pose estimation, we construct a dataset of RGB-only images with 6D pose annotations. The experiments demonstrate the effectiveness of RGB image space for grasping transparent objects.
Submitted 15 November, 2022;
originally announced November 2022.
-
Quantum Sensor Network Algorithms for Transmitter Localization
Authors:
Caitao Zhan,
Himanshu Gupta
Abstract:
A quantum sensor (QS) is able to measure various physical phenomena with extreme sensitivity. QSs have been used in several applications such as atomic interferometers, but few applications of a quantum sensor network (QSN) have been proposed or developed. We look at a natural application of QSN -- localization of an event (in particular, of a wireless signal transmitter). In this paper, we develop effective quantum-based techniques for the localization of a transmitter using a QSN. Our approaches pose the localization problem as a well-studied quantum state discrimination (QSD) problem and address the challenges in its application to the localization problem. In particular, a quantum state discrimination solution can suffer from a high probability of error, especially when the number of states (i.e., the number of potential transmitter locations in our case) can be high. We address this challenge by developing a two-level localization approach, which localizes the transmitter at a coarser granularity in the first level, and then, in a finer granularity in the second level. We address the additional challenge of the impracticality of general measurements by developing new schemes that replace the QSD's measurement operator with a trained parameterized hybrid quantum-classical circuit. Our evaluation results using a custom-built simulator show that our best scheme is able to achieve meter-level (1-5m) localization accuracy; in the case of discrete locations, it achieves near-perfect (99-100%) classification accuracy.
Submitted 31 July, 2023; v1 submitted 4 November, 2022;
originally announced November 2022.
-
Laboratory and astronomical discovery of magnesium dicarbide, MgC$_2$
Authors:
P. B. Changala,
H. Gupta,
J. Cernicharo,
J. R. Pardo,
M. Agúndez,
C. Cabezas,
B. Tercero,
M. Guélin,
M. C. McCarthy
Abstract:
We report the detection of magnesium dicarbide, MgC$_2$, in the laboratory at centimeter wavelengths and assign $^{24}$MgC$_2$, $^{25}$MgC$_2$, and $^{26}$MgC$_2$ to 14 unidentified lines in the radio spectrum of the circumstellar envelope of the evolved carbon star IRC+10216. The structure of MgC$_2$ is found to be T-shaped with a highly ionic bond between the metal atom and the C$_2$ unit, analogous to other dicarbides containing electropositive elements. A two-temperature excitation model of the MgC$_2$ emission lines observed in IRC+10216 yields a very low rotational temperature of $6\pm1$ K, a kinetic temperature of $22\pm13$ K, and a column density of $(1.0 \pm 0.3) \times 10^{12}$ cm$^{-2}$. The abundance of MgC$_2$ relative to the magnesium-carbon chains MgCCH, MgC$_4$H, and MgC$_6$H is $1{:}2{:}22{:}20$ and provides a new constraint on the sequential radiative association-dissociative recombination mechanisms implicated in the production of metal-bearing molecules in circumstellar environments.
Submitted 31 October, 2022;
originally announced October 2022.
-
Discrete outcome quantum sensor networks
Authors:
Mark Hillery,
Himanshu Gupta,
Caitao Zhan
Abstract:
We model a quantum sensor network using techniques from quantum state discrimination. The interaction between a qubit detector and the environment is described by a unitary operator, and we will assume that at most one detector does interact. The task is to determine which one does or if none do. This involves choosing an initial state of the detectors and a measurement. We consider global measurements in which all detectors are measured simultaneously. We find that an entangled initial state can improve the detection probability, but this advantage decreases as the number of detectors increases.
Submitted 30 May, 2023; v1 submitted 31 October, 2022;
originally announced October 2022.
-
Detecting Unintended Social Bias in Toxic Language Datasets
Authors:
Nihar Sahoo,
Himanshu Gupta,
Pushpak Bhattacharyya
Abstract:
With the rise of online hate speech, automatic detection of hate speech and offensive text as a natural language processing task is gaining popularity. However, very little research has been done to detect unintended social bias in these toxic language datasets. This paper introduces a new dataset, ToxicBias, curated from the existing dataset of the Kaggle competition named "Jigsaw Unintended Bias in Toxicity Classification". We aim to detect social biases, their categories, and targeted groups. The dataset contains instances annotated for five different bias categories, viz., gender, race/ethnicity, religion, political, and LGBTQ. We train transformer-based models using our curated datasets and report baseline performance for bias identification, target generation, and bias implications. Model biases and their mitigation are also discussed in detail. Our study motivates a systematic extraction of social bias data from toxic language datasets. All the code and the dataset used for experiments in this work are publicly available.
Submitted 21 October, 2022;
originally announced October 2022.
-
"John is 50 years old, can his son be 65?" Evaluating NLP Models' Understanding of Feasibility
Authors:
Himanshu Gupta,
Neeraj Varshney,
Swaroop Mishra,
Kuntal Kumar Pal,
Saurabh Arjun Sawant,
Kevin Scaria,
Siddharth Goyal,
Chitta Baral
Abstract:
In current NLP research, large-scale language models and their abilities are being widely discussed. Some recent works have also found notable failures of these models. Often these failure examples involve complex reasoning abilities. This work focuses on a simple commonsense ability: reasoning about when an action (or its effect) is feasible. To this end, we introduce FeasibilityQA, a question-answering dataset involving binary classification (BCQ) and multi-choice multi-correct questions (MCQ) that test understanding of feasibility. We show that even state-of-the-art models such as GPT-3, GPT-2, and T5 struggle to answer the feasibility questions correctly. Specifically, on MCQ and BCQ questions, GPT-3 achieves an accuracy of just (19%, 62%) and (25%, 64%) in zero-shot and few-shot settings, respectively. We also evaluate models by providing relevant knowledge statements required to answer the question. We find that the additional knowledge leads to a 7% gain in performance, but the overall performance still remains low. These results make one wonder how much commonsense knowledge about action feasibility is encoded in state-of-the-art models and how well they can reason about it.
Submitted 2 February, 2023; v1 submitted 13 October, 2022;
originally announced October 2022.
-
Cross-Geography Generalization of Machine Learning Methods for Classification of Flooded Regions in Aerial Images
Authors:
Sushant Lenka,
Pratyush Kerhalkar,
Pranav Shetty,
Harsh Gupta,
Bhavam Vidyarthi,
Ujjwal Verma
Abstract:
Identification of regions affected by floods is a crucial piece of information required for better planning and management of post-disaster relief and rescue efforts. Traditionally, remote sensing images are analysed to identify the extent of damage caused by flooding. The data acquired from sensors onboard earth observation satellites are analyzed to detect the flooded regions, but this detection can be affected by low spatial and temporal resolution. In recent years, however, the images acquired from Unmanned Aerial Vehicles (UAVs) have also been utilized to assess post-disaster damage. Indeed, a UAV-based platform can be rapidly deployed with a customized flight plan and minimum dependence on the ground infrastructure. This work proposes two approaches for identifying flooded regions in UAV aerial images. The first approach utilizes texture-based unsupervised segmentation to detect flooded areas, while the second uses an artificial neural network on the texture features to classify images as flooded and non-flooded. Unlike the existing works where the models are trained and tested on images of the same geographical regions, this work studies the performance of the proposed model in identifying flooded regions across geographical regions. An F1-score of 0.89 is obtained using the proposed segmentation-based approach, which is higher than existing classifiers. The robustness of the proposed approach demonstrates that it can be utilized to identify flooded regions in any geography with minimum or no user intervention.
Submitted 4 October, 2022;
originally announced October 2022.
-
Designing and Training of Lightweight Neural Networks on Edge Devices using Early Halting in Knowledge Distillation
Authors:
Rahul Mishra,
Hari Prabhat Gupta
Abstract:
Automated feature extraction capability and significant performance of Deep Neural Networks (DNN) make them suitable for Internet of Things (IoT) applications. However, deploying DNN on edge devices becomes prohibitive due to the colossal computation, energy, and storage requirements. This paper presents a novel approach for designing and training lightweight DNN using large-size DNN. The approach considers the available storage, processing speed, and maximum allowable processing time to execute the task on edge devices. We present a knowledge distillation based training procedure to train the lightweight DNN to achieve adequate accuracy. During the training of lightweight DNN, we introduce a novel early halting technique, which preserves network resources and thus speeds up the training procedure. Finally, we present empirical and real-world evaluations to verify the effectiveness of the proposed approach under different constraints using various edge devices.
Submitted 30 September, 2022;
originally announced September 2022.