DEXTER: A Benchmark for open-domain Complex Question Answering using LLMs

Authors: Venktesh V, Deepali Prabhu, Avishek Anand

Abstract: Open-domain complex Question Answering (QA) is a difficult task with challenges in evidence retrieval and reasoning. The complexity of such questions could stem from compositional questions, hybrid evidence, or ambiguity in questions. While retrieval performance for classical QA tasks is well explored, the capabilities of retrieval models for heterogeneous complex retrieval tasks, especially in an open-domain setting, and the impact on downstream QA performance, are relatively unexplored. To address this, we propose a benchmark comprising diverse complex QA tasks and provide a toolkit to evaluate state-of-the-art pre-trained dense and sparse retrieval models in an open-domain setting. We observe that late interaction models and, surprisingly, lexical models like BM25 perform well compared to other pre-trained dense retrieval models. In addition, since context-based reasoning is critical for solving complex QA tasks, we also evaluate the reasoning capabilities of LLMs and the impact of retrieval performance on their reasoning capabilities. Through experiments, we observe that much progress is to be made in retrieval for complex QA to improve downstream QA performance. Our software and related data can be accessed at https://github.com/VenkteshV/DEXTER

Submitted 24 June, 2024; originally announced June 2024.

Comments: under submission, 22 pages

arXiv:2404.12628 [pdf, other]

Efficient infusion of self-supervised representations in Automatic Speech Recognition

Authors: Darshan Prabhu, Sai Ganesh Mirishkar, Pankaj Wasnik

Abstract: Self-supervised learned (SSL) models such as Wav2vec and HuBERT yield state-of-the-art results on speech-related tasks. Given the effectiveness of such models, it is advantageous to use them in conventional ASR systems. While some approaches suggest incorporating these models as a trainable encoder or a learnable frontend, training such systems is extremely slow and requires a lot of computation cycles. In this work, we propose two simple approaches that use (1) framewise addition and (2) cross-attention mechanisms to efficiently incorporate the representations from the SSL model(s) into the ASR architecture, resulting in models that are comparable in size with standard encoder-decoder conformer systems while also avoiding the usage of SSL models during training. Our approach results in faster training and yields significant performance gains on the Librispeech and Tedlium datasets compared to baselines. We further provide detailed analysis and ablation studies that demonstrate the effectiveness of our approach.

Submitted 19 April, 2024; originally announced April 2024.

Comments: Accepted to ENLSP workshop, NeurIPS 2023

arXiv:2310.15970 [pdf, other]

Accented Speech Recognition With Accent-specific Codebooks

Authors: Darshan Prabhu, Preethi Jyothi, Sriram Ganapathy, Vinit Unni

Abstract: Speech accents pose a significant challenge to state-of-the-art automatic speech recognition (ASR) systems. Degradation in performance across underrepresented accents is a severe deterrent to the inclusive adoption of ASR. In this work, we propose a novel accent adaptation approach for end-to-end ASR systems using cross-attention with a trainable set of codebooks. These learnable codebooks capture accent-specific information and are integrated within the ASR encoder layers. The model is trained on accented English speech, while the test data also contains accents that were not seen during training. On the Mozilla Common Voice multi-accented dataset, we show that our proposed approach yields significant performance gains not only on the seen English accents (up to 37% relative improvement in word error rate, WER) but also on the unseen accents (up to 5% relative improvement in WER). Further, we illustrate benefits for a zero-shot transfer setup on the L2-ARCTIC dataset. We also compare the performance with other approaches based on accent adversarial training.

Submitted 26 October, 2023; v1 submitted 24 October, 2023; originally announced October 2023.

Comments: Accepted to EMNLP 2023 Main Conference (Long Paper)
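
Note on the metric quoted in the abstract above: the following is a minimal sketch of how word error rate and a relative improvement are commonly computed, assuming the standard word-level edit-distance definition of WER. It is not the authors' evaluation code, and the function names `word_error_rate` and `relative_improvement` are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words.

    Assumes the usual definition (substitutions, insertions and deletions
    each cost 1); real toolkits may also normalize text first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer is lower than baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer
```

With made-up numbers, a baseline WER of 0.10 and a new WER of 0.08 give a relative improvement of about 0.20, that is, 20%. A relative improvement is therefore not the same as an absolute drop in WER percentage points.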

arXiv:2112.00448 [pdf, other]

On-Device Spatial Attention based Sequence Learning Approach for Scene Text Script Identification

Authors: Rutika Moharir, Arun D Prabhu, Sukumar Moharana, Gopi Ramena, Rachit S Munjal

Abstract: Automatic identification of script is an essential component of a multilingual OCR engine. In this paper, we present an efficient, lightweight, real-time and on-device spatial attention based CNN-LSTM network for scene text script identification, feasible for deployment on resource-constrained mobile devices. Our network consists of a CNN equipped with a spatial attention module, which helps reduce the spatial distortions present in natural images. This allows the feature extractor to generate rich image representations while ignoring the deformities, thereby enhancing the performance of this fine-grained classification task. The network also employs residual convolutional blocks to build a deep network that focuses on the discriminative features of a script. The CNN learns the text feature representation by identifying each character as belonging to a particular script, and the long-term spatial dependencies within the text are captured using the sequence learning capabilities of the LSTM layers. Combining the spatial attention mechanism with the residual convolutional blocks, we are able to enhance the performance of the baseline CNN to build an end-to-end trainable network for script identification. The experimental results on several standard benchmarks demonstrate the effectiveness of our method. The network achieves competitive accuracy with state-of-the-art methods and is superior in terms of network size, with a total of just 1.1 million parameters and an inference time of 2.7 milliseconds.

Submitted 1 December, 2021; originally announced December 2021.

Comments: Accepted for publication in CVIP 2021

arXiv:2111.15348 [pdf, other]

doi 10.1002/er.7081

Overcoming limited battery data challenges: A coupled neural network approach

Authors: Aniruddh Herle, Janamejaya Channegowda, Dinakar Prabhu

Abstract: The Electric Vehicle (EV) industry has seen extraordinary growth in the last few years. This is primarily due to an ever-increasing awareness of the detrimental environmental effects of fossil fuel powered vehicles and the availability of inexpensive Lithium-ion batteries (LIBs). In order to safely deploy these LIBs in Electric Vehicles, certain battery states need to be constantly monitored to ensure safe and healthy operation. The use of Machine Learning to estimate battery states such as State-of-Charge and State-of-Health has become an extremely active area of research. However, limited availability of open-source diverse datasets has stifled the growth of this field, and is a problem largely ignored in literature. In this work, we propose a novel method of time-series battery data augmentation using deep neural networks. We introduce and analyze the method of using two neural networks working together to alternately produce synthetic charging and discharging battery profiles. One model produces battery charging profiles, and another produces battery discharging profiles. The proposed approach is evaluated using a few public battery datasets to illustrate its effectiveness, and our results show the efficacy of this approach to solve the challenges of limited battery data. We also test this approach on dynamic Electric Vehicle drive cycles.

Submitted 5 October, 2021; originally announced November 2021.

Comments: Published at International Journal of Energy Research

arXiv:2105.07795 [pdf, other]

doi 10.1109/IJCNN52387.2021.9534319

STRIDE: Scene Text Recognition In-Device

Authors: Rachit S Munjal, Arun D Prabhu, Nikhil Arora, Sukumar Moharana, Gopi Ramena

Abstract: Optical Character Recognition (OCR) systems have been widely used in various applications for extracting semantic information from images. To give the user more control over their privacy, an on-device solution is needed. The current state-of-the-art models are too heavy and complex to be deployed on-device. We develop an efficient lightweight scene text recognition (STR) system, which has only 0.88M parameters and performs real-time text recognition. Attention modules tend to boost the accuracy of STR networks but are generally slow and not optimized for device inference. So, we propose adding convolution attention modules to the text recognition network, which aim to provide channel and spatial attention information to the LSTM module at very minimal computational cost. This boosts our word accuracy on the ICDAR 13 dataset by almost 2%. We also introduce a novel orientation classifier module to support the simultaneous recognition of both horizontal and vertical text. The proposed model surpasses on-device metrics of inference time and memory footprint and achieves comparable accuracy when compared to the leading commercial and other open-source OCR engines. We deploy the system on-device with an inference speed of 2.44 ms per word on the Exynos 990 chipset device and achieve an accuracy of 88.4% on the ICDAR-13 dataset.

Submitted 17 May, 2021; originally announced May 2021.

Comments: accepted in IJCNN 2021

arXiv:2012.02990 [pdf, other]

doi 10.1109/ICSC50631.2021.00030

Codeswitched Sentence Creation using Dependency Parsing

Authors: Dhruval Jain, Arun D Prabhu, Shubham Vatsal, Gopi Ramena, Naresh Purre

Abstract: Codeswitching has become one of the most common occurrences across multilingual speakers of the world, especially in countries like India, which encompasses around 23 official languages, with the number of bilingual speakers being around 300 million. The scarcity of Codeswitched data becomes a bottleneck in the exploration of this domain with respect to various Natural Language Processing (NLP) tasks. We thus present a novel algorithm which harnesses the syntactic structure of English grammar to develop grammatically sensible Codeswitched versions of English-Hindi, English-Marathi and English-Kannada data. Apart from maintaining the grammatical sanity to a great extent, our methodology also guarantees abundant generation of data from a minuscule snapshot of given data. We use multiple datasets to showcase the capabilities of our algorithm, and we assess the quality of the generated Codeswitched data using some qualitative metrics, along with providing baseline results for a couple of NLP tasks.

Submitted 5 December, 2020; originally announced December 2020.

arXiv:2012.02819 [pdf, other]

doi 10.1109/ICSC50631.2021.00033

On-Device Sentence Similarity for SMS Dataset

Authors: Arun D Prabhu, Nikhil Arora, Shubham Vatsal, Gopi Ramena, Sukumar Moharana, Naresh Purre

Abstract: Determining the sentence similarity between Short Message Service (SMS) texts/sentences plays a significant role in the mobile device industry. Gauging the similarity between SMS data is thus necessary for various applications like enhanced searching and navigation, or clubbing together SMS of a similar type, irrespective of their sender, when a custom label or tag is provided by the user. The problem faced with SMS data is its incomplete structure and grammatical inconsistencies. In this paper, we propose a unique pipeline for evaluating the text similarity between SMS texts. We use a Part of Speech (POS) model for keyword extraction by taking advantage of the partial structure embedded in SMS texts, and similarity comparisons are carried out using statistical methods. The proposed pipeline deals with major semantic variations across SMS data and is effective for on-device (mobile phone) application. To showcase the capabilities of our work, our pipeline has been designed with an inclination towards one of the possible applications of SMS text similarity discussed in one of the following sections, but nonetheless guarantees scalability for other applications as well.

Submitted 4 December, 2020; originally announced December 2020.

arXiv:2011.10251 [pdf, other]

doi 10.1109/ICPR48806.2021.9412222

On-Device Text Image Super Resolution

Authors: Dhruval Jain, Arun D Prabhu, Gopi Ramena, Manoj Goyal, Debi Prasanna Mohanty, Sukumar Moharana, Naresh Purre

Abstract: Recent research on super-resolution (SR) has witnessed major developments with the advancements of deep convolutional neural networks. There is a need for information extraction from scenic text images or even document images on device, most of which are low-resolution (LR) images. Therefore, SR becomes an essential pre-processing step, as Bicubic Upsampling, which is conventionally present in smartphones, performs poorly on LR images. To give the user more control over their privacy, and to reduce the carbon footprint by reducing the overhead of cloud computing and hours of GPU usage, executing SR models on the edge is a necessity in recent times. There are various challenges in running and optimizing a model on resource-constrained platforms like smartphones. In this paper, we present a novel deep neural network that reconstructs sharper character edges and thus boosts OCR confidence. The proposed architecture not only achieves significant improvement in PSNR over bicubic upsampling on various benchmark datasets but also runs with an average inference time of 11.7 ms per image. We have outperformed the state of the art on the Text330 dataset. We also achieve an OCR accuracy of 75.89% on the ICDAR 2015 TextSR dataset, where ground truth has an accuracy of 78.10%.

Submitted 20 November, 2020; originally announced November 2020.

Comments: Accepted to the International Conference on Pattern Recognition(ICPR), 2020
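
Note on the PSNR metric mentioned in the abstract above: a minimal sketch of the standard peak signal-to-noise ratio definition, PSNR = 10 * log10(MAX^2 / MSE), is given below. It assumes 8-bit pixel values passed as flat sequences; the function name `psnr` and its parameters are hypothetical and this is not the authors' evaluation code.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float],
         reconstructed: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Both images are given as flat sequences of pixel intensities.
    max_value is the largest possible intensity (255 assumed for 8-bit data).
    """
    if len(reference) != len(reconstructed):
        raise ValueError("images must have the same number of pixels")
    if not reference:
        raise ValueError("images must not be empty")

    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher PSNR means the reconstructed image is closer to the reference, so comparing a super-resolved image and a bicubic-upsampled image against the same high-resolution reference shows which method reconstructs it more faithfully.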

arXiv:2011.09775 [pdf, other]

A Temporal Convolution Network Approach to State-of-Charge Estimation in Li-ion Batteries

Authors: Aniruddh Herle, Janamejaya Channegowda, Dinakar Prabhu

Abstract: Electric Vehicle (EV) fleets have dramatically expanded over the past several years. There has been a significant increase in interest in electrifying all modes of transportation. EVs are primarily powered by Energy Storage Systems such as Lithium-ion Battery packs. Total battery pack capacity translates to the available range in an EV. State of Charge (SOC) is the ratio of available battery capacity to total capacity and is expressed as a percentage. It is crucial to accurately estimate SOC to determine the available range in an EV while it is in use. In this paper, a Temporal Convolution Network (TCN) approach is taken to estimate SOC. This is the first implementation of TCNs for the SOC estimation task. Estimation is carried out on various drive cycles such as HWFET, LA92, UDDS and US06 at 1 C and 25 °C. It was found that the TCN architecture achieved an accuracy of 99.1%.

Submitted 19 November, 2020; originally announced November 2020.

Comments: 17th IEEE India Council International Conference 2020
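
Note on the SOC definition in the abstract above: the ratio it describes can be written as a one-line computation. The sketch below assumes capacities are given in ampere-hours; the function name `state_of_charge_percent` is hypothetical, and this is only the definition, not the paper's TCN-based estimator.

```python
def state_of_charge_percent(available_capacity_ah: float,
                            total_capacity_ah: float) -> float:
    """State of Charge: available capacity over total capacity, as a percentage."""
    if total_capacity_ah <= 0:
        raise ValueError("total capacity must be positive")
    if not 0 <= available_capacity_ah <= total_capacity_ah:
        raise ValueError("available capacity must lie between 0 and total capacity")
    return 100.0 * available_capacity_ah / total_capacity_ah
```

For example, a pack with 30 Ah available out of 60 Ah total is at 50% SOC. In practice the available capacity is not directly measurable while driving, which is why it has to be estimated from signals such as voltage, current and temperature.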

arXiv:2010.00401 [pdf, ps, other]

doi 10.1109/CONECCT50063.2020.9198529

Quasar Detection using Linear Support Vector Machine with Learning From Mistakes Methodology

Authors: Aniruddh Herle, Janamejaya Channegowda, Dinakar Prabhu

Abstract: The field of Astronomy requires the collection and assimilation of vast volumes of data. The data handling and processing problem has become severe as the sheer volume of data produced by scientific instruments each night grows exponentially. This problem becomes extensive for conventional methods of processing the data, which were mostly manual, but it is the perfect setting for the use of Machine Learning approaches. While building classifiers for Astronomy, the cost of losing rare objects like supernovae or quasars to detection losses is far more severe than having many false positives, given the rarity and scientific value of these objects. In this paper, a Linear Support Vector Machine (LSVM) is explored to detect Quasars, which are extremely bright objects in which a supermassive black hole is surrounded by a luminous accretion disk. In Astronomy, it is vital to correctly identify quasars, as they are very rare in nature. Their rarity creates a class-imbalance problem that needs to be taken into consideration. The class-imbalance problem and high cost of misclassification are taken into account while designing the classifier. To achieve this detection, a novel classifier is explored, and its performance is evaluated. It was observed that LSVM along with Ensemble Bagged Trees (EBT) achieved a 10x reduction in the False Negative Rate, using the Learning from Mistakes methodology.

Submitted 2 October, 2020; v1 submitted 1 October, 2020; originally announced October 2020.

Comments: Published in 2020 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT)
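
Note on the False Negative Rate mentioned in the abstract above: a minimal sketch of the standard definition, FNR = FN / (FN + TP), is given below. The function name `false_negative_rate` and the label encoding (1 for quasar, 0 otherwise) are assumptions for illustration, not taken from the paper.

```python
from typing import Hashable, Sequence


def false_negative_rate(y_true: Sequence[Hashable],
                        y_pred: Sequence[Hashable],
                        positive: Hashable = 1) -> float:
    """Fraction of true positives that the classifier missed: FN / (FN + TP)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")

    # Count missed positives (FN) and correctly detected positives (TP).
    false_negatives = sum(1 for t, p in zip(y_true, y_pred)
                          if t == positive and p != positive)
    true_positives = sum(1 for t, p in zip(y_true, y_pred)
                         if t == positive and p == positive)
    if false_negatives + true_positives == 0:
        raise ValueError("no positive examples in y_true")
    return false_negatives / (false_negatives + true_positives)
```

Because the rate is computed only over the positive class, it is unaffected by how many negatives the dataset contains, which makes it a natural metric when the positive class is rare and missing it is costly.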

arXiv:1101.2577 [pdf]

Bi-serial DNA Encryption Algorithm (BDEA)

Authors: D. Prabhu, M. Adimoolam

Abstract: The vast parallelism, exceptional energy efficiency and extraordinary information inherent in DNA molecules are being explored for computing, data storage and cryptography. DNA cryptography is an emerging field of cryptography. In this paper, a novel encryption algorithm is devised based on number conversion, DNA digital coding and PCR amplification, which can effectively prevent attack. Data treatment is used to transform the plain text into cipher text, which provides excellent security.

Submitted 13 January, 2011; originally announced January 2011.

Showing 1–12 of 12 results for author: Prabhu, D