Search | arXiv e-print repository

arXiv:2406.16383 [pdf, other]

Context-augmented Retrieval: A Novel Framework for Fast Information Retrieval based Response Generation using Large Language Model

Authors: Sai Ganesh, Anupam Purwar, Gautam B

Abstract: Generating high-quality answers consistently by providing contextual information embedded in the prompt passed to the Large Language Model (LLM) is dependent on the quality of information retrieval. As the corpus of contextual information grows, the answer/inference quality of Retrieval Augmented Generation (RAG) based Question Answering (QA) systems declines. This work solves this problem by combining classical text classification with the Large Language Model (LLM) to enable quick information retrieval from the vector store and ensure the relevancy of retrieved information. For the same, this work proposes a new approach Context Augmented retrieval (CAR), where partitioning of vector database by real-time classification of information flowing into the corpus is done. CAR demonstrates good quality answer generation along with significant reduction in information retrieval and answer generation time.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.11424 [pdf, other]

Evaluating the Efficacy of Open-Source LLMs in Enterprise-Specific RAG Systems: A Comparative Study of Performance and Scalability

Authors: Gautam B, Anupam Purwar

Abstract: This paper presents an analysis of open-source large language models (LLMs) and their application in Retrieval-Augmented Generation (RAG) tasks, specific for enterprise-specific data sets scraped from their websites. With the increasing reliance on LLMs in natural language processing, it is crucial to evaluate their performance, accessibility, and integration within specific organizational contexts. This study examines various open-source LLMs, explores their integration into RAG frameworks using enterprise-specific data, and assesses the performance of different open-source embeddings in enhancing the retrieval and generation process. Our findings indicate that open-source LLMs, combined with effective embedding techniques, can significantly improve the accuracy and efficiency of RAG systems, offering a viable alternative to proprietary solutions for enterprises.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.00638 [pdf, other]

COS-Mix: Cosine Similarity and Distance Fusion for Improved Information Retrieval

Authors: Kush Juvekar, Anupam Purwar

Abstract: This study proposes a novel hybrid retrieval strategy for Retrieval-Augmented Generation (RAG) that integrates cosine similarity and cosine distance measures to improve retrieval performance, particularly for sparse data. The traditional cosine similarity measure is widely used to capture the similarity between vectors in high-dimensional spaces. However, it has been shown that this measure can yield arbitrary results in certain scenarios. To address this limitation, we incorporate cosine distance measures to provide a complementary perspective by quantifying the dissimilarity between vectors. Our approach is experimented on proprietary data, unlike recent publications that have used open-source datasets. The proposed method demonstrates enhanced retrieval performance and provides a more comprehensive understanding of the semantic relationships between documents or items. This hybrid strategy offers a promising solution for efficiently and accurately retrieving relevant information in knowledge-intensive applications, leveraging techniques such as BM25 (sparse) retrieval, vector (dense) retrieval, and cosine distance based retrieval to facilitate efficient information retrieval.

Submitted 2 June, 2024; originally announced June 2024.
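
Illustrative note on the COS-Mix entry above: the abstract names two quantities, cosine similarity and cosine distance, but does not state how they are fused. The sketch below only computes the two standard measures (distance taken as one minus similarity) and makes no claim about the paper's actual fusion rule. The function names and the zero-vector convention are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumption: treat a zero vector as having no similarity to anything.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Standard cosine distance, defined as 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


if __name__ == "__main__":
    query = [1.0, 0.0, 1.0]
    doc = [1.0, 1.0, 0.0]
    print(cosine_similarity(query, doc), cosine_distance(query, doc))
```

Because the distance here is a fixed function of the similarity, ranking by either one alone gives the same ordering; whatever benefit the paper reports must come from how the measures are combined with the other retrievers, which the abstract does not specify.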

arXiv:2310.04205 [pdf, other]

Keyword Augmented Retrieval: Novel framework for Information Retrieval integrated with speech interface

Authors: Anupam Purwar, Rahul Sundar

Abstract: Retrieving answers in a quick and low cost manner without hallucinations from a combination of structured and unstructured data using Language models is a major hurdle. This is what prevents employment of Language models in knowledge retrieval automation. This becomes accentuated when one wants to integrate a speech interface on top of a text based knowledge retrieval system. Besides, for commercial search and chat-bot applications, complete reliance on commercial large language models (LLMs) like GPT 3.5 etc. can be very costly. In the present study, the authors have addressed the aforementioned problem by first developing a keyword based search framework which augments discovery of the context from the document to be provided to the LLM. The keywords in turn are generated by a relatively smaller LLM and cached for comparison with keywords generated by the same smaller LLM against the query raised. This significantly reduces time and cost to find the context within documents. Once the context is set, a larger LLM uses that to provide answers based on a prompt tailored for Q&A. This research work demonstrates that use of keywords in context identification reduces the overall inference time and cost of information retrieval. Given this reduction in inference time and cost with the keyword augmented retrieval framework, a speech based interface for user input and response readout was integrated. This allowed a seamless interaction with the language model.

Submitted 29 October, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

arXiv:2305.00577 [pdf, other]

doi 10.1145/3543873.3587657

Contextual Response Interpretation for Automated Structured Interviews: A Case Study in Market Research

Authors: Harshita Sahijwani, Kaustubh Dhole, Ankur Purwar, Venugopal Vasudevan, Eugene Agichtein

Abstract: Structured interviews are used in many settings, importantly in market research on topics such as brand perception, customer habits, or preferences, which are critical to product development, marketing, and e-commerce at large. Such interviews generally consist of a series of questions that are asked to a participant. These interviews are typically conducted by skilled interviewers, who interpret the responses from the participants and can adapt the interview accordingly. Using automated conversational agents to conduct such interviews would enable reaching a much larger and potentially more diverse group of participants than currently possible. However, the technical challenges involved in building such a conversational system are relatively unexplored. To learn more about these challenges, we convert a market research multiple-choice questionnaire to a conversational format and conduct a user study. We address the key task of conducting structured interviews, namely interpreting the participant's response, for example, by matching it to one or more predefined options. Our findings can be applied to improve response interpretation for the information elicitation phase of conversational recommender systems.

Submitted 30 April, 2023; originally announced May 2023.

Comments: ISIR 2023

arXiv:2203.16056 [pdf, other]

Automatic Facial Skin Feature Detection for Everyone

Authors: Qian Zheng, Ankur Purwar, Heng Zhao, Guang Liang Lim, Ling Li, Debasish Behera, Qian Wang, Min Tan, Rizhao Cai, Jennifer Werner, Dennis Sng, Maurice van Steensel, Weisi Lin, Alex C Kot

Abstract: Automatic assessment and understanding of facial skin condition have several applications, including the early detection of underlying health problems, lifestyle and dietary treatment, skin-care product recommendation, etc. Selfies in the wild serve as an excellent data resource to democratize skin quality assessment, but suffer from several data collection challenges. The key to guaranteeing an accurate assessment is accurate detection of different skin features. We present an automatic facial skin feature detection method that works across a variety of skin tones and age groups for selfies in the wild. To be specific, we annotate the locations of acne, pigmentation, and wrinkle for selfie images with different skin tone colors, severity levels, and lighting conditions. The annotation is conducted in a two-phase scheme with the help of a dermatologist to train volunteers for annotation. We employ Unet++ as the network architecture for feature detection. This work shows that the two-phase annotation scheme can robustly detect the accurate locations of acne, pigmentation, and wrinkle for selfie images with different ethnicities, skin tone colors, severity levels, age groups, and lighting conditions.

Submitted 30 March, 2022; originally announced March 2022.

Comments: Accepted by the conference of Electronic Imaging (EI) 2022

arXiv:1107.4417 [pdf]

doi 10.1109/IEMBS.2008.4649357

Frequency Domain Approach for Activity Classification using Accelerometer

Authors: Wan-Young Chung, Amit Purwar, Annapurna Sharma

Abstract: Activity classification was performed using MEMS accelerometer and wireless sensor node for wireless sensor network environment. Three axes MEMS accelerometer measures body's acceleration and transmits measured data with the help of sensor node to base station attached to PC. On the PC, real time accelerometer data is processed for movement classifications. In this paper, Rest, walking and running are the classified activities of the person. Both time and frequency analysis was performed to classify running and walking. The classification of rest and movement is done using Signal magnitude area (SMA). The classification accuracy for rest and movement is 100%. For the classification of walk and Run two parameters i.e. SMA and Median frequency were used. The classification accuracy for walk and running was detected as 81.25% in the experiments performed by the test persons.

Submitted 22 July, 2011; originally announced July 2011.

Comments: 30th Annual International IEEE EMBS Conference, Vancouver, British Columbia, Canada, August 20-24, 2008
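
Illustrative note on the entry above: the abstract says rest and movement are separated using Signal magnitude area (SMA). A minimal sketch of that idea follows, assuming the commonly used definition of SMA as the mean, over a window, of the summed absolute values of the three axis readings. The threshold value, the function names, and the assumption that the gravity component has already been removed from the samples are all hypothetical and are not taken from the paper.

```python
def signal_magnitude_area(xs, ys, zs):
    """Mean of |x| + |y| + |z| over one window of three-axis samples."""
    n = len(xs)
    if n == 0 or len(ys) != n or len(zs) != n:
        raise ValueError("axes must be non-empty and of equal length")
    return sum(abs(x) + abs(y) + abs(z) for x, y, z in zip(xs, ys, zs)) / n


def is_moving(xs, ys, zs, threshold=0.5):
    """Label a window as movement when its SMA exceeds a threshold.

    The default threshold is a placeholder; a real value would have to be
    calibrated on recorded data.
    """
    return signal_magnitude_area(xs, ys, zs) > threshold


if __name__ == "__main__":
    # Two-sample window with made-up readings.
    print(signal_magnitude_area([1.0, -1.0], [2.0, -2.0], [0.0, 0.0]))
    print(is_moving([0.0, 0.0], [0.0, 0.0], [0.0, 0.0]))
```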

arXiv:1107.4414 [pdf]

doi 10.1109/MFI.2008.4648056

Frequency based Classification of Activities using Accelerometer Data

Authors: Annapurna Sharma, Amit Purwar, Young-Dong Lee, Young-Sook Lee, Wan-Young Chung

Abstract: This work presents, the classification of user activities such as Rest, Walk and Run, on the basis of frequency component present in the acceleration data in a wireless sensor network environment. As the frequencies of the above mentioned activities differ slightly for different person, so it gives a more accurate result. The algorithm uses just one parameter i.e. the frequency of the body acceleration data of the three axes for classifying the activities in a set of data. The algorithm includes a normalization step and hence there is no need to set a different value of threshold value for magnitude for different test person. The classification is automatic and done on a block by block basis.

Submitted 22 July, 2011; originally announced July 2011.

Comments: IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, 2008. MFI 2008

Showing 1–8 of 8 results for author: Purwar, A