Showing 1–19 of 19 results for author: Singla, Y

  1. arXiv:2405.00942  [pdf, other]

    cs.CV cs.CL

    LLaVA Finds Free Lunch: Teaching Human Behavior Improves Content Understanding Abilities Of LLMs

    Authors: Somesh Singh, Harini S I, Yaman K Singla, Veeky Baths, Rajiv Ratn Shah, Changyou Chen, Balaji Krishnamurthy

    Abstract: Communication is defined as "Who says what to whom with what effect." A message from a communicator generates downstream receiver effects, also known as behavior. Receiver behavior, being a downstream effect of the message, carries rich signals about it. Even after carrying signals about the message, the behavior data is often ignored while training large language models. We show that training LLM…

    Submitted 16 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

  2. arXiv:2402.18060  [pdf, other]

    cs.CL

    Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions

    Authors: Hanjie Chen, Zhouxiang Fang, Yash Singla, Mark Dredze

    Abstract: LLMs have demonstrated impressive performance in answering medical questions, such as achieving passing scores on medical licensing examinations. However, medical board exam or general clinical questions do not capture the complexity of realistic clinical cases. Moreover, the lack of reference explanations means we cannot easily evaluate the reasoning of model decisions, a crucial component of sup…

    Submitted 25 June, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

  3. arXiv:2311.10995  [pdf, other]

    cs.CV cs.CL

    Behavior Optimized Image Generation

    Authors: Varun Khurana, Yaman K Singla, Jayakumar Subramanian, Rajiv Ratn Shah, Changyou Chen, Zhiqiang Xu, Balaji Krishnamurthy

    Abstract: The last few years have witnessed great success on image generation, which has crossed the acceptance thresholds of aesthetics, making it directly applicable to personal and commercial applications. However, images, especially in marketing and advertising applications, are often created as a means to an end as opposed to just aesthetic concerns. The goal can be increasing sales, getting more click…

    Submitted 18 November, 2023; originally announced November 2023.

  4. arXiv:2309.00378  [pdf, other]

    cs.CL cs.CV cs.HC

    Long-Term Ad Memorability: Understanding and Generating Memorable Ads

    Authors: Harini S I, Somesh Singh, Yaman K Singla, Aanisha Bhattacharyya, Veeky Baths, Changyou Chen, Rajiv Ratn Shah, Balaji Krishnamurthy

    Abstract: Marketers spend billions of dollars on advertisements, but to what end? At purchase time, if customers cannot recognize the brand for which they saw an ad, the money spent on the ad is essentially wasted. Despite its importance in marketing, until now, there has been no study on the memorability of ads in the ML literature. All previous memorability studies have been conducted on short-term recall…

    Submitted 16 February, 2024; v1 submitted 1 September, 2023; originally announced September 2023.

  5. arXiv:2309.00359  [pdf, other]

    cs.CL cs.CV

    Large Content And Behavior Models To Understand, Simulate, And Optimize Content And Behavior

    Authors: Ashmit Khandelwal, Aditya Agrawal, Aanisha Bhattacharyya, Yaman K Singla, Somesh Singh, Uttaran Bhattacharya, Ishita Dasgupta, Stefano Petrangeli, Rajiv Ratn Shah, Changyou Chen, Balaji Krishnamurthy

    Abstract: Shannon and Weaver's seminal information theory divides communication into three levels: technical, semantic, and effectiveness. While the technical level deals with the accurate reconstruction of transmitted symbols, the semantic and effectiveness levels deal with the inferred meaning and its effect on the receiver. Large Language Models (LLMs), with their wide generalizability, make some progres…

    Submitted 16 March, 2024; v1 submitted 1 September, 2023; originally announced September 2023.

  6. arXiv:2305.09758  [pdf, other]

    cs.CV cs.CL

    A Video Is Worth 4096 Tokens: Verbalize Videos To Understand Them In Zero Shot

    Authors: Aanisha Bhattacharya, Yaman K Singla, Balaji Krishnamurthy, Rajiv Ratn Shah, Changyou Chen

    Abstract: Multimedia content, such as advertisements and story videos, exhibit a rich blend of creativity and multiple modalities. They incorporate elements like text, visuals, audio, and storytelling techniques, employing devices like emotions, symbolism, and slogans to convey meaning. There is a dearth of large annotated training datasets in the multimedia domain hindering the development of supervised le…

    Submitted 26 October, 2023; v1 submitted 16 May, 2023; originally announced May 2023.

    Comments: Accepted to EMNLP-23 TL;DR: Video understanding lags far behind NLP; LLMs excel in zero-shot. Our approach utilizes LLMs to verbalize videos, creating stories for zero-shot video understanding. This yields state-of-the-art results across five datasets, covering fifteen tasks

  7. arXiv:2302.05721  [pdf, other]

    cs.HC cs.CL

    Synthesizing Human Gaze Feedback for Improved NLP Performance

    Authors: Varun Khurana, Yaman Kumar Singla, Nora Hollenstein, Rajesh Kumar, Balaji Krishnamurthy

    Abstract: Integrating human feedback in models can improve the performance of natural language processing (NLP) models. Feedback can be either explicit (e.g. ranking used in training language models) or implicit (e.g. using human cognitive signals in the form of eyetracking). Prior eye tracking and NLP research reveal that cognitive processes, such as human scanpaths, gleaned from human gaze patterns aid in…

    Submitted 11 February, 2023; originally announced February 2023.

    Comments: Accepted at European Chapter of the Association for Computational Linguistics (EACL)

  8. arXiv:2208.09626  [pdf, other]

    cs.CL cs.CV

    Persuasion Strategies in Advertisements

    Authors: Yaman Kumar Singla, Rajat Jha, Arunim Gupta, Milan Aggarwal, Aditya Garg, Tushar Malyan, Ayush Bhardwaj, Rajiv Ratn Shah, Balaji Krishnamurthy, Changyou Chen

    Abstract: Modeling what makes an advertisement persuasive, i.e., eliciting the desired response from consumer, is critical to the study of propaganda, social psychology, and marketing. Despite its importance, computational modeling of persuasion in computer vision is still in its infancy, primarily due to the lack of benchmark datasets that can provide persuasion-strategy labels associated with ads. Motivat…

    Submitted 6 May, 2023; v1 submitted 20 August, 2022; originally announced August 2022.

    Comments: Accepted at AAAI-23

  9. arXiv:2203.16028  [pdf, other]

    cs.CL cs.MM cs.SD eess.AS

    Span Classification with Structured Information for Disfluency Detection in Spoken Utterances

    Authors: Sreyan Ghosh, Sonal Kumar, Yaman Kumar Singla, Rajiv Ratn Shah, S. Umesh

    Abstract: Existing approaches in disfluency detection focus on solving a token-level classification task for identifying and removing disfluencies in text. Moreover, most works focus on leveraging only contextual information captured by the linear sequences in text, thus ignoring the structured information in text which is efficiently captured by dependency trees. In this paper, building on the span classif…

    Submitted 18 April, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

  10. arXiv:2203.15349  [pdf, other]

    cs.CL

    LDKP: A Dataset for Identifying Keyphrases from Long Scientific Documents

    Authors: Debanjan Mahata, Navneet Agarwal, Dibya Gautam, Amardeep Kumar, Swapnil Parekh, Yaman Kumar Singla, Anish Acharya, Rajiv Ratn Shah

    Abstract: Identifying keyphrases (KPs) from text documents is a fundamental task in natural language processing and information retrieval. Vast majority of the benchmark datasets for this task are from the scientific domain containing only the document title and abstract information. This limits keyphrase extraction (KPE) and keyphrase generation (KPG) algorithms to identify keyphrases from human-written su…

    Submitted 1 April, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

  11. arXiv:2111.15156  [pdf, other]

    cs.CL cs.SD eess.AS

    Automated Speech Scoring System Under The Lens: Evaluating and interpreting the linguistic cues for language proficiency

    Authors: Pakhi Bamdev, Manraj Singh Grover, Yaman Kumar Singla, Payman Vafaee, Mika Hama, Rajiv Ratn Shah

    Abstract: English proficiency assessments have become a necessary metric for filtering and selecting prospective candidates for both academia and industry. With the rise in demand for such assessments, it has become increasingly necessary to have the automated human-interpretable results to prevent inconsistencies and ensure meaningful feedback to the second language learners. Feature-based classical approa…

    Submitted 30 November, 2021; originally announced November 2021.

    Comments: Accepted for publication in the International Journal of Artificial Intelligence in Education (IJAIED)

  12. arXiv:2111.08906  [pdf, other]

    cs.CL stat.AP

    Using Sampling to Estimate and Improve Performance of Automated Scoring Systems with Guarantees

    Authors: Yaman Kumar Singla, Sriram Krishna, Rajiv Ratn Shah, Changyou Chen

    Abstract: Automated Scoring (AS), the natural language processing task of scoring essays and speeches in an educational testing setting, is growing in popularity and being deployed across contexts from government examinations to companies providing language proficiency services. However, existing systems either forgo human raters entirely, thus harming the reliability of the test, or score every response by…

    Submitted 17 November, 2021; originally announced November 2021.

  13. arXiv:2110.06507  [pdf, other]

    cs.CL

    Perception Point: Identifying Critical Learning Periods in Speech for Bilingual Networks

    Authors: Anuj Saraswat, Mehar Bhatia, Yaman Kumar Singla, Changyou Chen, Rajiv Ratn Shah

    Abstract: Recent studies in speech perception have been closely linked to fields of cognitive psychology, phonology, and phonetics in linguistics. During perceptual attunement, a critical and sensitive developmental trajectory has been examined in bilingual and monolingual infants where they can best discriminate common phonemes. In this paper, we compare and identify these cognitive aspects on deep neural-…

    Submitted 13 October, 2021; originally announced October 2021.

    Comments: 9 pages, 6 figures, 2 tables

  14. arXiv:2109.11728  [pdf, other]

    cs.CL cs.AI cs.CY

    AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses

    Authors: Yaman Kumar Singla, Swapnil Parekh, Somesh Singh, Junyi Jessy Li, Rajiv Ratn Shah, Changyou Chen

    Abstract: Deep-learning based Automatic Essay Scoring (AES) systems are being actively used by states and language testing agencies alike to evaluate millions of candidates for life-changing decisions ranging from college applications to visa approvals. However, little research has been put to understand and interpret the black-box nature of deep-learning based scoring algorithms. Previous studies indicate…

    Submitted 14 October, 2021; v1 submitted 23 September, 2021; originally announced September 2021.

    Comments: arXiv admin note: text overlap with arXiv:2012.13872

  15. arXiv:2109.00928  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Speaker-Conditioned Hierarchical Modeling for Automated Speech Scoring

    Authors: Yaman Kumar Singla, Avykat Gupta, Shaurya Bagga, Changyou Chen, Balaji Krishnamurthy, Rajiv Ratn Shah

    Abstract: Automatic Speech Scoring (ASS) is the computer-assisted evaluation of a candidate's speaking proficiency in a language. ASS systems face many challenges like open grammar, variable pronunciations, and unstructured or semi-structured content. Recent deep learning approaches have shown some promise in this domain. However, most of these approaches focus on extracting features from a single audio, ma…

    Submitted 30 August, 2021; originally announced September 2021.

    Comments: Published in CIKM 2021

  16. arXiv:2101.00387  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    What all do audio transformer models hear? Probing Acoustic Representations for Language Delivery and its Structure

    Authors: Jui Shah, Yaman Kumar Singla, Changyou Chen, Rajiv Ratn Shah

    Abstract: In recent times, BERT based transformer models have become an inseparable part of the 'tech stack' of text processing models. Similar progress is being observed in the speech domain with a multitude of models observing state-of-the-art results by using audio transformer models to encode speech. This begs the question of what are these audio transformer models learning. Moreover, although the stand…

    Submitted 12 July, 2021; v1 submitted 2 January, 2021; originally announced January 2021.

  17. arXiv:2101.00056  [pdf, other]

    cs.CL

    Towards Modelling Coherence in Spoken Discourse

    Authors: Rajaswa Patil, Yaman Kumar Singla, Rajiv Ratn Shah, Mika Hama, Roger Zimmermann

    Abstract: While there has been significant progress towards modelling coherence in written discourse, the work in modelling spoken discourse coherence has been quite limited. Unlike the coherence in text, coherence in spoken discourse is also dependent on the prosodic and acoustic patterns in speech. In this paper, we model coherence in spoken discourse with audio-based coherence models. We perform experime…

    Submitted 31 December, 2020; originally announced January 2021.

    Comments: 12 pages

  18. arXiv:2012.13872  [pdf, other]

    cs.CL cs.AI cs.CY

    My Teacher Thinks The World Is Flat! Interpreting Automatic Essay Scoring Mechanism

    Authors: Swapnil Parekh, Yaman Kumar Singla, Changyou Chen, Junyi Jessy Li, Rajiv Ratn Shah

    Abstract: Significant progress has been made in deep-learning based Automatic Essay Scoring (AES) systems in the past two decades. However, little research has been put to understand and interpret the black-box nature of these deep-learning based scoring models. Recent work shows that automated scoring systems are prone to even common-sense adversarial samples. Their lack of natural language understanding c…

    Submitted 27 December, 2020; originally announced December 2020.

  19. arXiv:1904.09072  [pdf, other]

    cs.CL

    Identifying Offensive Posts and Targeted Offense from Twitter

    Authors: Haimin Zhang, Debanjan Mahata, Simra Shahid, Laiba Mehnaz, Sarthak Anand, Yaman Singla, Rajiv Ratn Shah, Karan Uppal

    Abstract: In this paper we present our approach and the system description for Sub-task A and Sub Task B of SemEval 2019 Task 6: Identifying and Categorizing Offensive Language in Social Media. Sub-task A involves identifying if a given tweet is offensive or not, and Sub Task B involves detecting if an offensive tweet is targeted towards someone (group or an individual). Our models for Sub-task A is based o…

    Submitted 19 April, 2019; originally announced April 2019.