Showing 1–9 of 9 results for author: Sunkara, S

  1. arXiv:2402.04615  [pdf, other]

    cs.CV cs.AI

    ScreenAI: A Vision-Language Model for UI and Infographics Understanding

    Authors: Gilles Baechler, Srinivas Sunkara, Maria Wang, Fedir Zubach, Hassan Mansoor, Vincent Etter, Victor Cărbune, Jason Lin, Jindong Chen, Abhanshu Sharma

    Abstract: Screen user interfaces (UIs) and infographics, sharing similar visual language and design principles, play important roles in human communication and human-machine interaction. We introduce ScreenAI, a vision-language model that specializes in UI and infographics understanding. Our model improves upon the PaLI architecture with the flexible patching strategy of pix2struct and is trained on a uniqu…

    Submitted 4 July, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: Accepted to International Joint Conference on Artificial Intelligence (IJCAI), 2024. Revision Notes: full version of the paper, including 1) Camera-ready version for IJCAI-24; 2) Appendices that are mentioned, but not included in 1)

  2. arXiv:2210.02663  [pdf, other]

    cs.HC cs.CL cs.CV cs.LG

    Towards Better Semantic Understanding of Mobile Interfaces

    Authors: Srinivas Sunkara, Maria Wang, Lijuan Liu, Gilles Baechler, Yu-Chung Hsiao, Jindong Chen, Abhanshu Sharma, James Stout

    Abstract: Improving the accessibility and automation capabilities of mobile devices can have a significant positive impact on the daily lives of countless users. To stimulate research in this direction, we release a human-annotated dataset with approximately 500k unique annotations aimed at increasing the understanding of the functionality of UI elements. This dataset augments images and view hierarchies fr…

    Submitted 5 October, 2022; originally announced October 2022.

    Comments: This paper is to be published at COLING 2022

  3. arXiv:2201.12409  [pdf, other]

    cs.CL cs.AI

    A Unified Approach to Entity-Centric Context Tracking in Social Conversations

    Authors: Ulrich Rückert, Srinivas Sunkara, Abhinav Rastogi, Sushant Prakash, Pranav Khaitan

    Abstract: In human-human conversations, Context Tracking deals with identifying important entities and keeping track of their properties and relationships. This is a challenging problem that encompasses several subtasks such as slot tagging, coreference resolution, resolving plural mentions and entity linking. We approach this problem as an end-to-end modeling task where the conversational context is repres…

    Submitted 26 April, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

    Comments: Published at LREC 2022

  4. arXiv:2107.13731  [pdf, other]

    cs.CV cs.AI

    UIBert: Learning Generic Multimodal Representations for UI Understanding

    Authors: Chongyang Bai, Xiaoxue Zang, Ying Xu, Srinivas Sunkara, Abhinav Rastogi, Jindong Chen, Blaise Aguera y Arcas

    Abstract: To improve the accessibility of smart devices and to simplify their usage, building models which understand user interfaces (UIs) and assist users to complete their tasks is critical. However, unique challenges are proposed by UI-specific characteristics, such as how to effectively leverage multimodal UI features that involve image, text, and structural metadata and how to achieve good performance…

    Submitted 10 August, 2021; v1 submitted 28 July, 2021; originally announced July 2021.

    Comments: 8 pages, IJCAI 2021

  5. arXiv:2012.12350  [pdf, other]

    cs.CL cs.AI

    ActionBert: Leveraging User Actions for Semantic Understanding of User Interfaces

    Authors: Zecheng He, Srinivas Sunkara, Xiaoxue Zang, Ying Xu, Lijuan Liu, Nevan Wichers, Gabriel Schubiner, Ruby Lee, Jindong Chen, Blaise Agüera y Arcas

    Abstract: As mobile devices are becoming ubiquitous, regularly interacting with a variety of user interfaces (UIs) is a common aspect of daily life for many people. To improve the accessibility of these devices and to enable their usage in a variety of settings, building models that can assist users and accomplish tasks through the UI is vitally important. However, there are several challenges to achieve th…

    Submitted 25 January, 2021; v1 submitted 22 December, 2020; originally announced December 2020.

    Comments: Accepted to AAAI Conference on Artificial Intelligence (AAAI-21)

  6. arXiv:2007.12720  [pdf, other]

    cs.CL cs.AI

    MultiWOZ 2.2 : A Dialogue Dataset with Additional Annotation Corrections and State Tracking Baselines

    Authors: Xiaoxue Zang, Abhinav Rastogi, Srinivas Sunkara, Raghav Gupta, Jianguo Zhang, Jindong Chen

    Abstract: MultiWOZ is a well-known task-oriented dialogue dataset containing over 10,000 annotated dialogues spanning 8 domains. It is extensively used as a benchmark for dialogue state tracking. However, recent works have reported presence of substantial noise in the dialogue state annotations. MultiWOZ 2.1 identified and fixed many of these erroneous annotations and user utterances, resulting in an improv…

    Submitted 10 July, 2020; originally announced July 2020.

    Journal ref: Proceedings of the 2nd Workshop on Natural Language Processing for Conversational AI (2020) 109-117

  7. arXiv:2002.01359  [pdf, other]

    cs.CL

    Schema-Guided Dialogue State Tracking Task at DSTC8

    Authors: Abhinav Rastogi, Xiaoxue Zang, Srinivas Sunkara, Raghav Gupta, Pranav Khaitan

    Abstract: This paper gives an overview of the Schema-Guided Dialogue State Tracking task of the 8th Dialogue System Technology Challenge. The goal of this task is to develop dialogue state tracking models suitable for large-scale virtual assistants, with a focus on data-efficient joint modeling across domains and zero-shot generalization to new APIs. This task provided a new dataset consisting of over 16000…

    Submitted 2 February, 2020; originally announced February 2020.

    Comments: Presented at DSTC workshop, AAAI 2020. arXiv admin note: text overlap with arXiv:1909.05855

  8. arXiv:1911.06394  [pdf, other]

    cs.CL

    The Eighth Dialog System Technology Challenge

    Authors: Seokhwan Kim, Michel Galley, Chulaka Gunasekara, Sungjin Lee, Adam Atkinson, Baolin Peng, Hannes Schulz, Jianfeng Gao, Jinchao Li, Mahmoud Adada, Minlie Huang, Luis Lastras, Jonathan K. Kummerfeld, Walter S. Lasecki, Chiori Hori, Anoop Cherian, Tim K. Marks, Abhinav Rastogi, Xiaoxue Zang, Srinivas Sunkara, Raghav Gupta

    Abstract: This paper introduces the Eighth Dialog System Technology Challenge. In line with recent challenges, the eighth edition focuses on applying end-to-end dialog technologies in a pragmatic way for multi-domain task-completion, noetic response selection, audio visual scene-aware dialog, and schema-guided dialog state tracking tasks. This paper describes the task definition, provided datasets, and eval…

    Submitted 14 November, 2019; originally announced November 2019.

    Comments: Submitted to NeurIPS 2019 3rd Conversational AI Workshop

  9. arXiv:1909.05855  [pdf, other]

    cs.CL

    Towards Scalable Multi-domain Conversational Agents: The Schema-Guided Dialogue Dataset

    Authors: Abhinav Rastogi, Xiaoxue Zang, Srinivas Sunkara, Raghav Gupta, Pranav Khaitan

    Abstract: Virtual assistants such as Google Assistant, Alexa and Siri provide a conversational interface to a large number of services and APIs spanning multiple domains. Such systems need to support an ever-increasing number of services with possibly overlapping functionality. Furthermore, some of these services have little to no training data available. Existing public datasets for task-oriented dialogue…

    Submitted 29 January, 2020; v1 submitted 12 September, 2019; originally announced September 2019.

    Comments: To appear at AAAI 2020