
Showing 1–17 of 17 results for author: Akula, A

  1. arXiv:2309.12768

    cs.CV

    WiCV@CVPR2023: The Eleventh Women In Computer Vision Workshop at the Annual CVPR Conference

    Authors: Doris Antensteiner, Marah Halawa, Asra Aslam, Ivaxi Sheth, Sachini Herath, Ziqi Huang, Sunnie S. Y. Kim, Aparna Akula, Xin Wang

    Abstract: In this paper, we present the details of Women in Computer Vision Workshop - WiCV 2023, organized alongside the hybrid CVPR 2023 in Vancouver, Canada. WiCV aims to amplify the voices of underrepresented women in the computer vision community, fostering increased visibility in both academia and industry. We believe that such events play a vital role in addressing gender imbalances within the field.…

    Submitted 22 September, 2023; originally announced September 2023.

  2. arXiv:2305.18373

    cs.CV cs.CL

    KAFA: Rethinking Image Ad Understanding with Knowledge-Augmented Feature Adaptation of Vision-Language Models

    Authors: Zhiwei Jia, Pradyumna Narayana, Arjun R. Akula, Garima Pruthi, Hao Su, Sugato Basu, Varun Jampani

    Abstract: Image ad understanding is a crucial task with wide real-world applications. Although highly challenging with the involvement of diverse atypical scenes, real-world entities, and reasoning over scene-texts, how to interpret image ads is relatively under-explored, especially in the era of foundational vision-language models (VLMs) featuring impressive generalizability and adaptability. In this paper…

    Submitted 28 May, 2023; originally announced May 2023.

    Comments: ACL 2023

  3. arXiv:2305.15393

    cs.CV cs.AI

    LayoutGPT: Compositional Visual Planning and Generation with Large Language Models

    Authors: Weixi Feng, Wanrong Zhu, Tsu-jui Fu, Varun Jampani, Arjun Akula, Xuehai He, Sugato Basu, Xin Eric Wang, William Yang Wang

    Abstract: Attaining a high degree of user controllability in visual generation often requires intricate, fine-grained inputs like layouts. However, such inputs impose a substantial burden on users when compared to simple text inputs. To address the issue, we study how Large Language Models (LLMs) can serve as visual planners by generating layouts from text conditions, and thus collaborate with visual genera…

    Submitted 28 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

  4. arXiv:2305.10722

    cs.CV

    Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners

    Authors: Xuehai He, Weixi Feng, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Pradyumna Narayana, Sugato Basu, William Yang Wang, Xin Eric Wang

    Abstract: Diffusion models, such as Stable Diffusion, have shown incredible performance on text-to-image generation. Since text-to-image generation often requires models to generate visual concepts with fine-grained details and attributes specified in text prompts, can we leverage the powerful representations learned by pre-trained diffusion models for discriminative tasks such as image-text matching? To an…

    Submitted 24 April, 2024; v1 submitted 18 May, 2023; originally announced May 2023.

  5. arXiv:2212.09898

    cs.CV

    MetaCLUE: Towards Comprehensive Visual Metaphors Research

    Authors: Arjun R. Akula, Brendan Driscoll, Pradyumna Narayana, Soravit Changpinyo, Zhiwei Jia, Suyash Damle, Garima Pruthi, Sugato Basu, Leonidas Guibas, William T. Freeman, Yuanzhen Li, Varun Jampani

    Abstract: Creativity is an indispensable part of human cognition and also an inherent part of how we make sense of the world. Metaphorical abstraction is fundamental in communicating creative ideas through nuanced relationships between abstract concepts such as feelings. While computer vision benchmarks and approaches predominantly focus on understanding and generating literal interpretations of images, met…

    Submitted 2 June, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

    Comments: Accepted in CVPR 2023. Project page: https://metaclue.github.io/ , Video summary: https://youtu.be/V3TmeNETL-o

  6. arXiv:2212.05032

    cs.CV cs.CL

    Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis

    Authors: Weixi Feng, Xuehai He, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Pradyumna Narayana, Sugato Basu, Xin Eric Wang, William Yang Wang

    Abstract: Large-scale diffusion models have achieved state-of-the-art results on text-to-image synthesis (T2I) tasks. Despite their ability to generate high-quality yet creative images, we observe that attribution-binding and compositional capabilities are still considered major challenging issues, especially when involving multiple objects. In this work, we improve the compositional skills of T2I models, s…

    Submitted 28 February, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

    Comments: ICLR 2023 Camera Ready version

  7. arXiv:2210.10362

    cs.CV cs.AI cs.CL

    CPL: Counterfactual Prompt Learning for Vision and Language Models

    Authors: Xuehai He, Diji Yang, Weixi Feng, Tsu-Jui Fu, Arjun Akula, Varun Jampani, Pradyumna Narayana, Sugato Basu, William Yang Wang, Xin Eric Wang

    Abstract: Prompt tuning is a new few-shot transfer learning technique that only tunes the learnable prompt for pre-trained vision and language models such as CLIP. However, existing prompt tuning methods tend to learn spurious or entangled representations, which leads to poor generalization to unseen concepts. Towards non-spurious and efficient prompt learning from limited examples, this paper presents a no…

    Submitted 4 November, 2022; v1 submitted 19 October, 2022; originally announced October 2022.

  8. arXiv:2207.06552

    math.NT

    New Formulas for the Riemann Zeta Function

    Authors: Aditya Akula, Ghaith Hiary

    Abstract: A new method for continuing the usual Dirichlet series that defines the Riemann zeta function $ζ(s)$ is presented. Numerical experiments demonstrating the computational efficacy of the resulting continuation are discussed.

    Submitted 13 July, 2022; originally announced July 2022.

    Comments: 13 pages, 2 figures

    MSC Class: 11Y35 (Primary) 11M06 (Secondary)

  9. arXiv:2201.11194

    cs.HC cs.LG

    Attention cannot be an Explanation

    Authors: Arjun R Akula, Song-Chun Zhu

    Abstract: Attention based explanations (viz. saliency maps), by providing interpretability to black box models such as deep neural networks, are assumed to improve human trust and reliance in the underlying models. Recently, it has been shown that attention weights are frequently uncorrelated with gradient-based measures of feature importance. Motivated by this, we ask a follow-up question: "Assuming that w…

    Submitted 26 January, 2022; originally announced January 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2109.01401, arXiv:1909.06907

  10. arXiv:2201.09639

    cs.CV

    Question Generation for Evaluating Cross-Dataset Shifts in Multi-modal Grounding

    Authors: Arjun R. Akula

    Abstract: Visual question answering (VQA) is the multi-modal task of answering natural language questions about an input image. Through cross-dataset adaptation methods, it is possible to transfer knowledge from a source dataset with larger train samples to a target dataset where training set is limited. Suppose a VQA model trained on one dataset train set fails in adapting to another, it is hard to identif…

    Submitted 24 January, 2022; originally announced January 2022.

  11. arXiv:2201.06207

    cs.CV

    Discourse Analysis for Evaluating Coherence in Video Paragraph Captions

    Authors: Arjun R Akula, Song-Chun Zhu

    Abstract: Video paragraph captioning is the task of automatically generating a coherent paragraph description of the actions in a video. Previous linguistic studies have demonstrated that coherence of a natural language text is reflected by its discourse structure and relations. However, existing video captioning methods evaluate the coherence of generated paragraphs by comparing them merely against human p…

    Submitted 16 January, 2022; originally announced January 2022.

  12. arXiv:2201.03147

    cs.HC

    Effective Representation to Capture Collaboration Behaviors between Explainer and User

    Authors: Arjun Akula, Song-Chun Zhu

    Abstract: An explainable AI (XAI) model aims to provide transparency (in the form of justification, explanation, etc) for its predictions or actions made by it. Recently, there has been a lot of focus on building XAI models, especially to provide explanations for understanding and interpreting the predictions made by deep learning models. At UCLA, we propose a generic framework to interact with an XAI model…

    Submitted 9 January, 2022; originally announced January 2022.

  13. arXiv:2109.01401

    cs.AI cs.CV cs.LG

    CX-ToM: Counterfactual Explanations with Theory-of-Mind for Enhancing Human Trust in Image Recognition Models

    Authors: Arjun R. Akula, Keze Wang, Changsong Liu, Sari Saba-Sadiya, Hongjing Lu, Sinisa Todorovic, Joyce Chai, Song-Chun Zhu

    Abstract: We propose CX-ToM, short for counterfactual explanations with theory-of-mind, a new explainable AI (XAI) framework for explaining decisions made by a deep convolutional neural network (CNN). In contrast to the current methods in XAI that generate explanations as a single shot response, we pose explanation as an iterative communication process, i.e. dialog, between the machine and human user. More…

    Submitted 2 December, 2021; v1 submitted 3 September, 2021; originally announced September 2021.

    Comments: Accepted by iScience Cell Press Journal 2021. arXiv admin note: text overlap with arXiv:1909.06907

  14. arXiv:2005.01655

    cs.CL cs.CV

    Words aren't enough, their order matters: On the Robustness of Grounding Visual Referring Expressions

    Authors: Arjun R Akula, Spandana Gella, Yaser Al-Onaizan, Song-Chun Zhu, Siva Reddy

    Abstract: Visual referring expression recognition is a challenging task that requires natural language understanding in the context of an image. We critically examine RefCOCOg, a standard benchmark for this task, using a human study and show that 83.7% of test instances do not require reasoning on linguistic structure, i.e., words are enough to identify the target object, the word order doesn't matter. To m…

    Submitted 4 May, 2020; originally announced May 2020.

    Comments: ACL 2020

  15. arXiv:1909.06907

    cs.AI cs.CV cs.HC cs.LG

    X-ToM: Explaining with Theory-of-Mind for Gaining Justified Human Trust

    Authors: Arjun R. Akula, Changsong Liu, Sari Saba-Sadiya, Hongjing Lu, Sinisa Todorovic, Joyce Y. Chai, Song-Chun Zhu

    Abstract: We present a new explainable AI (XAI) framework aimed at increasing justified human trust and reliance in the AI machine through explanations. We pose explanation as an iterative communication process, i.e. dialog, between the machine and human user. More concretely, the machine generates a sequence of explanations in a dialog which takes into account three important aspects at each dialog turn: (a)…

    Submitted 15 September, 2019; originally announced September 2019.

    Comments: A short version of this was presented at CVPR 2019 Workshop on Explainable AI

  16. arXiv:1903.05720

    cs.AI

    Natural Language Interaction with Explainable AI Models

    Authors: Arjun R Akula, Sinisa Todorovic, Joyce Y Chai, Song-Chun Zhu

    Abstract: This paper presents an explainable AI (XAI) system that provides explanations for its predictions. The system consists of two key components -- namely, the prediction And-Or graph (AOG) model for recognizing and localizing concepts of interest in input data, and the XAI model for providing explanations to the user about the AOG's predictions. In this work, we focus on the XAI model specified to in…

    Submitted 7 July, 2019; v1 submitted 13 March, 2019; originally announced March 2019.

    Journal ref: CVPR 2019 Workshop on Explainable AI

  17. arXiv:1903.02252

    cs.CV

    Discourse Parsing in Videos: A Multi-modal Approach

    Authors: Arjun R. Akula, Song-Chun Zhu

    Abstract: Text-level discourse parsing aims to unmask how two sentences in the text are related to each other. We propose the task of Visual Discourse Parsing, which requires understanding discourse relations among scenes in a video. Here we use the term scene to refer to a subset of video frames that can better summarize the video. In order to collect a dataset for learning discourse cues from videos, one…

    Submitted 22 January, 2022; v1 submitted 6 March, 2019; originally announced March 2019.

    Comments: Accepted in CVPR 2019 Workshop on Language and Vision (Oral Presentation)

    Journal ref: CVPR 2019 Workshop on Language and Vision (Oral Presentation)