Showing 1–9 of 9 results for author: Boggust, A

  1. arXiv:2404.03214

    cs.CV

    LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity

    Authors: Walid Bousselham, Angie Boggust, Sofian Chaybouti, Hendrik Strobelt, Hilde Kuehne

    Abstract: Vision Transformers (ViTs), with their ability to model long-range dependencies through self-attention mechanisms, have become a standard architecture in computer vision. However, the interpretability of these models remains a challenge. To address this, we propose LeGrad, an explainability method specifically designed for ViTs. LeGrad computes the gradient with respect to the attention maps of Vi…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Code available at https://github.com/WalBouss/LeGrad

  2. arXiv:2309.09944

    cs.LG cs.AI cs.CV cs.CY

    DiffusionWorldViewer: Exposing and Broadening the Worldview Reflected by Generative Text-to-Image Models

    Authors: Zoe De Simone, Angie Boggust, Arvind Satyanarayan, Ashia Wilson

    Abstract: Generative text-to-image (TTI) models produce high-quality images from short textual descriptions and are widely used in academic and creative domains. Like humans, TTI models have a worldview, a conception of the world learned from their training data and task that influences the images they generate for a given prompt. However, the worldviews of TTI models are often hidden from users, making it…

    Submitted 5 February, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

    Comments: 20 pages, 8 figures

  3. arXiv:2307.05356

    cs.CV cs.HC cs.LG

    VisText: A Benchmark for Semantically Rich Chart Captioning

    Authors: Benny J. Tang, Angie Boggust, Arvind Satyanarayan

    Abstract: Captions that describe or explain charts help improve recall and comprehension of the depicted data and provide a more accessible medium for people with visual disabilities. However, current approaches for automatically generating such captions struggle to articulate the perceptual or cognitive features that are the hallmark of charts (e.g., complex trends and patterns). In response, we introduce…

    Submitted 28 June, 2023; originally announced July 2023.

    Comments: Published at ACL 2023, 29 pages, 10 figures

  4. Saliency Cards: A Framework to Characterize and Compare Saliency Methods

    Authors: Angie Boggust, Harini Suresh, Hendrik Strobelt, John V. Guttag, Arvind Satyanarayan

    Abstract: Saliency methods are a common class of machine learning interpretability techniques that calculate how important each input feature is to a model's output. We find that, with the rapid pace of development, users struggle to stay informed of the strengths and limitations of new methods and, thus, choose methods for unprincipled reasons (e.g., popularity). Moreover, despite a corresponding rise in e…

    Submitted 30 May, 2023; v1 submitted 6 June, 2022; originally announced June 2022.

    Comments: Published at FAccT 2023, 19 pages, 8 figures, 2 tables

  5. arXiv:2111.04823

    cs.CL cs.CV cs.MM cs.SD eess.AS eess.IV

    Cascaded Multilingual Audio-Visual Learning from Videos

    Authors: Andrew Rouditchenko, Angie Boggust, David Harwath, Samuel Thomas, Hilde Kuehne, Brian Chen, Rameswar Panda, Rogerio Feris, Brian Kingsbury, Michael Picheny, James Glass

    Abstract: In this paper, we explore self-supervised audio-visual models that learn from instructional videos. Prior work has shown that these models can relate spoken words and sounds to visual content after training on a large-scale dataset of videos, but they were only trained and evaluated on videos in English. To learn multilingual audio-visual representations, we propose a cascaded approach that levera…

    Submitted 8 November, 2021; originally announced November 2021.

    Comments: Presented at Interspeech 2021. This version contains updated results using the YouCook-Japanese dataset.

  6. arXiv:2107.09234

    cs.LG

    Shared Interest: Measuring Human-AI Alignment to Identify Recurring Patterns in Model Behavior

    Authors: Angie Boggust, Benjamin Hoover, Arvind Satyanarayan, Hendrik Strobelt

    Abstract: Saliency methods -- techniques to identify the importance of input features on a model's output -- are a common step in understanding neural network behavior. However, interpreting saliency requires tedious manual inspection to identify and aggregate patterns in model behavior, resulting in ad hoc or cherry-picked analysis. To address these concerns, we present Shared Interest: metrics for compari…

    Submitted 24 March, 2022; v1 submitted 19 July, 2021; originally announced July 2021.

    Comments: 17 pages, 10 figures. Published in CHI 2022. For more details, see http://shared-interest.csail.mit.edu

  7. arXiv:2104.12671

    cs.CV

    Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos

    Authors: Brian Chen, Andrew Rouditchenko, Kevin Duarte, Hilde Kuehne, Samuel Thomas, Angie Boggust, Rameswar Panda, Brian Kingsbury, Rogerio Feris, David Harwath, James Glass, Michael Picheny, Shih-Fu Chang

    Abstract: Multimodal self-supervised learning is getting more and more attention as it allows not only to train large networks without human supervision but also to search and retrieve data across various modalities. In this context, this paper proposes a self-supervised training framework that learns a common multimodal embedding space that, in addition to sharing representations across different modalitie…

    Submitted 3 September, 2021; v1 submitted 26 April, 2021; originally announced April 2021.

    Comments: To be presented at ICCV 2021

    Journal ref: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 8012-8021

  8. arXiv:2006.09199

    cs.CV cs.CL cs.MM cs.SD eess.AS

    AVLnet: Learning Audio-Visual Language Representations from Instructional Videos

    Authors: Andrew Rouditchenko, Angie Boggust, David Harwath, Brian Chen, Dhiraj Joshi, Samuel Thomas, Kartik Audhkhasi, Hilde Kuehne, Rameswar Panda, Rogerio Feris, Brian Kingsbury, Michael Picheny, Antonio Torralba, James Glass

    Abstract: Current methods for learning visually grounded language from videos often rely on text annotation, such as human generated captions or machine generated automatic speech recognition (ASR) transcripts. In this work, we introduce the Audio-Video Language Network (AVLnet), a self-supervised network that learns a shared audio-visual embedding space directly from raw video inputs. To circumvent the nee…

    Submitted 29 June, 2021; v1 submitted 16 June, 2020; originally announced June 2020.

    Comments: A version of this work has been accepted to Interspeech 2021

  9. arXiv:1912.04853

    cs.HC cs.CL cs.LG

    Embedding Comparator: Visualizing Differences in Global Structure and Local Neighborhoods via Small Multiples

    Authors: Angie Boggust, Brandon Carter, Arvind Satyanarayan

    Abstract: Embeddings mapping high-dimensional discrete input to lower-dimensional continuous vector spaces have been widely adopted in machine learning applications as a way to capture domain semantics. Interviewing 13 embedding users across disciplines, we find comparing embeddings is a key task for deployment or downstream analysis but unfolds in a tedious fashion that poorly supports systematic explorati…

    Submitted 4 March, 2022; v1 submitted 10 December, 2019; originally announced December 2019.

    Comments: Published in IUI 2022; Equal contribution by first two authors