Skip to main content

Showing 1–3 of 3 results for author: Unell, A

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.01791  [pdf, other

    cs.CV cs.AI

    μ-Bench: A Vision-Language Benchmark for Microscopy Understanding

    Authors: Alejandro Lozano, Jeffrey Nirschl, James Burgess, Sanket Rajan Gupte, Yuhui Zhang, Alyssa Unell, Serena Yeung-Levy

    Abstract: Recent advances in microscopy have enabled the rapid generation of terabytes of image data in cell biology and biomedical research. Vision-language models (VLMs) offer a promising solution for large-scale biological image analysis, enhancing researchers' efficiency, identifying new image biomarkers, and accelerating hypothesis generation and scientific discovery. However, there is a lack of standa… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2406.09366  [pdf, other

    cs.LG cs.CV q-bio.NC

    Towards an Improved Understanding and Utilization of Maximum Manifold Capacity Representations

    Authors: Rylan Schaeffer, Victor Lecomte, Dhruv Bhandarkar Pai, Andres Carranza, Berivan Isik, Alyssa Unell, Mikail Khona, Thomas Yerxa, Yann LeCun, SueYeon Chung, Andrey Gromov, Ravid Shwartz-Ziv, Sanmi Koyejo

    Abstract: Maximum Manifold Capacity Representations (MMCR) is a recent multi-view self-supervised learning (MVSSL) method that matches or surpasses other leading MVSSL methods. MMCR is intriguing because it does not fit neatly into any of the commonplace MVSSL lineages, instead originating from a statistical mechanical perspective on the linear separability of data manifolds. In this paper, we seek to impro… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  3. arXiv:2405.18415  [pdf, other

    cs.CV cs.AI cs.CL cs.LG

    Why are Visually-Grounded Language Models Bad at Image Classification?

    Authors: Yuhui Zhang, Alyssa Unell, Xiaohan Wang, Dhruba Ghosh, Yuchang Su, Ludwig Schmidt, Serena Yeung-Levy

    Abstract: Image classification is one of the most fundamental capabilities of machine vision intelligence. In this work, we revisit the image classification task using visually-grounded language models (VLMs) such as GPT-4V and LLaVA. We find that existing proprietary and public VLMs, despite often using CLIP as a vision encoder and having many more parameters, significantly underperform CLIP on standard im… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.