-
μ-Bench: A Vision-Language Benchmark for Microscopy Understanding
Authors:
Alejandro Lozano,
Jeffrey Nirschl,
James Burgess,
Sanket Rajan Gupte,
Yuhui Zhang,
Alyssa Unell,
Serena Yeung-Levy
Abstract:
Recent advances in microscopy have enabled the rapid generation of terabytes of image data in cell biology and biomedical research. Vision-language models (VLMs) offer a promising solution for large-scale biological image analysis, enhancing researchers' efficiency, identifying new image biomarkers, and accelerating hypothesis generation and scientific discovery. However, there is a lack of standardized, diverse, and large-scale vision-language benchmarks to evaluate VLMs' perception and cognition capabilities in biological image understanding. To address this gap, we introduce μ-Bench, an expert-curated benchmark encompassing 22 biomedical tasks across various scientific disciplines (biology, pathology), microscopy modalities (electron, fluorescence, light), scales (subcellular, cellular, tissue), and organisms in both normal and abnormal states. We evaluate state-of-the-art biomedical, pathology, and general VLMs on μ-Bench and find that: i) current models struggle on all categories, even for basic tasks such as distinguishing microscopy modalities; ii) current specialist models fine-tuned on biomedical data often perform worse than generalist models; iii) fine-tuning in specific microscopy domains can cause catastrophic forgetting, eroding prior biomedical knowledge encoded in their base model; and iv) weight interpolation between fine-tuned and pre-trained models offers one solution to forgetting and improves general performance across biomedical tasks. We release μ-Bench under a permissive license to accelerate the research and development of microscopy foundation models.
Submitted 1 July, 2024;
originally announced July 2024.
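The weight interpolation mentioned in finding iv) of the μ-Bench abstract can be illustrated with a minimal sketch. It assumes plain linear interpolation with a single mixing coefficient `alpha`, and it represents each model as a dictionary mapping parameter names to lists of floats; both choices, and the name `interpolate_weights`, are illustrative assumptions and not taken from the paper.

```python
def interpolate_weights(pretrained, finetuned, alpha):
    """Blend two sets of weights parameter by parameter.

    alpha = 0.0 returns the pre-trained weights, alpha = 1.0 returns the
    fine-tuned weights, and values in between mix the two linearly.
    Both arguments are dicts of {parameter name: list of floats}
    (a stand-in for real tensors, assumed here for self-containment).
    """
    if pretrained.keys() != finetuned.keys():
        raise ValueError("both models must have the same parameter names")
    blended = {}
    for name in pretrained:
        p, f = pretrained[name], finetuned[name]
        if len(p) != len(f):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        blended[name] = [(1.0 - alpha) * pv + alpha * fv for pv, fv in zip(p, f)]
    return blended


# Toy example with a single two-element parameter.
base = {"w": [0.0, 2.0]}
tuned = {"w": [1.0, 4.0]}
halfway = interpolate_weights(base, tuned, 0.5)
```

In practice the coefficient would typically be chosen on held-out data, trading off performance on the fine-tuning domain against retained general knowledge.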
-
Revisiting Active Learning in the Era of Vision Foundation Models
Authors:
Sanket Rajan Gupte,
Josiah Aklilu,
Jeffrey J. Nirschl,
Serena Yeung-Levy
Abstract:
Foundation vision or vision-language models are trained on large unlabeled or noisy data and learn robust representations that can achieve impressive zero- or few-shot performance on diverse tasks. Given these properties, they are a natural fit for active learning (AL), which aims to maximize labeling efficiency. However, the full potential of foundation models has not been explored in the context of AL, specifically in the low-budget regime. In this work, we evaluate how foundation models influence three critical components of effective AL, namely, 1) initial labeled pool selection, 2) ensuring diverse sampling, and 3) the trade-off between representative and uncertainty sampling. We systematically study how the robust representations of foundation models (DINOv2, OpenCLIP) challenge existing findings in active learning. Our observations inform the principled construction of a new simple and elegant AL strategy that balances uncertainty estimated via dropout with sample diversity. We extensively test our strategy on many challenging image classification benchmarks, including natural images as well as out-of-domain biomedical images that are relatively understudied in the AL literature. We also provide a highly performant and efficient implementation of modern AL strategies (including our method) at https://github.com/sanketx/AL-foundation-models.
Submitted 24 June, 2024; v1 submitted 25 January, 2024;
originally announced January 2024.
-
Optimization of utility-based shortfall risk: A non-asymptotic viewpoint
Authors:
Sumedh Gupte,
Prashanth L. A.,
Sanjay P. Bhat
Abstract:
We consider the problems of estimation and optimization of utility-based shortfall risk (UBSR), which is a popular risk measure in finance. In the context of UBSR estimation, we derive a non-asymptotic bound on the mean-squared error of the classical sample average approximation (SAA) of UBSR. Next, in the context of UBSR optimization, we derive an expression for the UBSR gradient under a smooth parameterization. This expression is a ratio of expectations, both of which involve the UBSR. We use SAA for the numerator as well as denominator in the UBSR gradient expression to arrive at a biased gradient estimator. We derive non-asymptotic bounds on the estimation error, which show that our gradient estimator is asymptotically unbiased. We incorporate the aforementioned gradient estimator into a stochastic gradient (SG) algorithm for UBSR optimization. Finally, we derive non-asymptotic bounds that quantify the rate of convergence of our SG algorithm for UBSR optimization.
Submitted 30 March, 2024; v1 submitted 28 October, 2023;
originally announced October 2023.
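A minimal sketch of the sample average approximation (SAA) for UBSR estimation follows. It assumes the common definition SR(X) = inf{t : E[l(X - t)] <= lam} for an increasing loss function l and threshold lam, replaces the expectation with a sample mean, and finds the root by bisection. The function name `ubsr_saa`, the bracketing interval, and the choice of bisection are illustrative assumptions; the paper's own estimator and conventions may differ.

```python
import math


def ubsr_saa(samples, loss, lam, lo, hi, tol=1e-10):
    """Estimate UBSR by solving mean(loss(x - t)) = lam for t.

    Assumes `loss` is increasing, so the sample mean is decreasing in t
    and bisection on the bracket [lo, hi] converges to the root.
    """
    def excess(t):
        return sum(loss(x - t) for x in samples) / len(samples) - lam

    if excess(lo) < 0 or excess(hi) > 0:
        raise ValueError("[lo, hi] does not bracket the solution")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid  # constraint still violated: t must increase
        else:
            hi = mid  # constraint satisfied: try a smaller t
    return 0.5 * (lo + hi)


# With exponential loss and lam = 1, the SAA estimate has the closed
# form log(mean(exp(x))), which gives a check on the bisection result.
data = [0.0, 1.0]
estimate = ubsr_saa(data, math.exp, 1.0, lo=-10.0, hi=10.0)
closed_form = math.log(sum(math.exp(x) for x in data) / len(data))
```

The exponential-loss case is used only because it admits a closed form to compare against; any increasing loss with a valid bracket works the same way.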
-
Divergence Based Quadrangle and Applications
Authors:
Anton Malandii,
Siddhartha Gupte,
Cheng Peng,
Stan Uryasev
Abstract:
This paper introduces a novel framework for assessing risk and decision-making in the presence of uncertainty, the \emph{$\varphi$-Divergence Quadrangle}. This approach expands upon the traditional Risk Quadrangle, a model that quantifies uncertainty through four key components: \emph{risk, deviation, regret}, and \emph{error}. The $\varphi$-Divergence Quadrangle incorporates the $\varphi$-divergence as a measure of the difference between probability distributions, thereby providing a more nuanced understanding of risk. Importantly, the $\varphi$-Divergence Quadrangle is closely connected with the distributionally robust optimization based on the $\varphi$-divergence approach through the duality theory of convex functionals. To illustrate its practicality and versatility, several examples of the $\varphi$-Divergence Quadrangle are provided, including the Quantile Quadrangle. The final portion of the paper outlines a case study implementing regression with the Entropic Value-at-Risk Quadrangle. The proposed $\varphi$-Divergence Quadrangle presents a refined methodology for understanding and managing risk, contributing to the ongoing development of risk assessment and management strategies.
Submitted 12 July, 2023; v1 submitted 28 June, 2023;
originally announced June 2023.
-
WLV-RIT at HASOC-Dravidian-CodeMix-FIRE2020: Offensive Language Identification in Code-switched YouTube Comments
Authors:
Tharindu Ranasinghe,
Sarthak Gupte,
Marcos Zampieri,
Ifeoma Nwogu
Abstract:
This paper describes the WLV-RIT entry to the Hate Speech and Offensive Content Identification in Indo-European Languages (HASOC) shared task 2020. The HASOC 2020 organizers provided participants with annotated datasets containing code-mixed social media posts in Dravidian languages (Malayalam-English and Tamil-English). We participated in task 1: offensive comment identification in code-mixed Malayalam YouTube comments. In our methodology, we take advantage of available English data by applying cross-lingual contextual word embeddings and transfer learning to make predictions on Malayalam data. We further improve the results using various fine-tuning strategies. Our system achieved a weighted average F1 score of 0.89 on the test set and ranked 5th out of 12 participants.
Submitted 1 November, 2020;
originally announced November 2020.
-
TutorialVQA: Question Answering Dataset for Tutorial Videos
Authors:
Anthony Colas,
Seokhwan Kim,
Franck Dernoncourt,
Siddhesh Gupte,
Daisy Zhe Wang,
Doo Soon Kim
Abstract:
Despite the number of currently available datasets on video question answering, there remains a need for a dataset involving multi-step and non-factoid answers. Moreover, relying on video transcripts remains an under-explored topic. To address this, we propose a new question answering task on instructional videos, chosen for their verbose and narrative nature. While previous studies on video question answering have focused on generating a short text as an answer, given a question and video clip, our task aims to identify a span of a video segment as an answer which contains instructional details with various granularities. This work focuses on screencast tutorial videos pertaining to an image editing program. We introduce a dataset, TutorialVQA, consisting of about 6,000 manually collected triples of (video, question, answer span). We also provide experimental results with several baseline algorithms using the video transcripts. The results indicate that the task is challenging and call for the investigation of new algorithms.
Submitted 30 May, 2020; v1 submitted 2 December, 2019;
originally announced December 2019.
-
Learning fashion compatibility across apparel categories for outfit recommendation
Authors:
Luisa F. Polania,
Satyajit Gupte
Abstract:
This paper addresses the problem of generating recommendations for completing the outfit given that a user is interested in a particular apparel item. The proposed method is based on a siamese network used for feature extraction followed by a fully-connected network used for learning a fashion compatibility metric. The embeddings generated by the siamese network are augmented with color histogram features motivated by the important role that color plays in determining fashion compatibility. The training of the network is formulated as a maximum a posteriori (MAP) problem where Laplacian distributions are assumed for the filters of the siamese network to promote sparsity and matrix-variate normal distributions are assumed for the weights of the metric network to efficiently exploit correlations between the input units of each fully-connected layer.
Submitted 1 May, 2019;
originally announced May 2019.
-
On Optimizing Human-Machine Task Assignments
Authors:
Andreas Veit,
Michael Wilber,
Rajan Vaish,
Serge Belongie,
James Davis,
Vishal Anand,
Anshu Aviral,
Prithvijit Chakrabarty,
Yash Chandak,
Sidharth Chaturvedi,
Chinmaya Devaraj,
Ankit Dhall,
Utkarsh Dwivedi,
Sanket Gupte,
Sharath N. Sridhar,
Karthik Paga,
Anuj Pahuja,
Aditya Raisinghani,
Ayush Sharma,
Shweta Sharma,
Darpana Sinha,
Nisarg Thakkar,
K. Bala Vignesh,
Utkarsh Verma,
Kanniganti Abhishek
, et al. (26 additional authors not shown)
Abstract:
When crowdsourcing systems are used in combination with machine inference systems in the real world, they benefit the most when the machine system is deeply integrated with the crowd workers. However, if researchers wish to integrate the crowd with "off-the-shelf" machine classifiers, this deep integration is not always possible. This work explores two strategies to increase accuracy and decrease cost under this setting. First, we show that reordering tasks presented to the human can create a significant accuracy improvement. Further, we show that greedily choosing parameters to maximize machine accuracy is sub-optimal, and joint optimization of the combined system improves performance.
Submitted 24 September, 2015;
originally announced September 2015.