Search | arXiv e-print repository

Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model

Authors: Shraman Pramanick, Guangxing Han, Rui Hou, Sayan Nag, Ser-Nam Lim, Nicolas Ballas, Qifan Wang, Rama Chellappa, Amjad Almahairi

Abstract: The ability of large language models (LLMs) to process visual inputs has given rise to general-purpose vision systems, unifying various vision-language (VL) tasks by instruction tuning. However, due to the enormous diversity in input-output formats in the vision domain, existing general-purpose models fail to successfully integrate segmentation and multi-image inputs with coarse-level tasks into a… ▽ More The ability of large language models (LLMs) to process visual inputs has given rise to general-purpose vision systems, unifying various vision-language (VL) tasks by instruction tuning. However, due to the enormous diversity in input-output formats in the vision domain, existing general-purpose models fail to successfully integrate segmentation and multi-image inputs with coarse-level tasks into a single framework. In this work, we introduce VistaLLM, a powerful visual system that addresses coarse- and fine-grained VL tasks over single and multiple input images using a unified framework. VistaLLM utilizes an instruction-guided image tokenizer that filters global embeddings using task descriptions to extract compressed and refined features from numerous images. Moreover, VistaLLM employs a gradient-aware adaptive sampling technique to represent binary segmentation masks as sequences, significantly improving over previously used uniform sampling. To bolster the desired capability of VistaLLM, we curate CoinIt, a comprehensive coarse-to-fine instruction tuning dataset with 6.8M samples. We also address the lack of multi-image grounding datasets by introducing a novel task, AttCoSeg (Attribute-level Co-Segmentation), which boosts the model's reasoning and grounding capability over multiple input images. Extensive experiments on a wide range of V- and VL tasks demonstrate the effectiveness of VistaLLM by achieving consistent state-of-the-art performance over strong baselines across all downstream tasks. Our project page can be found at https://shramanpramanick.github.io/VistaLLM/. △ Less

Submitted 19 June, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

Comments: CVPR 2024 Highlight

arXiv:2311.18259 [pdf, other]

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Authors: Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, **g Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, **g Huang, Md Mohaiminul Islam, Suyog Jain , et al. (76 additional authors not shown)

Abstract: We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair). 740 participants from 13 cities worldwide performed these activities in 123 different natural scene contexts, yielding long-form captures from… ▽ More We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair). 740 participants from 13 cities worldwide performed these activities in 123 different natural scene contexts, yielding long-form captures from 1 to 42 minutes each and 1,286 hours of video combined. The multimodal nature of the dataset is unprecedented: the video is accompanied by multichannel audio, eye gaze, 3D point clouds, camera poses, IMU, and multiple paired language descriptions -- including a novel "expert commentary" done by coaches and teachers and tailored to the skilled-activity domain. To push the frontier of first-person video understanding of skilled human activity, we also present a suite of benchmark tasks and their annotations, including fine-grained activity understanding, proficiency estimation, cross-view translation, and 3D hand/body pose. All resources are open sourced to fuel new research in the community. Project page: http://ego-exo4d-data.org/ △ Less

Submitted 29 April, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

Comments: updated baseline results and dataset statistics to match the released v2 data; added table to appendix comparing stats of Ego-Exo4D alongside other datasets

arXiv:2311.04536 [pdf, other]

Uniform Partitioning of a Bounded Region using Opaque ASYNC Luminous Mobile Robots

Authors: Subhajit Pramanick, Saswata Jana, Adri Bhattacharya, Partha Sarathi Mandal

Abstract: We are given $N$ autonomous mobile robots inside a bounded region. The robots are opaque which means that three collinear robots are unable to see each other as one of the robots acts as an obstruction for the other two. They operate in classical \emph{Look-Compute-Move} (LCM) activation cycles. Moreover, the robots are oblivious except for a persistent light (which is why they are called \emph{Lu… ▽ More We are given $N$ autonomous mobile robots inside a bounded region. The robots are opaque which means that three collinear robots are unable to see each other as one of the robots acts as an obstruction for the other two. They operate in classical \emph{Look-Compute-Move} (LCM) activation cycles. Moreover, the robots are oblivious except for a persistent light (which is why they are called \emph{Luminous robots}) that can determine a color from a fixed color set. Obliviousness does not allow the robots to remember any information from past activation cycles. The Uniform Partitioning problem requires the robots to partition the whole region into sub-regions of equal area, each of which contains exactly one robot. Due to application-oriented motivation, we, in this paper consider the region to be well-known geometric shapes such as rectangle, square and circle. We investigate the problem in \emph{asynchronous} setting where there is no notion of common time and any robot gets activated at any time with a fair assumption that every robot needs to get activated infinitely often. To the best of our knowledge, this is the first attempt to study the Uniform Partitioning problem using oblivious opaque robots working under asynchronous settings. We propose three algorithms considering three different regions: rectangle, square and circle. The algorithms proposed for rectangular and square regions run in $O(N)$ epochs whereas the algorithm for circular regions runs in $O(N^2)$ epochs, where an epoch is the smallest unit of time in which all robots are activated at least once and execute their LCM cycles. The algorithms for the rectangular, square and circular regions require $2$ (which is optimal), $5$ and $8$ colors, respectively. △ Less

Submitted 1 May, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

Comments: This paper recently got accepted in ICDCN 2024

arXiv:2307.16715 [pdf, other]

UniVTG: Towards Unified Video-Language Temporal Grounding

Authors: Kevin Qinghong Lin, Pengchuan Zhang, Joya Chen, Shraman Pramanick, Difei Gao, Alex **peng Wang, Rui Yan, Mike Zheng Shou

Abstract: Video Temporal Grounding (VTG), which aims to ground target clips from videos (such as consecutive intervals or disjoint shots) according to custom language queries (e.g., sentences or words), is key for video browsing on social media. Most methods in this direction develop taskspecific models that are trained with type-specific labels, such as moment retrieval (time interval) and highlight detect… ▽ More Video Temporal Grounding (VTG), which aims to ground target clips from videos (such as consecutive intervals or disjoint shots) according to custom language queries (e.g., sentences or words), is key for video browsing on social media. Most methods in this direction develop taskspecific models that are trained with type-specific labels, such as moment retrieval (time interval) and highlight detection (worthiness curve), which limits their abilities to generalize to various VTG tasks and labels. In this paper, we propose to Unify the diverse VTG labels and tasks, dubbed UniVTG, along three directions: Firstly, we revisit a wide range of VTG labels and tasks and define a unified formulation. Based on this, we develop data annotation schemes to create scalable pseudo supervision. Secondly, we develop an effective and flexible grounding model capable of addressing each task and making full use of each label. Lastly, thanks to the unified framework, we are able to unlock temporal grounding pretraining from large-scale diverse labels and develop stronger grounding abilities e.g., zero-shot grounding. Extensive experiments on three tasks (moment retrieval, highlight detection and video summarization) across seven datasets (QVHighlights, Charades-STA, TACoS, Ego4D, YouTube Highlights, TVSum, and QFVS) demonstrate the effectiveness and flexibility of our proposed framework. The codes are available at https://github.com/showlab/UniVTG. △ Less

Submitted 18 August, 2023; v1 submitted 31 July, 2023; originally announced July 2023.

Comments: Accepted by ICCV 2023. 16 pages, 10 figures, 13 tables. Code: https://github.com/showlab/UniVTG

arXiv:2307.05463 [pdf, other]

EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone

Authors: Shraman Pramanick, Yale Song, Sayan Nag, Kevin Qinghong Lin, Hardik Shah, Mike Zheng Shou, Rama Chellappa, Pengchuan Zhang

Abstract: Video-language pre-training (VLP) has become increasingly important due to its ability to generalize to various vision and language tasks. However, existing egocentric VLP frameworks utilize separate video and language encoders and learn task-specific cross-modal information only during fine-tuning, limiting the development of a unified system. In this work, we introduce the second generation of e… ▽ More Video-language pre-training (VLP) has become increasingly important due to its ability to generalize to various vision and language tasks. However, existing egocentric VLP frameworks utilize separate video and language encoders and learn task-specific cross-modal information only during fine-tuning, limiting the development of a unified system. In this work, we introduce the second generation of egocentric video-language pre-training (EgoVLPv2), a significant improvement from the previous generation, by incorporating cross-modal fusion directly into the video and language backbones. EgoVLPv2 learns strong video-text representation during pre-training and reuses the cross-modal attention modules to support different downstream tasks in a flexible and efficient manner, reducing fine-tuning costs. Moreover, our proposed fusion in the backbone strategy is more lightweight and compute-efficient than stacking additional fusion-specific layers. Extensive experiments on a wide range of VL tasks demonstrate the effectiveness of EgoVLPv2 by achieving consistent state-of-the-art performance over strong baselines across all downstream. Our project page can be found at https://shramanpramanick.github.io/EgoVLPv2/. △ Less

Submitted 18 August, 2023; v1 submitted 11 July, 2023; originally announced July 2023.

Comments: Published in ICCV 2023

arXiv:2210.04135 [pdf, other]

VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment

Authors: Shraman Pramanick, Li **g, Sayan Nag, Jiachen Zhu, Hardik Shah, Yann LeCun, Rama Chellappa

Abstract: Vision-language pre-training (VLP) has recently proven highly effective for various uni- and multi-modal downstream applications. However, most existing end-to-end VLP methods use high-resolution image-text box data to perform well on fine-grained region-level tasks, such as object detection, segmentation, and referring expression comprehension. Unfortunately, such high-resolution images with accu… ▽ More Vision-language pre-training (VLP) has recently proven highly effective for various uni- and multi-modal downstream applications. However, most existing end-to-end VLP methods use high-resolution image-text box data to perform well on fine-grained region-level tasks, such as object detection, segmentation, and referring expression comprehension. Unfortunately, such high-resolution images with accurate bounding box annotations are expensive to collect and use for supervision at scale. In this work, we propose VoLTA (Vision-Language Transformer with weakly-supervised local-feature Alignment), a new VLP paradigm that only utilizes image-caption data but achieves fine-grained region-level image understanding, eliminating the use of expensive box annotations. VoLTA adopts graph optimal transport-based weakly-supervised alignment on local image patches and text tokens to germinate an explicit, self-normalized, and interpretable low-level matching criterion. In addition, VoLTA pushes multi-modal fusion deep into the uni-modal backbones during pre-training and removes fusion-specific transformer layers, further reducing memory requirements. Extensive experiments on a wide range of vision- and vision-language downstream tasks demonstrate the effectiveness of VoLTA on fine-grained applications without compromising the coarse-grained downstream performance, often outperforming methods using significantly more caption and box annotations. △ Less

Submitted 29 October, 2023; v1 submitted 8 October, 2022; originally announced October 2022.

Comments: Published in TMLR 2023

arXiv:2204.13861 [pdf, other]

Where in the World is this Image? Transformer-based Geo-localization in the Wild

Authors: Shraman Pramanick, Ewa M. Nowara, Joshua Gleason, Carlos D. Castillo, Rama Chellappa

Abstract: Predicting the geographic location (geo-localization) from a single ground-level RGB image taken anywhere in the world is a very challenging problem. The challenges include huge diversity of images due to different environmental scenarios, drastic variation in the appearance of the same location depending on the time of the day, weather, season, and more importantly, the prediction is made from a… ▽ More Predicting the geographic location (geo-localization) from a single ground-level RGB image taken anywhere in the world is a very challenging problem. The challenges include huge diversity of images due to different environmental scenarios, drastic variation in the appearance of the same location depending on the time of the day, weather, season, and more importantly, the prediction is made from a single image possibly having only a few geo-locating cues. For these reasons, most existing works are restricted to specific cities, imagery, or worldwide landmarks. In this work, we focus on develo** an efficient solution to planet-scale single-image geo-localization. To this end, we propose TransLocator, a unified dual-branch transformer network that attends to tiny details over the entire image and produces robust feature representation under extreme appearance variations. TransLocator takes an RGB image and its semantic segmentation map as inputs, interacts between its two parallel branches after each transformer layer, and simultaneously performs geo-localization and scene recognition in a multi-task fashion. We evaluate TransLocator on four benchmark datasets - Im2GPS, Im2GPS3k, YFCC4k, YFCC26k and obtain 5.5%, 14.1%, 4.9%, 9.9% continent-level accuracy improvement over the state-of-the-art. TransLocator is also validated on real-world test images and found to be more effective than previous methods. △ Less

Submitted 25 July, 2022; v1 submitted 28 April, 2022; originally announced April 2022.

Comments: Accepted in ECCV 2022

arXiv:2110.10949 [pdf, other]

Multimodal Learning using Optimal Transport for Sarcasm and Humor Detection

Authors: Shraman Pramanick, Aniket Roy, Vishal M. Patel

Abstract: Multimodal learning is an emerging yet challenging research area. In this paper, we deal with multimodal sarcasm and humor detection from conversational videos and image-text pairs. Being a fleeting action, which is reflected across the modalities, sarcasm detection is challenging since large datasets are not available for this task in the literature. Therefore, we primarily focus on resource-cons… ▽ More Multimodal learning is an emerging yet challenging research area. In this paper, we deal with multimodal sarcasm and humor detection from conversational videos and image-text pairs. Being a fleeting action, which is reflected across the modalities, sarcasm detection is challenging since large datasets are not available for this task in the literature. Therefore, we primarily focus on resource-constrained training, where the number of training samples is limited. To this end, we propose a novel multimodal learning system, MuLOT (Multimodal Learning using Optimal Transport), which utilizes self-attention to exploit intra-modal correspondence and optimal transport for cross-modal correspondence. Finally, the modalities are combined with multimodal attention fusion to capture the inter-dependencies across modalities. We test our approach for multimodal sarcasm and humor detection on three benchmark datasets - MUStARD (video, audio, text), UR-FUNNY (video, audio, text), MST (image, text) and obtain 2.1%, 1.54%, and 2.34% accuracy improvements over state-of-the-art. △ Less

Submitted 21 October, 2021; originally announced October 2021.

Comments: Accepted to WACV 2022

arXiv:2110.00413 [pdf, other]

Detecting Harmful Memes and Their Targets

Authors: Shraman Pramanick, Dimitar Dimitrov, Rituparna Mukherjee, Shivam Sharma, Md. Shad Akhtar, Preslav Nakov, Tanmoy Chakraborty

Abstract: Among the various modes of communication in social media, the use of Internet memes has emerged as a powerful means to convey political, psychological, and socio-cultural opinions. Although memes are typically humorous in nature, recent days have witnessed a proliferation of harmful memes targeted to abuse various social entities. As most harmful memes are highly satirical and abstruse without app… ▽ More Among the various modes of communication in social media, the use of Internet memes has emerged as a powerful means to convey political, psychological, and socio-cultural opinions. Although memes are typically humorous in nature, recent days have witnessed a proliferation of harmful memes targeted to abuse various social entities. As most harmful memes are highly satirical and abstruse without appropriate contexts, off-the-shelf multimodal models may not be adequate to understand their underlying semantics. In this work, we propose two novel problem formulations: detecting harmful memes and the social entities that these harmful memes target. To this end, we present HarMeme, the first benchmark dataset, containing 3,544 memes related to COVID-19. Each meme went through a rigorous two-stage annotation process. In the first stage, we labeled a meme as very harmful, partially harmful, or harmless; in the second stage, we further annotated the type of target(s) that each harmful meme points to: individual, organization, community, or society/general public/other. The evaluation results using ten unimodal and multimodal models highlight the importance of using multimodal signals for both tasks. We further discuss the limitations of these models and we argue that more research is needed to address these problems. △ Less

Submitted 24 September, 2021; originally announced October 2021.

Comments: harmful memes, multimodality, social media

MSC Class: 68T50 ACM Class: F.2.2; I.2.7

Journal ref: ACL-2021 (Findings)

arXiv:2109.05184 [pdf, other]

MOMENTA: A Multimodal Framework for Detecting Harmful Memes and Their Targets

Authors: Shraman Pramanick, Shivam Sharma, Dimitar Dimitrov, Md Shad Akhtar, Preslav Nakov, Tanmoy Chakraborty

Abstract: Internet memes have become powerful means to transmit political, psychological, and socio-cultural ideas. Although memes are typically humorous, recent days have witnessed an escalation of harmful memes used for trolling, cyberbullying, and abuse. Detecting such memes is challenging as they can be highly satirical and cryptic. Moreover, while previous work has focused on specific aspects of memes… ▽ More Internet memes have become powerful means to transmit political, psychological, and socio-cultural ideas. Although memes are typically humorous, recent days have witnessed an escalation of harmful memes used for trolling, cyberbullying, and abuse. Detecting such memes is challenging as they can be highly satirical and cryptic. Moreover, while previous work has focused on specific aspects of memes such as hate speech and propaganda, there has been little work on harm in general. Here, we aim to bridge this gap. We focus on two tasks: (i)detecting harmful memes, and (ii)identifying the social entities they target. We further extend a recently released HarMeme dataset, which covered COVID-19, with additional memes and a new topic: US politics. To solve these tasks, we propose MOMENTA (MultimOdal framework for detecting harmful MemEs aNd Their tArgets), a novel multimodal deep neural network that uses global and local perspectives to detect harmful memes. MOMENTA systematically analyzes the local and the global perspective of the input meme (in both modalities) and relates it to the background context. MOMENTA is interpretable and generalizable, and our experiments show that it outperforms several strong rivaling approaches. △ Less

Submitted 22 September, 2021; v1 submitted 11 September, 2021; originally announced September 2021.

Comments: The paper has been accepted in the Findings of Empirical Methods in Natural Language Processing (EMNLP), 2021

arXiv:2107.04885 [pdf, other]

Filling MIS Vertices by Myopic Luminous Robots

Authors: Subhajit Pramanick, Sai Vamshi Samala, Debasish Pattanayak, Partha Sarathi Mandal

Abstract: We present the problem of finding a maximal independent set (MIS) (named as \emph{MIS Filling problem}) of an arbitrary connected graph having $n$ vertices with luminous myopic mobile robots. The robots enter the graph one after another from a particular vertex called the \emph{Door} and disperse along the edges of the graph without collision to occupy vertices such that the set of vertices occupi… ▽ More We present the problem of finding a maximal independent set (MIS) (named as \emph{MIS Filling problem}) of an arbitrary connected graph having $n$ vertices with luminous myopic mobile robots. The robots enter the graph one after another from a particular vertex called the \emph{Door} and disperse along the edges of the graph without collision to occupy vertices such that the set of vertices occupied by the robots is a maximal independent set. We assume the robots have knowledge only about the maximum degree of the graph, denoted by $Δ$. In this paper, we explore two versions of the problem: the solution to the first version, named as \emph{MIS Filling with Single Door}, works under an asynchronous scheduler using robots with 3 hops of visibility range, $Δ+ 6$ number of colors and $O(\log Δ)$ bits of persistent storage. The time complexity is measured in terms of epochs and it can be solved in $O(n^2)$ epochs. An epoch is the smallest time interval in which each participating robot gets activated and executes the algorithm at least once. For the second version with $k~ ( > 1)$ \textit{Doors}, named as \emph{MIS Filling with Multiple Doors}, the solution works under a semi-synchronous scheduler using robots with 5 hops of visibility range, $Δ+ k + 6$ number of colors and $O(\log (Δ+ k))$ bits of persistent storage. The problem with multiple Doors can be solved in $O(n^2)$ epochs. △ Less

Submitted 15 October, 2022; v1 submitted 10 July, 2021; originally announced July 2021.

Comments: A version of this paper appears in the Proceedings of ICDCIT'23

arXiv:2105.09601 [pdf, other]

See, Hear, Read: Leveraging Multimodality with Guided Attention for Abstractive Text Summarization

Authors: Yash Kumar Atri, Shraman Pramanick, Vikram Goyal, Tanmoy Chakraborty

Abstract: In recent years, abstractive text summarization with multimodal inputs has started drawing attention due to its ability to accumulate information from different source modalities and generate a fluent textual summary. However, existing methods use short videos as the visual modality and short summary as the ground-truth, therefore, perform poorly on lengthy videos and long ground-truth summary. Ad… ▽ More In recent years, abstractive text summarization with multimodal inputs has started drawing attention due to its ability to accumulate information from different source modalities and generate a fluent textual summary. However, existing methods use short videos as the visual modality and short summary as the ground-truth, therefore, perform poorly on lengthy videos and long ground-truth summary. Additionally, there exists no benchmark dataset to generalize this task on videos of varying lengths. In this paper, we introduce AVIATE, the first large-scale dataset for abstractive text summarization with videos of diverse duration, compiled from presentations in well-known academic conferences like NDSS, ICML, NeurIPS, etc. We use the abstract of corresponding research papers as the reference summaries, which ensure adequate quality and uniformity of the ground-truth. We then propose FLORAL, a factorized multi-modal Transformer based decoder-only language model, which inherently captures the intra-modal and inter-modal dynamics within various input modalities for the text summarization task. FLORAL utilizes an increasing number of self-attentions to capture multimodality and performs significantly better than traditional encoder-decoder based networks. Extensive experiments illustrate that FLORAL achieves significant improvement over the baselines in both qualitative and quantitative evaluations on the existing How2 dataset for short videos and newly introduced AVIATE dataset for videos with diverse duration, beating the best baseline on the two datasets by $1.39$ and $2.74$ ROUGE-L points respectively. △ Less

Submitted 15 September, 2021; v1 submitted 20 May, 2021; originally announced May 2021.

Comments: Journal paper accepted in Knowledge Based Systems

arXiv:2103.12377 [pdf, other]

Exercise? I thought you said 'Extra Fries': Leveraging Sentence Demarcations and Multi-hop Attention for Meme Affect Analysis

Authors: Shraman Pramanick, Md Shad Akhtar, Tanmoy Chakraborty

Abstract: Today's Internet is awash in memes as they are humorous, satirical, or ironic which make people laugh. According to a survey, 33% of social media users in age bracket [13-35] send memes every day, whereas more than 50% send every week. Some of these memes spread rapidly within a very short time-frame, and their virality depends on the novelty of their (textual and visual) content. A few of them co… ▽ More Today's Internet is awash in memes as they are humorous, satirical, or ironic which make people laugh. According to a survey, 33% of social media users in age bracket [13-35] send memes every day, whereas more than 50% send every week. Some of these memes spread rapidly within a very short time-frame, and their virality depends on the novelty of their (textual and visual) content. A few of them convey positive messages, such as funny or motivational quotes; while others are meant to mock/hurt someone's feelings through sarcastic or offensive messages. Despite the appealing nature of memes and their rapid emergence on social media, effective analysis of memes has not been adequately attempted to the extent it deserves. In this paper, we attempt to solve the same set of tasks suggested in the SemEval'20-Memotion Analysis competition. We propose a multi-hop attention-based deep neural network framework, called MHA-MEME, whose prime objective is to leverage the spatial-domain correspondence between the visual modality (an image) and various textual segments to extract fine-grained feature representations for classification. We evaluate MHA-MEME on the 'Memotion Analysis' dataset for all three sub-tasks - sentiment classification, affect classification, and affect class quantification. Our comparative study shows sota performances of MHA-MEME for all three tasks compared to the top systems that participated in the competition. Unlike all the baselines which perform inconsistently across all three tasks, MHA-MEME outperforms baselines in all the tasks on average. Moreover, we validate the generalization of MHA-MEME on another set of manually annotated test samples and observe it to be consistent. Finally, we establish the interpretability of MHA-MEME. △ Less

Submitted 23 March, 2021; originally announced March 2021.

Comments: Accepted for publication in ICWSM-2021

Showing 1–13 of 13 results for author: Pramanick, S