Search | arXiv e-print repository

Visualizing and Understanding Patch Interactions in Vision Transformer

Authors: Jie Ma, Yalong Bai, Bineng Zhong, Wei Zhang, Ting Yao, Tao Mei

Abstract: Vision Transformer (ViT) has become a leading tool in various computer vision tasks, owing to its unique self-attention mechanism that learns visual representations explicitly through cross-patch information interactions. Despite this success, the literature seldom explores the explainability of vision transformers, and there is no clear picture of how the attention mechanism, with respect to the correlation across comprehensive patches, impacts performance or what further potential it holds. In this work, we propose a novel explainable visualization approach to analyze and interpret the crucial attention interactions among patches for vision transformer. Specifically, we first introduce a quantification indicator to measure the impact of patch interaction and verify such quantification on attention window design and indiscriminative patches removal. Then, we exploit the effective responsive field of each patch in ViT and devise a window-free transformer architecture accordingly. Extensive experiments on ImageNet demonstrate that the designed quantitative method facilitates ViT model learning, improving top-1 accuracy by up to 4.28%. Moreover, the results on downstream fine-grained recognition tasks further validate the generalization of our proposal.

Submitted 11 March, 2022; originally announced March 2022.

Comments: 15 pages, 14 figures

arXiv:2202.00087 [pdf, other]

Holistic Fine-grained GGS Characterization: From Detection to Unbalanced Classification

Authors: Yuzhe Lu, Haichun Yang, Zuhayr Asad, Zheyu Zhu, Tianyuan Yao, Jiachen Xu, Agnes B. Fogo, Yuankai Huo

Abstract: Recent studies have demonstrated the diagnostic and prognostic values of global glomerulosclerosis (GGS) in IgA nephropathy, aging, and end-stage renal disease. However, the fine-grained quantitative analysis of multiple GGS subtypes (e.g., obsolescent, solidified, and disappearing glomerulosclerosis) is typically a resource-intensive manual process. Very few automatic methods, if any, have been developed to bridge this gap for such analytics. In this paper, we present a holistic pipeline to quantify GGS (with both detection and classification) from a whole slide image in a fully automatic manner. In addition, we conduct fine-grained classification of the GGS subtypes. Our study releases an open-source quantitative analytical tool for fine-grained GGS characterization while tackling the technical challenges of unbalanced classification and of integrating detection and classification.

Submitted 31 January, 2022; originally announced February 2022.

arXiv:2201.09371 [pdf, other]

doi 10.1214/22-AOAS1696

Probabilistic Learning of Treatment Trees in Cancer

Authors: Tsung-Hung Yao, Zhenke Wu, Karthik Bharath, Jinju Li, Veerabhadran Baladandayuthapani

Abstract: Accurate identification of synergistic treatment combinations and their underlying biological mechanisms is critical across many disease domains, especially cancer. In translational oncology research, preclinical systems such as patient-derived xenografts (PDX) have emerged as a unique study design evaluating multiple treatments administered to samples from the same human tumor implanted into genetically identical mice. In this paper, we propose a novel Bayesian probabilistic tree-based framework for PDX data to investigate the hierarchical relationships between treatments by inferring treatment cluster trees, referred to as treatment trees (Rx-tree). The framework motivates a new metric of mechanistic similarity between two or more treatments accounting for inherent uncertainty in tree estimation; treatments with a high estimated similarity have potentially high mechanistic synergy. Building upon Dirichlet Diffusion Trees, we derive a closed-form marginal likelihood encoding the tree structure, which facilitates computationally efficient posterior inference via a new two-stage algorithm. Simulation studies demonstrate superior performance of the proposed method in recovering the tree structure and treatment similarities. Our analyses of a recently collated PDX dataset produce treatment similarity estimates that show a high degree of concordance with known biological mechanisms across treatments in five different cancers. More importantly, we uncover new and potentially effective combination therapies that confer synergistic regulation of specific downstream biological pathways for future clinical investigations. Our accompanying code, data, and shiny application for visualization of results are available at: https://github.com/bayesrx/RxTree.

Submitted 23 January, 2022; originally announced January 2022.

arXiv:2201.08632 [pdf, ps, other]

Cross $t$-intersecting families for symplectic polar spaces

Authors: Tian Yao, Kaishun Wang

Abstract: Let $\mathscr{P}$ be a symplectic polar space over a finite field $\mathbb{F}_q$, and let $\mathscr{P}_m$ denote the collection of all $m$-dimensional totally isotropic subspaces in $\mathscr{P}$. Let $\mathscr{F}_1\subset\mathscr{P}_{m_1}$ and $\mathscr{F}_2\subset\mathscr{P}_{m_2}$ satisfy $\dim(F_1\cap F_2)\ge t$ for any $F_1\in\mathscr{F}_1$ and $F_2\in\mathscr{F}_2$. We say they are cross $t$-intersecting families. Moreover, we say they are trivial if each member of them contains a fixed $t$-dimensional totally isotropic subspace. In this paper, we show that cross $t$-intersecting families with maximum product of sizes are trivial. We also describe the structure of non-trivial $t$-intersecting families with maximum product of sizes.

Submitted 24 February, 2022; v1 submitted 21 January, 2022; originally announced January 2022.

Comments: Some typos corrected, a reference added, some details of proof added. arXiv admin note: substantial text overlap with arXiv:2201.08084

MSC Class: 05D05; 05A30; 51A50

arXiv:2201.08084 [pdf, ps, other]

Cross $t$-intersecting families for finite affine spaces

Authors: Tian Yao, Kaishun Wang

Abstract: Denote the collection of all $k$-flats in $AG(n,\mathbb{F}_q)$ by $\mathscr{M}(k,n)$. Let $\mathscr{F}_1\subset\mathscr{M}(k_1,n)$ and $\mathscr{F}_2\subset\mathscr{M}(k_2,n)$ satisfy $\dim(F_1\cap F_2)\ge t$ for any $F_1\in\mathscr{F}_1$ and $F_2\in\mathscr{F}_2$. We say they are cross $t$-intersecting families. Moreover, we say they are trivial if each member of them contains a fixed $t$-flat in $AG(n,\mathbb{F}_q)$. In this paper, we show that cross $t$-intersecting families with maximum product of sizes are trivial. We also describe the structure of non-trivial $t$-intersecting families with maximum product of sizes.

Submitted 15 February, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

Comments: Some typos are corrected

MSC Class: 05D05

arXiv:2201.04029 [pdf, other]

Motion-Focused Contrastive Learning of Video Representations

Authors: Rui Li, Yiheng Zhang, Zhaofan Qiu, Ting Yao, Dong Liu, Tao Mei

Abstract: Motion, as the most distinct phenomenon in a video to involve the changes over time, has been unique and critical to the development of video representation learning. In this paper, we ask the question: how important is the motion particularly for self-supervised video representation learning. To this end, we compose a duet of exploiting the motion for data augmentation and feature learning in the regime of contrastive learning. Specifically, we present a Motion-focused Contrastive Learning (MCL) method that regards such duet as the foundation. On one hand, MCL capitalizes on optical flow of each frame in a video to temporally and spatially sample the tubelets (i.e., sequences of associated frame patches across time) as data augmentations. On the other hand, MCL further aligns gradient maps of the convolutional layers to optical flow maps from spatial, temporal and spatio-temporal perspectives, in order to ground motion information in feature learning. Extensive experiments conducted on R(2+1)D backbone demonstrate the effectiveness of our MCL. On UCF101, the linear classifier trained on the representations learnt by MCL achieves 81.91% top-1 accuracy, outperforming ImageNet supervised pre-training by 6.78%. On Kinetics-400, MCL achieves 66.62% top-1 accuracy under the linear protocol. Code is available at https://github.com/YihengZhang-CV/MCL-Motion-Focused-Contrastive-Learning.