Showing 1–50 of 191 results for author: Yao, T

  1. arXiv:2407.00596  [pdf, other]

    eess.IV cs.CV

    HATs: Hierarchical Adaptive Taxonomy Segmentation for Panoramic Pathology Image Analysis

    Authors: Ruining Deng, Quan Liu, Can Cui, Tianyuan Yao, Juming Xiong, Shunxing Bao, Hao Li, Mengmeng Yin, Yu Wang, Shilin Zhao, Yucheng Tang, Haichun Yang, Yuankai Huo

    Abstract: Panoramic image segmentation in computational pathology presents a remarkable challenge due to the morphologically complex and variably scaled anatomy. For instance, the intricate organization in kidney pathology spans multiple layers, from regions like the cortex and medulla to functional units such as glomeruli, tubules, and vessels, down to various cell types. In this paper, we propose a novel…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2402.19286

  2. arXiv:2407.00247  [pdf, other]

    cs.CV

    Prompt Refinement with Image Pivot for Text-to-Image Generation

    Authors: Jingtao Zhan, Qingyao Ai, Yiqun Liu, Yingwei Pan, Ting Yao, Jiaxin Mao, Shaoping Ma, Tao Mei

    Abstract: For text-to-image generation, automatically refining user-provided natural language prompts into the keyword-enriched prompts favored by systems is essential for the user experience. Such a prompt refinement process is analogous to translating the prompt from "user languages" into "system languages". However, the scarcity of such parallel corpora makes it difficult to train a prompt refinement mod…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: Accepted by ACL 2024

  3. MuGSI: Distilling GNNs with Multi-Granularity Structural Information for Graph Classification

    Authors: Tianjun Yao, Jiaqi Sun, Defu Cao, Kun Zhang, Guangyi Chen

    Abstract: Recent works have introduced GNN-to-MLP knowledge distillation (KD) frameworks to combine both GNN's superior performance and MLP's fast inference speed. However, existing KD frameworks are primarily designed for node classification within single graphs, leaving their applicability to graph classification largely unexplored. Two main challenges arise when extending KD for node classification to gr…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: 12 pages, 4 figures. Accepted by TheWebConf2024

    ACM Class: I.2.6

  4. arXiv:2406.19540  [pdf, other]

    cs.CV

    Weighted Circle Fusion: Ensembling Circle Representation from Different Object Detection Results

    Authors: Jialin Yue, Tianyuan Yao, Ruining Deng, Quan Liu, Juming Xiong, Haichun Yang, Yuankai Huo

    Abstract: Recently, the use of circle representation has emerged as a method to improve the identification of spherical objects (such as glomeruli, cells, and nuclei) in medical imaging studies. In traditional bounding box-based object detection, combining results from multiple models improves accuracy, especially when real-time processing isn't crucial. Unfortunately, this widely adopted strategy is not re…

    Submitted 27 June, 2024; originally announced June 2024.

  5. Improving the Expressiveness of $K$-hop Message-Passing GNNs by Injecting Contextualized Substructure Information

    Authors: Tianjun Yao, Yiongxu Wang, Kun Zhang, Shangsong Liang

    Abstract: Graph neural networks (GNNs) have become the de facto standard for representational learning in graphs, and have achieved state-of-the-art performance in many graph-related tasks; however, it has been shown that the expressive power of standard GNNs is at most equivalent to that of the 1-dimensional Weisfeiler-Lehman (1-WL) test. Recently, a line of work has aimed to enhance the expressive…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 13 pages, published in Research track of KDD2023

    ACM Class: I.2.6

  6. arXiv:2406.13495  [pdf, other]

    cs.CV

    DF40: Toward Next-Generation Deepfake Detection

    Authors: Zhiyuan Yan, Taiping Yao, Shen Chen, Yandan Zhao, Xinghe Fu, Junwei Zhu, Donghao Luo, Li Yuan, Chengjie Wang, Shouhong Ding, Yunsheng Wu

    Abstract: We propose a new comprehensive benchmark to revolutionize the current deepfake detection field to the next generation. Predominantly, existing works identify top-notch detection algorithms and models by adhering to the common practice: training detectors on one specific dataset (e.g., FF++) and testing them on other prevalent deepfake datasets. This protocol is often regarded as a "golden compass"…

    Submitted 19 June, 2024; originally announced June 2024.

  7. arXiv:2406.01884  [pdf, other]

    cs.CV

    Rank-based No-reference Quality Assessment for Face Swapping

    Authors: Xinghui Zhou, Wenbo Zhou, Tianyi Wei, Shen Chen, Taiping Yao, Shouhong Ding, Weiming Zhang, Nenghai Yu

    Abstract: Face swapping has become a prominent research area in computer vision and image processing due to rapid technological advancements. The metric of measuring the quality in most face swapping methods relies on several distances between the manipulated images and the source image, or the target image, i.e., there are suitable known reference face images. Therefore, there is still a gap in accurately…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 8 pages, 5 figures

  8. arXiv:2405.17824  [pdf, other]

    cs.CV

    mTREE: Multi-Level Text-Guided Representation End-to-End Learning for Whole Slide Image Analysis

    Authors: Quan Liu, Ruining Deng, Can Cui, Tianyuan Yao, Vishwesh Nath, Yucheng Tang, Yuankai Huo

    Abstract: Multi-modal learning adeptly integrates visual and textual data, but its application to histopathology image and text analysis remains challenging, particularly with large, high-resolution images like gigapixel Whole Slide Images (WSIs). Current methods typically rely on manual region labeling or multi-stage learning to assemble local representations (e.g., patch-level) into global features (e.g.,…

    Submitted 28 May, 2024; originally announced May 2024.

  9. arXiv:2405.09113  [pdf, ps, other]

    cs.LG

    Efficient LLM Jailbreak via Adaptive Dense-to-sparse Constrained Optimization

    Authors: Kai Hu, Weichen Yu, Tianjun Yao, Xiang Li, Wenhe Liu, Lijun Yu, Yining Li, Kai Chen, Zhiqiang Shen, Matt Fredrikson

    Abstract: Recent research indicates that large language models (LLMs) are susceptible to jailbreaking attacks that can generate harmful content. This paper introduces a novel token-level attack method, Adaptive Dense-to-Sparse Constrained Optimization (ADC), which effectively jailbreaks several open-source LLMs. Our approach relaxes the discrete jailbreak optimization into a continuous optimization and prog…

    Submitted 15 May, 2024; originally announced May 2024.

  10. arXiv:2405.03652  [pdf]

    cs.CV

    Field-of-View Extension for Diffusion MRI via Deep Generative Models

    Authors: Chenyu Gao, Shunxing Bao, Michael Kim, Nancy Newlin, Praitayini Kanakaraj, Tianyuan Yao, Gaurav Rudravaram, Yuankai Huo, Daniel Moyer, Kurt Schilling, Walter Kukull, Arthur Toga, Derek Archer, Timothy Hohman, Bennett Landman, Zhiyuan Li

    Abstract: Purpose: In diffusion MRI (dMRI), the volumetric and bundle analyses of whole-brain tissue microstructure and connectivity can be severely impeded by an incomplete field-of-view (FOV). This work aims to develop a method for imputing the missing slices directly from existing dMRI scans with an incomplete FOV. We hypothesize that the imputed image with complete FOV can improve the whole-brain tracto…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 20 pages, 11 figures

  11. arXiv:2403.19334  [pdf, other]

    cs.CV

    Test-Time Domain Generalization for Face Anti-Spoofing

    Authors: Qianyu Zhou, Ke-Yue Zhang, Taiping Yao, Xuequan Lu, Shouhong Ding, Lizhuang Ma

    Abstract: Face Anti-Spoofing (FAS) is pivotal in safeguarding facial recognition systems against presentation attacks. While domain generalization (DG) methods have been developed to enhance FAS performance, they predominantly focus on learning domain-invariant features during training, which may not guarantee generalizability to unseen data that differs largely from the source distributions. Our insight is…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

  12. arXiv:2403.17870  [pdf, other]

    cs.CV cs.MM

    Boosting Diffusion Models with Moving Average Sampling in Frequency Domain

    Authors: Yurui Qian, Qi Cai, Yingwei Pan, Yehao Li, Ting Yao, Qibin Sun, Tao Mei

    Abstract: Diffusion models have recently brought a powerful revolution in image generation. Despite showing impressive generative capabilities, most of these models rely on the current sample to denoise the next one, possibly resulting in denoising instability. In this paper, we reinterpret the iterative denoising process as model optimization and leverage a moving average mechanism to ensemble all the prio…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  13. arXiv:2403.17005  [pdf, other]

    cs.CV cs.MM

    TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models

    Authors: Zhongwei Zhang, Fuchen Long, Yingwei Pan, Zhaofan Qiu, Ting Yao, Yang Cao, Tao Mei

    Abstract: Recent advances in text-to-video generation have demonstrated the utility of powerful diffusion models. Nevertheless, the problem is not trivial when shaping diffusion models to animate a static image (i.e., image-to-video generation). The difficulty originates from the aspect that the diffusion process of subsequent animated frames should not only preserve the faithful alignment with the given imag…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: CVPR 2024; Project page: https://trip-i2v.github.io/TRIP/

  14. arXiv:2403.17004  [pdf, other]

    cs.CV cs.MM

    SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer

    Authors: Rui Zhu, Yingwei Pan, Yehao Li, Ting Yao, Zhenglong Sun, Tao Mei, Chang Wen Chen

    Abstract: Diffusion Transformer (DiT) has emerged as the new trend of generative diffusion models on image generation. In view of extremely slow convergence in typical DiT, recent breakthroughs have been driven by mask strategy that significantly improves the training efficiency of DiT with additional intra-image contextual learning. Despite this progress, mask strategy still suffers from two inherent limit…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  15. arXiv:2403.17001  [pdf, other]

    cs.CV cs.MM

    VP3D: Unleashing 2D Visual Prompt for Text-to-3D Generation

    Authors: Yang Chen, Yingwei Pan, Haibo Yang, Ting Yao, Tao Mei

    Abstract: Recent innovations on text-to-3D generation have featured Score Distillation Sampling (SDS), which enables the zero-shot learning of implicit 3D models (NeRF) by directly distilling prior knowledge from 2D diffusion models. However, current SDS-based models still struggle with intricate text prompts and commonly result in distorted 3D models with unrealistic textures or cross-view inconsistency is…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: CVPR 2024; Project page: https://vp3d-cvpr24.github.io

  16. arXiv:2403.17000  [pdf, other]

    cs.CV cs.MM

    Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution

    Authors: Zhikai Chen, Fuchen Long, Zhaofan Qiu, Ting Yao, Wengang Zhou, Jiebo Luo, Tao Mei

    Abstract: Diffusion models are just at a tipping point for the image super-resolution task. Nevertheless, it is not trivial to capitalize on diffusion models for video super-resolution, which necessitates not only the preservation of visual appearance from low-resolution to high-resolution videos, but also the temporal consistency across video frames. In this paper, we propose a novel approach, pursuing Spatial…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  17. arXiv:2403.11999  [pdf, other]

    cs.CV cs.MM

    HIRI-ViT: Scaling Vision Transformer with High Resolution Inputs

    Authors: Ting Yao, Yehao Li, Yingwei Pan, Tao Mei

    Abstract: The hybrid deep models of Vision Transformer (ViT) and Convolution Neural Network (CNN) have emerged as a powerful class of backbones for vision tasks. Scaling up the input resolution of such hybrid backbones naturally strengthens model capacity, but inevitably suffers from heavy computational cost that scales quadratically. Instead, we present a new hybrid backbone with HIgh-Resolution Inputs (nam…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

  18. arXiv:2402.19286  [pdf, other]

    eess.IV cs.CV

    PrPSeg: Universal Proposition Learning for Panoramic Renal Pathology Segmentation

    Authors: Ruining Deng, Quan Liu, Can Cui, Tianyuan Yao, Jialin Yue, Juming Xiong, Lining Yu, Yifei Wu, Mengmeng Yin, Yu Wang, Shilin Zhao, Yucheng Tang, Haichun Yang, Yuankai Huo

    Abstract: Understanding the anatomy of renal pathology is crucial for advancing disease diagnostics, treatment evaluation, and clinical research. The complex kidney system comprises various components across multiple levels, including regions (cortex, medulla), functional units (glomeruli, tubules), and cells (podocytes, mesangial cells in glomerulus). Prior studies have predominantly overlooked the intrica…

    Submitted 20 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: IEEE / CVF Computer Vision and Pattern Recognition Conference 2024

  19. arXiv:2402.18236  [pdf]

    cs.CV

    Image2Flow: A hybrid image and graph convolutional neural network for rapid patient-specific pulmonary artery segmentation and CFD flow field calculation from 3D cardiac MRI data

    Authors: Tina Yao, Endrit Pajaziti, Michael Quail, Silvia Schievano, Jennifer A Steeden, Vivek Muthurangu

    Abstract: Computational fluid dynamics (CFD) can be used for evaluation of hemodynamics. However, its routine use is limited by labor-intensive manual segmentation, CFD mesh creation, and time-consuming simulation. This study aims to train a deep learning model to both generate patient-specific volume-meshes of the pulmonary artery from 3D cardiac MRI data and directly estimate CFD flow fields. This study…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: 22 pages, 7 figures, 3 tables

  20. arXiv:2402.09807  [pdf, other]

    math.OC cs.LG stat.ML

    Two trust region type algorithms for solving nonconvex-strongly concave minimax problems

    Authors: Tongliang Yao, Zi Xu

    Abstract: In this paper, we propose a Minimax Trust Region (MINIMAX-TR) algorithm and a Minimax Trust Region Algorithm with Contractions and Expansions (MINIMAX-TRACE) algorithm for solving nonconvex-strongly concave minimax problems. Both algorithms can find an $(\epsilon, \sqrt{\epsilon})$-second order stationary point (SSP) within $\mathcal{O}(\epsilon^{-1.5})$ iterations, which matches the best known iteration complexity.

    Submitted 15 February, 2024; originally announced February 2024.

    MSC Class: 90C47; 90C26; 90C30

  21. arXiv:2401.01256  [pdf, other]

    cs.CV cs.CL

    VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM

    Authors: Fuchen Long, Zhaofan Qiu, Ting Yao, Tao Mei

    Abstract: The recent innovations and breakthroughs in diffusion models have significantly expanded the possibilities of generating high-quality videos for the given prompts. Most existing works tackle the single-scene scenario with only one video event occurring in a single background. Extending to generate multi-scene videos nevertheless is not trivial and necessitates to nicely manage the logic in between…

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: Project website: https://videodrafter.github.io

  22. arXiv:2311.05464  [pdf, other]

    cs.CV cs.MM

    3DStyle-Diffusion: Pursuing Fine-grained Text-driven 3D Stylization with 2D Diffusion Models

    Authors: Haibo Yang, Yang Chen, Yingwei Pan, Ting Yao, Zhineng Chen, Tao Mei

    Abstract: 3D content creation via text-driven stylization has posed a fundamental challenge to the multimedia and graphics community. Recent advances of cross-modal foundation models (e.g., CLIP) have made this problem feasible. Those approaches commonly leverage CLIP to align the holistic semantics of stylized mesh with the given text prompt. Nevertheless, it is not trivial to enable more controllable styliza…

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: ACM Multimedia 2023

  23. arXiv:2311.05463  [pdf, other]

    cs.CV cs.MM

    ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors

    Authors: Jingwen Chen, Yingwei Pan, Ting Yao, Tao Mei

    Abstract: Recently, the multimedia community has witnessed the rise of diffusion models trained on large-scale multi-modal data for visual content creation, particularly in the field of text-to-image generation. In this paper, we propose a new task for "stylizing" text-to-image models, namely text-driven stylized image generation, that further enhances editability in content creation. Given input text pro…

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: ACM Multimedia 2023

  24. arXiv:2311.05461  [pdf, other]

    cs.CV cs.MM

    Control3D: Towards Controllable Text-to-3D Generation

    Authors: Yang Chen, Yingwei Pan, Yehao Li, Ting Yao, Tao Mei

    Abstract: Recent remarkable advances in large-scale text-to-image diffusion models have inspired a significant breakthrough in text-to-3D generation, pursuing 3D content creation solely from a given text prompt. However, existing text-to-3D techniques lack a crucial ability in the creative process: interactively control and shape the synthetic 3D contents according to users' desired specifications (e.g., sk…

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: ACM Multimedia 2023

  25. arXiv:2311.02495  [pdf]

    cs.LG cond-mat.mtrl-sci

    Uncertainty Quantification in Multivariable Regression for Material Property Prediction with Bayesian Neural Networks

    Authors: Longze Li, Jiang Chang, Aleksandar Vakanski, Yachun Wang, Tiankai Yao, Min Xian

    Abstract: With the increased use of data-driven approaches and machine learning-based methods in material science, the importance of reliable uncertainty quantification (UQ) of the predicted variables for informed decision-making cannot be overstated. UQ in material property prediction poses unique challenges, including the multi-scale and multi-physics nature of advanced materials, intricate interactions b…

    Submitted 14 May, 2024; v1 submitted 4 November, 2023; originally announced November 2023.

    Comments: 24 pages, 4 figures

    ACM Class: I.2.6

    Journal ref: Scientific Reports, 14(1):10543, 2024

  26. arXiv:2310.05185  [pdf, other]

    cs.AI cs.CL

    Text2NKG: Fine-Grained N-ary Relation Extraction for N-ary relational Knowledge Graph Construction

    Authors: Haoran Luo, Haihong E, Yuhao Yang, Tianyu Yao, Yikai Guo, Zichen Tang, Wentai Zhang, Kaiyang Wan, Shiyao Peng, Meina Song, Wei Lin

    Abstract: Beyond traditional binary relational facts, n-ary relational knowledge graphs (NKGs) are comprised of n-ary relational facts containing more than two entities, which are closer to real-world facts with broader applications. However, the construction of NKGs still significantly relies on manual labor, and n-ary relation extraction still remains at a coarse-grained level, which is always in a single…

    Submitted 12 October, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

    Comments: Preprint

  27. Bidirectional Knowledge Reconfiguration for Lightweight Point Cloud Analysis

    Authors: Peipei Li, Xing Cui, Yibo Hu, Man Zhang, Ting Yao, Tao Mei

    Abstract: Point cloud analysis faces computational system overhead, limiting its application on mobile or edge devices. Directly employing small models may result in a significant drop in performance since it is difficult for a small model to adequately capture local structure and global shape information simultaneously, which are essential clues for point cloud analysis. This paper explores feature distill…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: Accepted by IEEE Transactions on Multimedia (TMM)

    Journal ref: IEEE Transactions on Multimedia (Early Access), 02 October 2023

  28. arXiv:2309.11132  [pdf, other]

    cs.CV cs.AI cs.LG

    Contrastive Pseudo Learning for Open-World DeepFake Attribution

    Authors: Zhimin Sun, Shen Chen, Taiping Yao, Bangjie Yin, Ran Yi, Shouhong Ding, Lizhuang Ma

    Abstract: The challenge in sourcing attribution for forgery faces has gained widespread attention due to the rapid development of generative techniques. While many recent works have taken essential steps on GAN-generated faces, more threatening attacks related to identity swapping or expression transferring are still overlooked. And the forgery traces hidden in unknown attacks from the open-world unlabeled…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: 16 pages, 7 figures, ICCV 2023

  29. arXiv:2309.09534  [pdf, other]

    cs.CV

    Selective Volume Mixup for Video Action Recognition

    Authors: Yi Tan, Zhaofan Qiu, Yanbin Hao, Ting Yao, Xiangnan He, Tao Mei

    Abstract: The recent advances in Convolutional Neural Networks (CNNs) and Vision Transformers have convincingly demonstrated high learning capability for video action recognition on large datasets. Nevertheless, deep models often suffer from the overfitting effect on small-scale datasets with a limited number of training videos. A common solution is to exploit the existing image augmentation strategies for…

    Submitted 18 September, 2023; originally announced September 2023.

  30. arXiv:2309.08884  [pdf, other]

    cs.LG cs.CR eess.SP eess.SY

    Robust Online Covariance and Sparse Precision Estimation Under Arbitrary Data Corruption

    Authors: Tong Yao, Shreyas Sundaram

    Abstract: Gaussian graphical models are widely used to represent correlations among entities but remain vulnerable to data corruption. In this work, we introduce a modified trimmed-inner-product algorithm to robustly estimate the covariance in an online scenario even in the presence of arbitrary and adversarial data attacks. At each time step, data points, drawn nominally independently and identically from…

    Submitted 16 September, 2023; originally announced September 2023.

    Comments: 9 pages, 4 figures, 62nd IEEE Conference on Decision and Control (CDC)

  31. arXiv:2309.02049  [pdf, other]

    cs.CV

    Diffusion-based 3D Object Detection with Random Boxes

    Authors: Xin Zhou, Jinghua Hou, Tingting Yao, Dingkang Liang, Zhe Liu, Zhikang Zou, Xiaoqing Ye, Jianwei Cheng, Xiang Bai

    Abstract: 3D object detection is an essential task for achieving autonomous driving. Existing anchor-based detection methods rely on empirical heuristics setting of anchors, which makes the algorithms lack elegance. In recent years, we have witnessed the rise of several generative models, among which diffusion models show great potential for learning the transformation of two distributions. Our proposed Dif…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: Accepted by PRCV 2023

  32. arXiv:2308.06288  [pdf, other]

    q-bio.QM cs.CV eess.IV

    Spatial Pathomics Toolkit for Quantitative Analysis of Podocyte Nuclei with Histology and Spatial Transcriptomics Data in Renal Pathology

    Authors: Jiayuan Chen, Yu Wang, Ruining Deng, Quan Liu, Can Cui, Tianyuan Yao, Yilin Liu, Jianyong Zhong, Agnes B. Fogo, Haichun Yang, Shilin Zhao, Yuankai Huo

    Abstract: Podocytes, specialized epithelial cells that envelop the glomerular capillaries, play a pivotal role in maintaining renal health. The current description and quantification of features on pathology slides are limited, prompting the need for innovative solutions to comprehensively assess diverse phenotypic attributes within Whole Slide Images (WSIs). In particular, understanding the morphological c…

    Submitted 10 August, 2023; originally announced August 2023.

  33. arXiv:2308.06217  [pdf, other]

    cs.CV

    Continual Face Forgery Detection via Historical Distribution Preserving

    Authors: Ke Sun, Shen Chen, Taiping Yao, Xiaoshuai Sun, Shouhong Ding, Rongrong Ji

    Abstract: Face forgery techniques have advanced rapidly and pose serious security threats. Existing face forgery detection methods try to learn generalizable features, but they still fall short of practical application. Additionally, finetuning these methods on historical training data is resource-intensive in terms of time and storage. In this paper, we focus on a novel and challenging problem: Continual F…

    Submitted 11 August, 2023; originally announced August 2023.

  34. arXiv:2307.16545  [pdf, other]

    cs.CV

    Towards General Visual-Linguistic Face Forgery Detection

    Authors: Ke Sun, Shen Chen, Taiping Yao, Haozhe Yang, Xiaoshuai Sun, Shouhong Ding, Rongrong Ji

    Abstract: Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust. Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model. We argue that such supervisions lack semantic information and interpretability. To address this issue, in this paper, we propose a novel paradigm named Vis…

    Submitted 7 February, 2024; v1 submitted 31 July, 2023; originally announced July 2023.

  35. arXiv:2307.11464  [pdf, other]

    cs.CY

    Supporting Post-disaster Recovery with Agent-based Modeling in Multilayer Socio-physical Networks

    Authors: Jiawei Xue, Sangung Park, Washim Uddin Mondal, Sandro Martinelli Reia, Tong Yao, Satish V. Ukkusuri

    Abstract: The examination of post-disaster recovery (PDR) in a socio-physical system enables us to elucidate the complex relationships between humans and infrastructures. Although existing studies have identified many patterns in the PDR process, they fall short of describing how individual recoveries contribute to the overall recovery of the system. To enhance the understanding of individual return behavio…

    Submitted 21 July, 2023; originally announced July 2023.

    Comments: 28 pages, 10 figures

  36. Learning and Evaluating Human Preferences for Conversational Head Generation

    Authors: Mohan Zhou, Yalong Bai, Wei Zhang, Ting Yao, Tiejun Zhao, Tao Mei

    Abstract: A reliable and comprehensive evaluation metric that aligns with manual preference assessments is crucial for conversational head video synthesis methods development. Existing quantitative evaluations often fail to capture the full complexity of human preference, as they only consider limited evaluation dimensions. Qualitative evaluations and user studies offer a solution but are time-consuming and…

    Submitted 2 August, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

    Comments: Accepted by ACM Multimedia 2023

  37. arXiv:2307.02090  [pdf, other]

    cs.CV

    Interactive Conversational Head Generation

    Authors: Mohan Zhou, Yalong Bai, Wei Zhang, Ting Yao, Tiejun Zhao

    Abstract: We introduce a new conversation head generation benchmark for synthesizing behaviors of a single interlocutor in a face-to-face conversation. The capability to automatically synthesize interlocutors which can participate in long and multi-turn conversations is vital and offers benefits for various applications, including digital humans, virtual agents, and social robots. While existing research pri…

    Submitted 5 July, 2023; originally announced July 2023.

    Comments: arXiv admin note: text overlap with arXiv:2112.13548

  38. arXiv:2307.00290  [pdf, other]

    cs.CV cs.LG

    All-in-SAM: from Weak Annotation to Pixel-wise Nuclei Segmentation with Prompt-based Finetuning

    Authors: Can Cui, Ruining Deng, Quan Liu, Tianyuan Yao, Shunxing Bao, Lucas W. Remedios, Yucheng Tang, Yuankai Huo

    Abstract: The Segment Anything Model (SAM) is a recently proposed prompt-based segmentation model in a generic zero-shot segmentation approach. With the zero-shot segmentation capacity, SAM achieved impressive flexibility and precision on various segmentation tasks. However, the current pipeline requires manual prompts during the inference stage, which is still resource intensive for biomedical image segmen…

    Submitted 28 August, 2023; v1 submitted 1 July, 2023; originally announced July 2023.

  39. arXiv:2306.16645  [pdf, other]

    cs.CV cs.MM

    Deep Equilibrium Multimodal Fusion

    Authors: Jinhong Ni, Yalong Bai, Wei Zhang, Ting Yao, Tao Mei

    Abstract: Multimodal fusion integrates the complementary information present in multiple modalities and has gained much attention recently. Most existing fusion approaches either learn a fixed fusion strategy during training and inference, or are only capable of fusing the information to a certain extent. Such solutions may fail to fully capture the dynamics of interactions across modalities especially when…

    Submitted 28 June, 2023; originally announced June 2023.

  40. Visual-Aware Text-to-Speech

    Authors: Mohan Zhou, Yalong Bai, Wei Zhang, Ting Yao, Tiejun Zhao, Tao Mei

    Abstract: Dynamically synthesizing talking speech that actively responds to a listening head is critical during the face-to-face interaction. For example, the speaker could take advantage of the listener's facial expression to adjust the tones, stressed syllables, or pauses. In this work, we present a new visual-aware text-to-speech (VA-TTS) task to synthesize speech conditioned on both textual inputs and s…

    Submitted 21 June, 2023; originally announced June 2023.

    Comments: accepted as oral and top 3% paper by ICASSP 2023

    Journal ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023, 1-5

  41. arXiv:2306.02900  [pdf, other]

    cs.CV

    Robust Fiber ODF Estimation Using Deep Constrained Spherical Deconvolution for Diffusion MRI

    Authors: Tianyuan Yao, Francois Rheault, Leon Y Cai, Vishwesh Nath, Zuhayr Asad, Nancy Newlin, Can Cui, Ruining Deng, Karthik Ramadass, Andrea Shafer, Susan Resnick, Kurt Schilling, Bennett A. Landman, Yuankai Huo

    Abstract: Diffusion-weighted magnetic resonance imaging (DW-MRI) is a critical imaging method for capturing and modeling tissue microarchitecture at a millimeter scale. A common practice to model the measured DW-MRI signal is via fiber orientation distribution function (fODF). This function is the essential first step for the downstream tractography and connectivity analyses. With recent advantages in data…

    Submitted 5 June, 2023; originally announced June 2023.

    Comments: 33 pages, 7 figures

  42. arXiv:2305.15428  [pdf, other]

    cs.SI cs.LG

    Online Influence Maximization under Decreasing Cascade Model

    Authors: Fang Kong, Jize Xie, Baoxiang Wang, Tao Yao, Shuai Li

    Abstract: We study online influence maximization (OIM) under a new model of decreasing cascade (DC). This model is a generalization of the independent cascade (IC) model by considering the common phenomenon of market saturation. In DC, the chance of an influence attempt being successful reduces with previous failures. The effect is neglected by previous OIM works under IC and linear threshold models. We pro…

    Submitted 19 May, 2023; originally announced May 2023.

  43. HAHE: Hierarchical Attention for Hyper-Relational Knowledge Graphs in Global and Local Level

    Authors: Haoran Luo, Haihong E, Yuhao Yang, Yikai Guo, Mingzhi Sun, Tianyu Yao, Zichen Tang, Kaiyang Wan, Meina Song, Wei Lin

    Abstract: Link Prediction on Hyper-relational Knowledge Graphs (HKG) is a worthwhile endeavor. HKG consists of hyper-relational facts (H-Facts), composed of a main triple and several auxiliary attribute-value qualifiers, which can effectively represent factually comprehensive information. The internal structure of HKG can be represented as a hypergraph-based representation globally and a semantic sequence-b…

    Submitted 15 May, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

    Comments: Accepted by ACL 2023 main conference

    Report number: 3810

    Journal ref: ACL 2023

  44. Instance-Aware Domain Generalization for Face Anti-Spoofing

    Authors: Qianyu Zhou, Ke-Yue Zhang, Taiping Yao, Xuequan Lu, Ran Yi, Shouhong Ding, Lizhuang Ma

    Abstract: Face anti-spoofing (FAS) based on domain generalization (DG) has been recently studied to improve the generalization on unseen scenarios. Previous methods typically rely on domain labels to align the distribution of each domain for learning domain-invariant representations. However, artificial domain labels are coarse-grained and subjective, which cannot reflect real domain distributions accuratel…

    Submitted 12 April, 2023; originally announced April 2023.

    Comments: Accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023

  45. arXiv:2304.04155  [pdf, other]

    eess.IV cs.CV

    Segment Anything Model (SAM) for Digital Pathology: Assess Zero-shot Segmentation on Whole Slide Imaging

    Authors: Ruining Deng, Can Cui, Quan Liu, Tianyuan Yao, Lucas W. Remedios, Shunxing Bao, Bennett A. Landman, Lee E. Wheless, Lori A. Coburn, Keith T. Wilson, Yaohong Wang, Shilin Zhao, Agnes B. Fogo, Haichun Yang, Yucheng Tang, Yuankai Huo

    Abstract: The segment anything model (SAM) was released as a foundation model for image segmentation. The promptable segmentation model was trained by over 1 billion masks on 11M licensed and privacy-respecting images. The model supports zero-shot image segmentation with various segmentation prompts (e.g., points, boxes, masks). It makes the SAM attractive for medical image analysis, especially for digital…

    Submitted 9 April, 2023; originally announced April 2023.

  46. arXiv:2304.03679  [pdf, other]

    cs.IR

    T2Ranking: A large-scale Chinese Benchmark for Passage Ranking

    Authors: Xiaohui Xie, Qian Dong, Bingning Wang, Feiyang Lv, Ting Yao, Weinan Gan, Zhijing Wu, Xiangsheng Li, Haitao Li, Yiqun Liu, Jin Ma

    Abstract: Passage ranking involves two stages: passage retrieval and passage re-ranking, which are important and challenging topics for both academics and industries in the area of Information Retrieval (IR). However, the commonly-used datasets for passage ranking usually focus on the English language. For non-English scenarios, such as Chinese, the existing datasets are limited in terms of data scale, fine…

    Submitted 7 April, 2023; originally announced April 2023.

    Comments: This Resource paper has been accepted by the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2023)

  47. arXiv:2303.16376  [pdf, other]

    cs.LG

    A Unified Learning Model for Estimating Fiber Orientation Distribution Functions on Heterogeneous Multi-shell Diffusion-weighted MRI

    Authors: Tianyuan Yao, Nancy Newlin, Praitayini Kanakaraj, Vishwesh Nath, Leon Y Cai, Karthik Ramadass, Kurt Schilling, Bennett A. Landman, Yuankai Huo

    Abstract: Diffusion-weighted (DW) MRI measures the direction and scale of the local diffusion process in every voxel through its spectrum in q-space, typically acquired in one or more shells. Recent developments in micro-structure imaging and multi-tissue decomposition have sparked renewed attention to the radial b-value dependence of the signal. Applications in tissue classification and micro-architecture…

    Submitted 29 January, 2024; v1 submitted 28 March, 2023; originally announced March 2023.

  48. arXiv:2303.13232  [pdf, other]

    cs.CV cs.AI

    Transforming Radiance Field with Lipschitz Network for Photorealistic 3D Scene Stylization

    Authors: Zicheng Zhang, Yinglu Liu, Congying Han, Yingwei Pan, Tiande Guo, Ting Yao

    Abstract: Recent advances in 3D scene representation and novel view synthesis have witnessed the rise of Neural Radiance Fields (NeRFs). Nevertheless, it is not trivial to exploit NeRF for the photorealistic 3D scene stylization task, which aims to generate visually consistent and photorealistic stylized scenes from novel views. Simply coupling NeRF with photorealistic style transfer (PST) will result in cr…

    Submitted 23 March, 2023; originally announced March 2023.

    Comments: CVPR 2023, Highlight

  49. arXiv:2303.12512  [pdf, other]

    cs.CV

    Sibling-Attack: Rethinking Transferable Adversarial Attacks against Face Recognition

    Authors: Zexin Li, Bangjie Yin, Taiping Yao, Juefeng Guo, Shouhong Ding, Simin Chen, Cong Liu

    Abstract: A hard challenge in developing practical face recognition (FR) attacks is due to the black-box nature of the target FR model, i.e., inaccessible gradient and parameter information to attackers. While recent research took an important step towards attacking black-box FR models through leveraging transferability, their performance is still limited, especially against online commercial FR systems tha…

    Submitted 22 March, 2023; originally announced March 2023.

    Comments: 8 pages, 5 figures, accepted by CVPR 2023 as a poster paper

  50. arXiv:2303.11676  [pdf]

    cs.CV

    Deep Learning Pipeline for Preprocessing and Segmenting Cardiac Magnetic Resonance of Single Ventricle Patients from an Image Registry

    Authors: Tina Yao, Nicole St. Clair, Gabriel F. Miller, Adam L. Dorfman, Mark A. Fogel, Sunil Ghelani, Rajesh Krishnamurthy, Christopher Z. Lam, Joshua D. Robinson, David Schidlow, Timothy C. Slesnick, Justin Weigand, Michael Quail, Rahul Rathod, Jennifer A. Steeden, Vivek Muthurangu

    Abstract: Purpose: To develop and evaluate an end-to-end deep learning pipeline for segmentation and analysis of cardiac magnetic resonance images to provide core-lab processing for a multi-centre registry of Fontan patients. Materials and Methods: This retrospective study used training (n = 175), validation (n = 25) and testing (n = 50) cardiac magnetic resonance image exams collected from 13 institution…

    Submitted 21 March, 2023; originally announced March 2023.

    Comments: 17 pages, 6 figures