
Showing 1–50 of 136 results for author: Metaxas, D

Searching in archive cs.
  1. arXiv:2406.14449

    cs.AI

    APEER: Automatic Prompt Engineering Enhances Large Language Model Reranking

    Authors: Can Jin, Hongwu Peng, Shiyu Zhao, Zhenting Wang, Wujiang Xu, Ligong Han, Jiahui Zhao, Kai Zhong, Sanguthevar Rajasekaran, Dimitris N. Metaxas

    Abstract: Large Language Models (LLMs) have significantly enhanced Information Retrieval (IR) across various modules, such as reranking. Despite impressive performance, current zero-shot relevance ranking with LLMs heavily relies on human prompt engineering. Existing automatic prompt engineering algorithms primarily focus on language modeling and classification tasks, leaving the domain of IR, particularly…

    Submitted 20 June, 2024; originally announced June 2024.

  2. arXiv:2406.11675

    cs.LG cs.AI cs.CL stat.ML

    BLoB: Bayesian Low-Rank Adaptation by Backpropagation for Large Language Models

    Authors: Yibin Wang, Haizhou Shi, Ligong Han, Dimitris Metaxas, Hao Wang

    Abstract: Large Language Models (LLMs) often suffer from overconfidence during inference, particularly when adapted to downstream domain-specific tasks with limited data. Previous work addresses this issue by employing approximate Bayesian estimation after the LLMs are trained, enabling them to quantify uncertainty. However, such post-training approaches' performance is severely limited by the parameters le…

    Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: 27 pages, 3 figures, 9 tables; preprint, work in progress

  3. arXiv:2406.05596

    cs.CV cs.LG

    Aligning Human Knowledge with Visual Concepts Towards Explainable Medical Image Classification

    Authors: Yunhe Gao, Difei Gu, Mu Zhou, Dimitris Metaxas

    Abstract: Although explainability is essential in the clinical diagnosis, most deep learning models still function as black boxes without elucidating their decision-making process. In this study, we investigate the explainable model development that can mimic the decision-making process of human experts by fusing the domain knowledge of explicit diagnostic criteria. We introduce a simple yet effective frame…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: MICCAI 2024 Early Accept

  4. arXiv:2406.04324

    cs.CV eess.IV

    SF-V: Single Forward Video Generation Model

    Authors: Zhixing Zhang, Yanyu Li, Yushu Wu, Yanwu Xu, Anil Kag, Ivan Skorokhodov, Willi Menapace, Aliaksandr Siarohin, Junli Cao, Dimitris Metaxas, Sergey Tulyakov, Jian Ren

    Abstract: Diffusion-based video generation models have demonstrated remarkable success in obtaining high-fidelity videos through the iterative denoising process. However, these models require multiple denoising steps during sampling, resulting in high computational costs. In this work, we propose a novel approach to obtain single-step video generation models by leveraging adversarial training to fine-tune p…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Project Page: https://snap-research.github.io/SF-V

  5. arXiv:2406.01062

    cs.CV

    SceneTextGen: Layout-Agnostic Scene Text Image Synthesis with Diffusion Models

    Authors: Qilong Zhangli, Jindong Jiang, Di Liu, Licheng Yu, Xiaoliang Dai, Ankit Ramchandani, Guan Pang, Dimitris N. Metaxas, Praveen Krishnan

    Abstract: While diffusion models have significantly advanced the quality of image generation, their capability to accurately and coherently render text within these images remains a substantial challenge. Conventional diffusion-based methods for scene text generation are typically limited by their reliance on an intermediate layout output. This dependency often results in a constrained diversity of text sty…

    Submitted 10 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  6. arXiv:2405.21050

    cs.CV cs.LG

    Spectrum-Aware Parameter Efficient Fine-Tuning for Diffusion Models

    Authors: Xinxi Zhang, Song Wen, Ligong Han, Felix Juefei-Xu, Akash Srivastava, Junzhou Huang, Hao Wang, Molei Tao, Dimitris N. Metaxas

    Abstract: Adapting large-scale pre-trained generative models in a parameter-efficient manner is gaining traction. Traditional methods like low rank adaptation achieve parameter efficiency by imposing constraints but may not be optimal for tasks requiring high representation capacity. We propose a novel spectrum-aware adaptation framework for generative models. Our method adjusts both singular values and the…

    Submitted 31 May, 2024; originally announced May 2024.

  7. arXiv:2405.14660

    cs.LG cs.AI cs.CL

    Implicit In-context Learning

    Authors: Zhuowei Li, Zihao Xu, Ligong Han, Yunhe Gao, Song Wen, Di Liu, Hao Wang, Dimitris N. Metaxas

    Abstract: In-context Learning (ICL) empowers large language models (LLMs) to adapt to unseen tasks during inference by prefixing a few demonstration examples prior to test queries. Despite its versatility, ICL incurs substantial computational and memory overheads compared to zero-shot learning and is susceptible to the selection and order of demonstration examples. In this work, we introduce Implicit In-con…

    Submitted 23 May, 2024; originally announced May 2024.

  8. arXiv:2405.13360

    cs.CV cs.AI cs.LG

    How to Trace Latent Generative Model Generated Images without Artificial Watermark?

    Authors: Zhenting Wang, Vikash Sehwag, Chen Chen, Lingjuan Lyu, Dimitris N. Metaxas, Shiqing Ma

    Abstract: Latent generative models (e.g., Stable Diffusion) have become more and more popular, but concerns have arisen regarding potential misuse related to images generated by these models. It is, therefore, necessary to analyze the origin of images by inferring if a particular image was generated by a specific latent generative model. Most existing methods (e.g., image watermark and model fingerprinting)…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  9. arXiv:2405.02781

    cs.CV

    Instantaneous Perception of Moving Objects in 3D

    Authors: Di Liu, Bingbing Zhuang, Dimitris N. Metaxas, Manmohan Chandraker

    Abstract: The perception of 3D motion of surrounding traffic participants is crucial for driving safety. While existing works primarily focus on general large motions, we contend that the instantaneous detection and quantification of subtle motions is equally important as they indicate the nuances in driving behavior that may be safety critical, such as behaviors near a stop sign of parking positions. We de…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: CVPR 2024

  10. arXiv:2403.09623

    cs.CV

    Score-Guided Diffusion for 3D Human Recovery

    Authors: Anastasis Stathopoulos, Ligong Han, Dimitris Metaxas

    Abstract: We present Score-Guided Human Mesh Recovery (ScoreHMR), an approach for solving inverse problems for 3D human pose and shape reconstruction. These inverse problems involve fitting a human body model to image observations, traditionally solved through optimization techniques. ScoreHMR mimics model fitting approaches, but alignment with the image observation is achieved through score guidance in the…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: CVPR 2024 (project page: https://statho.github.io/ScoreHMR)

  11. arXiv:2401.00094

    cs.CV

    Generating Enhanced Negatives for Training Language-Based Object Detectors

    Authors: Shiyu Zhao, Long Zhao, Vijay Kumar B. G, Yumin Suh, Dimitris N. Metaxas, Manmohan Chandraker, Samuel Schulter

    Abstract: The recent progress in language-based open-vocabulary object detection can be largely attributed to finding better ways of leveraging large-scale data with free-form text annotations. Training such models with a discriminative objective function has proven successful, but requires good positive and negative samples. However, the free-form nature and the open vocabulary of object descriptions make…

    Submitted 12 April, 2024; v1 submitted 29 December, 2023; originally announced January 2024.

    Comments: Accepted to CVPR 2024. Supplementary document included.

  12. arXiv:2312.03816

    cs.CV

    AVID: Any-Length Video Inpainting with Diffusion Model

    Authors: Zhixing Zhang, Bichen Wu, Xiaoyan Wang, Yaqiao Luo, Luxin Zhang, Yinan Zhao, Peter Vajda, Dimitris Metaxas, Licheng Yu

    Abstract: Recent advances in diffusion models have successfully enabled text-guided image inpainting. While it seems straightforward to extend such editing capability into the video domain, there have been fewer works regarding text-guided video inpainting. Given a video, a masked region at its initial frame, and an editing prompt, it requires a model to do infilling at each frame following the editing guid…

    Submitted 29 March, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: Project website: https://zhang-zx.github.io/AVID/

  13. arXiv:2311.16060

    cs.CV

    DiffSLVA: Harnessing Diffusion Models for Sign Language Video Anonymization

    Authors: Zhaoyang Xia, Carol Neidle, Dimitris N. Metaxas

    Abstract: Since American Sign Language (ASL) has no standard written form, Deaf signers frequently share videos in order to communicate in their native language. However, since both hands and face convey critical linguistic information in signed languages, sign language videos cannot preserve signer privacy. While signers have expressed interest, for a variety of applications, in sign language video anonymi…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: Project webpage: https://github.com/Jeffery9707/DiffSLVA

  14. arXiv:2310.06311

    cs.CV cs.MM

    Improving Compositional Text-to-image Generation with Large Vision-Language Models

    Authors: Song Wen, Guian Fang, Renrui Zhang, Peng Gao, Hao Dong, Dimitris Metaxas

    Abstract: Recent advancements in text-to-image models, particularly diffusion models, have shown significant promise. However, compositional text-to-image models frequently encounter difficulties in generating high-quality images that accurately align with input texts describing multiple objects, variable attributes, and intricate spatial relationships. To address this limitation, we employ large vision-lan…

    Submitted 10 October, 2023; originally announced October 2023.

  15. arXiv:2309.13839

    eess.IV cs.CV

    Fill the K-Space and Refine the Image: Prompting for Dynamic and Multi-Contrast MRI Reconstruction

    Authors: Bingyu Xin, Meng Ye, Leon Axel, Dimitris N. Metaxas

    Abstract: The key to dynamic or multi-contrast magnetic resonance imaging (MRI) reconstruction lies in exploring inter-frame or inter-contrast information. Currently, the unrolled model, an approach combining iterative MRI reconstruction steps with learnable neural network layers, stands as the best-performing method for MRI reconstruction. However, there are two main limitations to overcome: firstly, the u…

    Submitted 24 September, 2023; originally announced September 2023.

    Comments: STACOM 2023; Code is available at https://github.com/hellopipu/PromptMR

  16. arXiv:2309.12594

    cs.CV

    DeFormer: Integrating Transformers with Deformable Models for 3D Shape Abstraction from a Single Image

    Authors: Di Liu, Xiang Yu, Meng Ye, Qilong Zhangli, Zhuowei Li, Zhixing Zhang, Dimitris N. Metaxas

    Abstract: Accurate 3D shape abstraction from a single 2D image is a long-standing problem in computer vision and graphics. By leveraging a set of primitives to represent the target shape, recent methods have achieved promising results. However, these methods either use a relatively large number of primitives or lack geometric flexibility due to the limited expressibility of the primitives. In this paper, we…

    Submitted 3 October, 2023; v1 submitted 21 September, 2023; originally announced September 2023.

    Comments: Accepted by ICCV 2023

  17. arXiv:2309.01035

    cs.CV cs.AI

    Deep Deformable Models: Learning 3D Shape Abstractions with Part Consistency

    Authors: Di Liu, Long Zhao, Qilong Zhangli, Yunhe Gao, Ting Liu, Dimitris N. Metaxas

    Abstract: The task of shape abstraction with semantic part consistency is challenging due to the complex geometries of natural objects. Recent methods learn to represent an object shape using a set of simple primitives to fit the target. However, in these methods, the primitives used do not always correspond to real parts or lack geometric flexibility for semantic interpretation. In this…

    Submitted 2 September, 2023; originally announced September 2023.

  18. arXiv:2308.09223

    eess.IV cs.CV cs.LG

    DMCVR: Morphology-Guided Diffusion Model for 3D Cardiac Volume Reconstruction

    Authors: Xiaoxiao He, Chaowei Tan, Ligong Han, Bo Liu, Leon Axel, Kang Li, Dimitris N. Metaxas

    Abstract: Accurate 3D cardiac reconstruction from cine magnetic resonance imaging (cMRI) is crucial for improved cardiovascular disease diagnosis and understanding of the heart's motion. However, current cardiac MRI-based reconstruction technology used in clinical settings is 2D with limited through-plane resolution, resulting in low-quality reconstructed cardiac volumes. To better reconstruct 3D cardiac vo…

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: Accepted in MICCAI 2023

  19. arXiv:2308.06412

    cs.CV

    Taming Self-Training for Open-Vocabulary Object Detection

    Authors: Shiyu Zhao, Samuel Schulter, Long Zhao, Zhixing Zhang, Vijay Kumar B. G, Yumin Suh, Manmohan Chandraker, Dimitris N. Metaxas

    Abstract: Recent studies have shown promising performance in open-vocabulary object detection (OVD) by utilizing pseudo labels (PLs) from pretrained vision and language models (VLMs). However, teacher-student self-training, a powerful and widely used paradigm to leverage PLs, is rarely explored for OVD. This work identifies two challenges of using self-training in OVD: noisy PLs from VLMs and frequent distr…

    Submitted 12 April, 2024; v1 submitted 11 August, 2023; originally announced August 2023.

    Comments: Accepted to CVPR 2024. Supplementary document included.

  20. arXiv:2308.04663

    eess.IV cs.CV cs.LG

    Classification of lung cancer subtypes on CT images with synthetic pathological priors

    Authors: Wentao Zhu, Yuan Jin, Gege Ma, Geng Chen, Jan Egger, Shaoting Zhang, Dimitris N. Metaxas

    Abstract: The accurate diagnosis on pathological subtypes for lung cancer is of significant importance for the follow-up treatments and prognosis managements. In this paper, we propose self-generating hybrid feature network (SGHF-Net) for accurately classifying lung cancer subtypes on computed tomography (CT) images. Inspired by studies stating that cross-scale associations exist in the image patterns betwe…

    Submitted 8 August, 2023; originally announced August 2023.

    Comments: 16 pages, 7 figures

    Journal ref: Medical Image Analysis 95, July 2024, 103199

  21. arXiv:2307.11952

    cs.CV cs.AI

    Pathology-and-genomics Multimodal Transformer for Survival Outcome Prediction

    Authors: Kexin Ding, Mu Zhou, Dimitris N. Metaxas, Shaoting Zhang

    Abstract: Survival outcome assessment is challenging and inherently associated with multiple clinical factors (e.g., imaging and genomics biomarkers) in cancer. Enabling multimodal analytics promises to reveal novel predictive patterns of patient outcomes. In this study, we propose a multimodal transformer (PathOmics) integrating pathology and genomics insights into colon-related cancer survival prediction.…

    Submitted 21 July, 2023; originally announced July 2023.

    Comments: Accepted to MICCAI 2023 (Top 14%)

  22. arXiv:2307.07693

    cs.CV

    Neural Deformable Models for 3D Bi-Ventricular Heart Shape Reconstruction and Modeling from 2D Sparse Cardiac Magnetic Resonance Imaging

    Authors: Meng Ye, Dong Yang, Mikael Kanski, Leon Axel, Dimitris Metaxas

    Abstract: We propose a novel neural deformable model (NDM) targeting the reconstruction and modeling of the 3D bi-ventricular shape of the heart from 2D sparse cardiac magnetic resonance (CMR) imaging data. We model the bi-ventricular shape using blended deformable superquadrics, which are parameterized by a set of geometric parameter functions and are capable of deforming globally and locally. While global…

    Submitted 12 August, 2023; v1 submitted 14 July, 2023; originally announced July 2023.

    Comments: Accepted by ICCV 2023

  23. arXiv:2307.03108

    cs.CV cs.CR cs.LG

    DIAGNOSIS: Detecting Unauthorized Data Usages in Text-to-image Diffusion Models

    Authors: Zhenting Wang, Chen Chen, Lingjuan Lyu, Dimitris N. Metaxas, Shiqing Ma

    Abstract: Recent text-to-image diffusion models have shown surprising performance in generating high-quality images. However, concerns have arisen regarding the unauthorized data usage during the training or fine-tuning process. One example is when a model trainer collects a set of images created by a particular artist and attempts to train a model capable of generating similar images without obtaining perm…

    Submitted 9 April, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

    Comments: ICLR 2024

  24. arXiv:2306.05705

    eess.IV cs.CV

    On the Challenges and Perspectives of Foundation Models for Medical Image Analysis

    Authors: Shaoting Zhang, Dimitris Metaxas

    Abstract: This article discusses the opportunities, applications and future directions of large-scale pre-trained models, i.e., foundation models, for analyzing medical images. Medical foundation models have immense potential in solving a wide range of downstream tasks, as they can help to accelerate the development of accurate and robust models, reduce the large amounts of required labeled data, preserve t…

    Submitted 21 November, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

  25. arXiv:2306.05414

    cs.CV

    Improving Tuning-Free Real Image Editing with Proximal Guidance

    Authors: Ligong Han, Song Wen, Qi Chen, Zhixing Zhang, Kunpeng Song, Mengwei Ren, Ruijiang Gao, Anastasis Stathopoulos, Xiaoxiao He, Yuxiao Chen, Di Liu, Qilong Zhangli, Jindong Jiang, Zhaoyang Xia, Akash Srivastava, Dimitris Metaxas

    Abstract: DDIM inversion has revealed the remarkable potential of real image editing within diffusion-based methods. However, the accuracy of DDIM reconstruction degrades as larger classifier-free guidance (CFG) scales are used for enhanced editing. Null-text inversion (NTI) optimizes null embeddings to align the reconstruction and inversion trajectories with larger CFG scales, enabling real image editing…

    Submitted 5 July, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: Added inversion guidance, and fixed typos

  26. arXiv:2306.02416

    cs.CV

    Training Like a Medical Resident: Context-Prior Learning Toward Universal Medical Image Segmentation

    Authors: Yunhe Gao, Zhuowei Li, Di Liu, Mu Zhou, Shaoting Zhang, Dimitris N. Metaxas

    Abstract: A major focus of clinical imaging workflow is disease diagnosis and management, leading to medical imaging datasets strongly tied to specific clinical objectives. This scenario has led to the prevailing practice of developing task-specific segmentation models, without gaining insights from widespread imaging cohorts. Inspired by the training program of medical radiology residents, we propose a shi…

    Submitted 6 April, 2024; v1 submitted 4 June, 2023; originally announced June 2023.

    Comments: Accepted by CVPR 2024

  27. arXiv:2305.00244

    cs.CV cs.LG

    A Critical Analysis of the Limitation of Deep Learning based 3D Dental Mesh Segmentation Methods in Segmenting Partial Scans

    Authors: Ananya Jana, Aniruddha Maiti, Dimitris N. Metaxas

    Abstract: Tooth segmentation from intraoral scans is a crucial part of digital dentistry. Many Deep Learning based tooth segmentation algorithms have been developed for this task. In most cases, high accuracy has been achieved, although most of the available tooth segmentation techniques make an implicit restrictive assumption of a full jaw model and report accuracy based on full jaw models. Medi…

    Submitted 29 April, 2023; originally announced May 2023.

    Comments: accepted to IEEE EMBC 2023

  28. arXiv:2304.14396

    cs.CV

    Learning Articulated Shape with Keypoint Pseudo-labels from Web Images

    Authors: Anastasis Stathopoulos, Georgios Pavlakos, Ligong Han, Dimitris Metaxas

    Abstract: This paper shows that it is possible to learn models for monocular 3D reconstruction of articulated objects (e.g., horses, cows, sheep), using as few as 50-150 images labeled with 2D keypoints. Our proposed approach involves training category-specific keypoint estimators, generating 2D keypoint pseudo-labels on unlabeled web images, and using both the labeled and self-labeled sets to train 3D reco…

    Submitted 27 April, 2023; originally announced April 2023.

    Comments: CVPR 2023 (project page: https://statho.github.io/projects/animals3d/index.html)

  29. arXiv:2304.11463

    cs.CV

    OmniLabel: A Challenging Benchmark for Language-Based Object Detection

    Authors: Samuel Schulter, Vijay Kumar B G, Yumin Suh, Konstantinos M. Dafnis, Zhixing Zhang, Shiyu Zhao, Dimitris Metaxas

    Abstract: Language-based object detection is a promising direction towards building a natural interface to describe objects in images that goes far beyond plain category names. While recent methods show great progress in that direction, proper evaluation is lacking. With OmniLabel, we propose a novel task definition, dataset, and evaluation metric. The task subsumes standard- and open-vocabulary detection a…

    Submitted 14 August, 2023; v1 submitted 22 April, 2023; originally announced April 2023.

    Comments: ICCV 2023 Oral - Visit our project website at https://www.omnilabel.org

  30. arXiv:2304.00601

    cs.CV cs.LG

    Constructive Assimilation: Boosting Contrastive Learning Performance through View Generation Strategies

    Authors: Ligong Han, Seungwook Han, Shivchander Sudalairaj, Charlotte Loh, Rumen Dangovski, Fei Deng, Pulkit Agrawal, Dimitris Metaxas, Leonid Karlinsky, Tsui-Wei Weng, Akash Srivastava

    Abstract: Transformations based on domain expertise (expert transformations), such as random-resized-crop and color-jitter, have proven critical to the success of contrastive learning techniques such as SimCLR. Recently, several attempts have been made to replace such domain-specific, human-designed transformations with generated views that are learned. However for imagery data, so far none of these view-ge…

    Submitted 8 April, 2023; v1 submitted 2 April, 2023; originally announced April 2023.

    Comments: Accepted at Generative Models for Computer Vision Workshop 2023

  31. arXiv:2303.14865

    cs.CV

    Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens

    Authors: Yuxiao Chen, Jianbo Yuan, Yu Tian, Shijie Geng, Xinyu Li, Ding Zhou, Dimitris N. Metaxas, Hongxia Yang

    Abstract: Contrastive learning-based vision-language pre-training approaches, such as CLIP, have demonstrated great success in many vision-language tasks. These methods achieve cross-modal alignment by encoding a matched image-text pair with similar feature embeddings, which are generated by aggregating information from visual patches and language tokens. However, direct aligning cross-modal information usi…

    Submitted 26 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR 2023

  32. arXiv:2303.14357

    eess.IV cs.CV cs.LG

    Dealing With Heterogeneous 3D MR Knee Images: A Federated Few-Shot Learning Method With Dual Knowledge Distillation

    Authors: Xiaoxiao He, Chaowei Tan, Bo Liu, Liping Si, Weiwu Yao, Liang Zhao, Di Liu, Qilong Zhangli, Qi Chang, Kang Li, Dimitris N. Metaxas

    Abstract: Federated Learning has gained popularity among medical institutions since it enables collaborative training between clients (e.g., hospitals) without aggregating data. However, due to the high cost associated with creating annotations, especially for large 3D image datasets, clinical institutions do not have enough supervised data for training locally. Thus, the performance of the collaborative mo…

    Submitted 17 April, 2023; v1 submitted 25 March, 2023; originally announced March 2023.

  33. arXiv:2303.11305

    cs.CV

    SVDiff: Compact Parameter Space for Diffusion Fine-Tuning

    Authors: Ligong Han, Yinxiao Li, Han Zhang, Peyman Milanfar, Dimitris Metaxas, Feng Yang

    Abstract: Diffusion models have achieved remarkable success in text-to-image generation, enabling the creation of high-quality images from text prompts or other modalities. However, existing methods for customizing these models are limited by handling multiple personalized subjects and the risk of overfitting. Moreover, their large number of parameters is inefficient for model storage. In this paper, we pro…

    Submitted 2 July, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

    Comments: Added additional analysis and style-mixing results

  34. arXiv:2303.09447

    cs.LG cs.AI cs.CV

    Steering Prototypes with Prompt-tuning for Rehearsal-free Continual Learning

    Authors: Zhuowei Li, Long Zhao, Zizhao Zhang, Han Zhang, Di Liu, Ting Liu, Dimitris N. Metaxas

    Abstract: In the context of continual learning, prototypes, as representative class embeddings, offer advantages in memory conservation and the mitigation of catastrophic forgetting. However, challenges related to semantic drift and prototype interference persist. In this study, we introduce the Contrastive Prototypical Prompt (CPP) approach. Through task-specific prompt-tuning, underpinned by a contrastive l…

    Submitted 12 November, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

    Comments: Accepted to WACV 2024. Code is available at https://github.com/LzVv123456/Contrastive-Prototypical-Prompt

  35. arXiv:2301.10531

    cs.CV cs.AI

    3D Tooth Mesh Segmentation with Simplified Mesh Cell Representation

    Authors: Ananya Jana, Hrebesh Molly Subhash, Dimitris N. Metaxas

    Abstract: Manual tooth segmentation of 3D tooth meshes is tedious and there are variations among dentists. Several deep learning based methods have been proposed to perform automatic tooth mesh segmentation. Many of the proposed tooth mesh segmentation algorithms summarize the mesh cell as - the cell center or barycenter, the normal at barycenter…

    Submitted 25 January, 2023; originally announced January 2023.

    Comments: accepted at IEEE ISBI 2023 International Symposium on Biomedical Imaging

  36. arXiv:2212.04489

    cs.CV cs.AI

    SINE: SINgle Image Editing with Text-to-Image Diffusion Models

    Authors: Zhixing Zhang, Ligong Han, Arnab Ghosh, Dimitris Metaxas, Jian Ren

    Abstract: Recent works on diffusion models have demonstrated a strong capability for conditioning image generation, e.g., text-guided image synthesis. Such success inspires many efforts trying to use large-scale pre-trained diffusion models for tackling a challenging problem: real image editing. Works conducted in this area learn a unique textual token corresponding to several images containing the same obj…

    Submitted 8 December, 2022; originally announced December 2022.

    Comments: Project website: https://zhang-zx.github.io/SINE/

  37. arXiv:2212.04473

    cs.CV

    Diffusion Guided Domain Adaptation of Image Generators

    Authors: Kunpeng Song, Ligong Han, Bingchen Liu, Dimitris Metaxas, Ahmed Elgammal

    Abstract: Can a text-to-image diffusion model be used as a training objective for adapting a GAN generator to another domain? In this paper, we show that the classifier-free guidance can be leveraged as a critic and enable generators to distill knowledge from large-scale text-to-image diffusion models. Generators can be efficiently shifted into new domains indicated by text prompts without access to groundt…

    Submitted 9 December, 2022; v1 submitted 8 December, 2022; originally announced December 2022.

    Comments: Project website: https://styleganfusion.github.io/

  38. arXiv:2211.12081

    cs.CV

    CDDSA: Contrastive Domain Disentanglement and Style Augmentation for Generalizable Medical Image Segmentation

    Authors: Ran Gu, Guotai Wang, Jiangshan Lu, Jingyang Zhang, Wenhui Lei, Yinan Chen, Wenjun Liao, Shichuan Zhang, Kang Li, Dimitris N. Metaxas, Shaoting Zhang

    Abstract: Generalization to previously unseen images with potential domain shifts and different styles is essential for clinically applicable medical image segmentation, and the ability to disentangle domain-specific and domain-invariant features is key for achieving Domain Generalization (DG). However, existing DG methods can hardly achieve effective disentanglement to get high generalizability. To deal wi…

    Submitted 22 November, 2022; originally announced November 2022.

    Comments: 14 pages, 8 figures

  39. arXiv:2210.04831

    cs.CV

    Visual Prompt Tuning for Test-time Domain Adaptation

    Authors: Yunhe Gao, Xingjian Shi, Yi Zhu, Hao Wang, Zhiqiang Tang, Xiong Zhou, Mu Li, Dimitris N. Metaxas

    Abstract: Models should be able to adapt to unseen data during test-time to avoid performance drops caused by inevitable distribution shifts in real-world deployment scenarios. In this work, we tackle the practical yet challenging test-time adaptation (TTA) problem, where a model adapts to the target domain without accessing the source data. We propose a simple recipe called Data-efficient Prompt Tu…

    Submitted 30 November, 2022; v1 submitted 10 October, 2022; originally announced October 2022.

  40. arXiv:2209.08132

    cs.CV

    Automatic Tooth Segmentation from 3D Dental Model using Deep Learning: A Quantitative Analysis of what can be learnt from a Single 3D Dental Model

    Authors: Ananya Jana, Hrebesh Molly Subhash, Dimitris Metaxas

    Abstract: 3D tooth segmentation is an important task for digital orthodontics. Several Deep Learning methods have been proposed for automatic tooth segmentation from 3D dental models or intraoral scans. These methods require annotated 3D intraoral scans. Manually annotating 3D intraoral scans is a laborious task. One approach is to devise self-supervision methods to reduce the manual labeling effort. Compar…

    Submitted 16 September, 2022; originally announced September 2022.

    Comments: accepted to SIPAIM 2022

  41. arXiv:2207.09644

    cs.CV

    Hierarchically Self-Supervised Transformer for Human Skeleton Representation Learning

    Authors: Yuxiao Chen, Long Zhao, Jianbo Yuan, Yu Tian, Zhaoyang Xia, Shijie Geng, Ligong Han, Dimitris N. Metaxas

    Abstract: Despite the success of fully-supervised human skeleton sequence modeling, utilizing self-supervised pre-training for skeleton sequence representation learning has been an active field because acquiring task-specific skeleton annotations at large scales is difficult. Recent studies focus on learning video-level temporal and discriminative information using contrastive learning, but overlook the hie…

    Submitted 27 March, 2023; v1 submitted 20 July, 2022; originally announced July 2022.

    Comments: Accepted to ECCV 2022

  42. arXiv:2207.08954  [pdf, other]

    cs.CV

    Exploiting Unlabeled Data with Vision and Language Models for Object Detection

    Authors: Shiyu Zhao, Zhixing Zhang, Samuel Schulter, Long Zhao, Vijay Kumar B. G, Anastasis Stathopoulos, Manmohan Chandraker, Dimitris Metaxas

    Abstract: Building robust and generic object detection frameworks requires scaling to larger label spaces and bigger training datasets. However, it is prohibitively costly to acquire annotations for thousands of categories at a large scale. We propose a novel method that leverages the rich semantics available in recent vision and language models to localize and classify objects in unlabeled images, effectiv…

    Submitted 18 July, 2022; originally announced July 2022.

    Comments: Accepted to ECCV 2022 (with the supplementary document)

  43. arXiv:2206.09089  [pdf]

    cs.CV

    A Dynamic Data Driven Approach for Explainable Scene Understanding

    Authors: Zachary A Daniels, Dimitris Metaxas

    Abstract: Scene-understanding is an important topic in the area of Computer Vision, and illustrates computational challenges with applications to a wide range of domains including remote sensing, surveillance, smart agriculture, robotics, autonomous driving, and smart cities. We consider the active explanation-driven understanding and classification of scenes. Suppose that an agent utilizing one or more sen…

    Submitted 17 June, 2022; originally announced June 2022.

    Comments: Unpublished draft of book chapter

  44. arXiv:2206.07163  [pdf, other]

    cs.CV cs.LG eess.IV

    DeepRecon: Joint 2D Cardiac Segmentation and 3D Volume Reconstruction via A Structure-Specific Generative Method

    Authors: Qi Chang, Zhennan Yan, Mu Zhou, Di Liu, Khalid Sawalha, Meng Ye, Qilong Zhangli, Mikael Kanski, Subhi Al Aref, Leon Axel, Dimitris Metaxas

    Abstract: Joint 2D cardiac segmentation and 3D volume reconstruction are fundamental to building statistical cardiac anatomy models and understanding functional mechanisms from motion patterns. However, due to the low through-plane resolution of cine MR and high inter-subject variance, accurately segmenting cardiac images and reconstructing the 3D volume are challenging. In this study, we propose an end-to-…

    Submitted 14 June, 2022; originally announced June 2022.

    Comments: MICCAI 2022

  45. arXiv:2206.04125  [pdf, other]

    cs.CV

    Towards Self-supervised and Weight-preserving Neural Architecture Search

    Authors: Zhuowei Li, Yibo Gao, Zhenzhou Zha, Zhiqiang HU, Qing Xia, Shaoting Zhang, Dimitris N. Metaxas

    Abstract: Neural architecture search (NAS) algorithms save tremendous labor from human experts. Recent advancements further reduce the computational overhead to an affordable level. However, it is still cumbersome to deploy the NAS techniques in real-world applications due to the fussy procedures and the supervised learning paradigm. In this work, we propose the self-supervised and weight-preserving neural…

    Submitted 8 June, 2022; originally announced June 2022.

  46. arXiv:2203.13277  [pdf, other]

    cs.LG

    A Manifold View of Adversarial Risk

    Authors: Wenjia Zhang, Yikai Zhang, Xiaoling Hu, Mayank Goswami, Chao Chen, Dimitris Metaxas

    Abstract: The adversarial risk of a machine learning model has been widely studied. Most previous works assume that the data lies in the whole ambient space. We propose to take a new angle and take the manifold assumption into consideration. Assuming data lies in a manifold, we investigate two new types of adversarial risk, the normal adversarial risk due to perturbation along normal direction, and the in-m…

    Submitted 7 April, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

  47. arXiv:2203.11335  [pdf, other]

    cs.CV

    Global Matching with Overlapping Attention for Optical Flow Estimation

    Authors: Shiyu Zhao, Long Zhao, Zhixing Zhang, Enyu Zhou, Dimitris Metaxas

    Abstract: Optical flow estimation is a fundamental task in computer vision. Recent direct-regression methods using deep neural networks achieve remarkable performance improvement. However, they do not explicitly capture long-term motion correspondences and thus cannot handle large motions effectively. In this paper, inspired by the traditional matching-optimization methods where matching is introduced to ha…

    Submitted 21 March, 2022; originally announced March 2022.

    Comments: Accepted to CVPR 2022 (with additional figures)

  48. arXiv:2203.10726  [pdf, other]

    eess.IV cs.CV

    TransFusion: Multi-view Divergent Fusion for Medical Image Segmentation with Transformers

    Authors: Di Liu, Yunhe Gao, Qilong Zhangli, Ligong Han, Xiaoxiao He, Zhaoyang Xia, Song Wen, Qi Chang, Zhennan Yan, Mu Zhou, Dimitris Metaxas

    Abstract: Combining information from multi-view images is crucial to improve the performance and robustness of automated methods for disease diagnosis. However, due to the non-alignment characteristics of multi-view images, building correlation and data fusion across views largely remain an open problem. In this study, we present TransFusion, a Transformer-based architecture to merge divergent multi-view im…

    Submitted 5 September, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

  49. arXiv:2203.02846  [pdf, other]

    cs.CV

    Region Proposal Rectification Towards Robust Instance Segmentation of Biological Images

    Authors: Qilong Zhangli, Jingru Yi, Di Liu, Xiaoxiao He, Zhaoyang Xia, Qi Chang, Ligong Han, Yunhe Gao, Song Wen, Haiming Tang, He Wang, Mu Zhou, Dimitris Metaxas

    Abstract: Top-down instance segmentation framework has shown its superiority in object detection compared to the bottom-up framework. While it is efficient in addressing over-segmentation, top-down instance segmentation suffers from over-crop problem. However, a complete segmentation mask is crucial for biological image analysis as it delivers important morphological properties such as shapes and volumes. I…

    Submitted 3 November, 2022; v1 submitted 5 March, 2022; originally announced March 2022.

  50. arXiv:2203.02573  [pdf, other]

    cs.CV cs.AI cs.LG

    Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning

    Authors: Ligong Han, Jian Ren, Hsin-Ying Lee, Francesco Barbieri, Kyle Olszewski, Shervin Minaee, Dimitris Metaxas, Sergey Tulyakov

    Abstract: Most methods for conditional video synthesis use a single modality as the condition. This comes with major limitations. For example, it is problematic for a model conditioned on an image to generate a specific motion trajectory desired by the user since there is no means to provide motion information. Conversely, language information can describe the desired motion, while not precisely defining th…

    Submitted 4 March, 2022; originally announced March 2022.

    Comments: Accepted to CVPR 2022