
Showing 1–50 of 204 results for author: Gong, B

  1. arXiv:2407.01606

    cs.LG cs.AI cs.CL cs.CV stat.ML

    On Discrete Prompt Optimization for Diffusion Models

    Authors: Ruochen Wang, Ting Liu, Cho-Jui Hsieh, Boqing Gong

    Abstract: This paper introduces the first gradient-based framework for prompt optimization in text-to-image diffusion models. We formulate prompt engineering as a discrete optimization problem over the language space. Two major challenges arise in efficiently finding a solution to this problem: (1) Enormous Domain Space: Setting the domain to the entire language space poses significant difficulty to the opt…

    Submitted 26 June, 2024; originally announced July 2024.

    Comments: ICML 2024. Code available at https://github.com/ruocwang/dpo-diffusion

    MSC Class: 68T01

    Journal ref: Proceedings of the 41st International Conference on Machine Learning (ICML 2024)

  2. arXiv:2406.16476

    cs.CV

    ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance

    Authors: Shuwei Shi, Wenbo Li, Yuechen Zhang, Jingwen He, Biao Gong, Yinqiang Zheng

    Abstract: Diffusion models excel at producing high-quality images; however, scaling to higher resolutions, such as 4K, often results in over-smoothed content, structural distortions, and repetitive patterns. To this end, we introduce ResMaster, a novel, training-free method that empowers resolution-limited diffusion models to generate high-quality images beyond resolution restrictions. Specifically, ResMast…

    Submitted 24 June, 2024; originally announced June 2024.

  3. arXiv:2406.02965

    cs.CV

    Understanding the Impact of Negative Prompts: When and How Do They Take Effect?

    Authors: Yuanhao Ban, Ruochen Wang, Tianyi Zhou, Minhao Cheng, Boqing Gong, Cho-Jui Hsieh

    Abstract: The concept of negative prompts, emerging from conditional generation models like Stable Diffusion, allows users to specify what to exclude from the generated images. Despite the widespread use of negative prompts, their intrinsic mechanisms remain largely unexplored. This paper presents the first comprehensive study to uncover how and when negative…

    Submitted 5 June, 2024; originally announced June 2024.

  4. arXiv:2406.01970

    cs.CV cs.AI

    The Crystal Ball Hypothesis in diffusion models: Anticipating object positions from initial noise

    Authors: Yuanhao Ban, Ruochen Wang, Tianyi Zhou, Boqing Gong, Cho-Jui Hsieh, Minhao Cheng

    Abstract: Diffusion models have achieved remarkable success in text-to-image generation tasks; however, the role of initial noise has rarely been explored. In this study, we identify specific regions within the initial noise image, termed trigger patches, that play a key role in object generation in the resulting images. Notably, these patches are "universal" and can be generalized across various positio…

    Submitted 4 June, 2024; originally announced June 2024.

  5. arXiv:2406.00448

    cs.CV cs.GR

    Bilateral Guided Radiance Field Processing

    Authors: Yuehao Wang, Chaoyi Wang, Bingchen Gong, Tianfan Xue

    Abstract: Neural Radiance Fields (NeRF) achieves unprecedented performance in novel view synthesis by exploiting multi-view consistency. When capturing multiple inputs, the image signal processing (ISP) in modern cameras enhances each of them independently, including exposure adjustment, color correction, local tone mapping, etc. While these processing steps greatly improve image quality, they often break the…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: SIGGRAPH (ACM TOG), 2024. Project page: https://bilarfpro.github.io

  6. arXiv:2405.17835

    cs.CV

    Deform3DGS: Flexible Deformation for Fast Surgical Scene Reconstruction with Gaussian Splatting

    Authors: Shuojue Yang, Qian Li, Daiyun Shen, Bingchen Gong, Qi Dou, Yueming Jin

    Abstract: Tissue deformation poses a key challenge for accurate surgical scene reconstruction. Despite yielding high reconstruction quality, existing methods suffer from slow rendering speeds and long training times, limiting their intraoperative applicability. Motivated by recent progress in 3D Gaussian Splatting, an emerging technology in real-time 3D rendering, this work presents a novel fast reconstruct…

    Submitted 30 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: Early accepted at MICCAI 2024, 10 pages, 2 figures

  7. arXiv:2405.16567

    cs.AI cs.CR

    Automatic Jailbreaking of the Text-to-Image Generative AI Systems

    Authors: Minseon Kim, Hyomin Lee, Boqing Gong, Huishuai Zhang, Sung Ju Hwang

    Abstract: Recent AI systems have shown extremely powerful performance, even surpassing human performance, on various tasks such as information retrieval, language generation, and image generation based on large language models (LLMs). At the same time, there are diverse safety risks that can cause the generation of malicious content by circumventing the alignment in LLMs, which are often referred to as jai…

    Submitted 28 May, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

    Comments: Under review

  8. arXiv:2405.12367

    eess.IV cs.CV

    Large-Scale Multi-Center CT and MRI Segmentation of Pancreas with Deep Learning

    Authors: Zheyuan Zhang, Elif Keles, Gorkem Durak, Yavuz Taktak, Onkar Susladkar, Vandan Gorade, Debesh Jha, Asli C. Ormeci, Alpay Medetalibeyoglu, Lanhong Yao, Bin Wang, Ilkin Sevgi Isler, Linkai Peng, Hongyi Pan, Camila Lopes Vendrami, Amir Bourhani, Yury Velichko, Boqing Gong, Concetto Spampinato, Ayis Pyrros, Pallavi Tiwari, Derk C. F. Klatte, Megan Engels, Sanne Hoogenboom, Candice W. Bolan, et al. (13 additional authors not shown)

    Abstract: Automated volumetric segmentation of the pancreas on cross-sectional imaging is needed for diagnosis and follow-up of pancreatic diseases. While CT-based pancreatic segmentation is more established, MRI-based segmentation methods are understudied, largely due to a lack of publicly available datasets, benchmarking research efforts, and domain-specific deep learning methods. In this retrospective st…

    Submitted 25 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: under review version

  9. arXiv:2404.15339

    eess.IV

    Efficient EndoNeRF Reconstruction and Its Application for Data-driven Surgical Simulation

    Authors: Yuehao Wang, Bingchen Gong, Yonghao Long, Siu Hin Fan, Qi Dou

    Abstract: The healthcare industry has a growing need for realistic modeling and efficient simulation of surgical scenes. With effective models of deformable surgical scenes, clinicians are able to conduct surgical planning and surgery training on scenarios close to real-world cases. However, a significant challenge in achieving such a goal is the scarcity of high-quality soft tissue models with accurate sha…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: 14 pages, 4 figures. Accepted by International Journal of Computer Assisted Radiology and Surgery

  10. arXiv:2404.09300

    math.NA

    Analysis of a finite element DtN method for scattering resonances of sound hard obstacles

    Authors: Yingxia Xi, Bo Gong, Jiguang Sun

    Abstract: Scattering resonances have important applications in many areas of science and engineering. They take the place of discrete spectral data for problems on non-compact domains. In this paper, we consider the computation of scattering resonances defined on the exterior of a compact sound hard obstacle. The resonances are the eigenvalues of a holomorphic Fredholm operator function. We truncate th…

    Submitted 14 April, 2024; originally announced April 2024.

  11. arXiv:2403.16796

    physics.app-ph

    Development and Assessment of a Miniaturized Thermocouple for Precise Temperature Measurement in Biological Tissues and Cells

    Authors: Onnop Srivannavit, Rakesh Joshi, Weibin Zhu, Bin Gong, Stuart C. Sealfon, Theodorian Borca-Tasciuc, Angelo Gaitas

    Abstract: This study presents a novel thermocouple instrument designed for precise temperature monitoring within biological tissues and cells, addressing a significant gap in biological research. Constructed on a Silicon-On-Insulator (SOI) substrate, the instrument employs doped silicon and chromium/gold junctions, achieving a Seebeck coefficient of up to 447 µV/K, rapid response times, high temperature acc…

    Submitted 25 March, 2024; originally announced March 2024.

  12. arXiv:2402.13217

    cs.CV cs.AI

    VideoPrism: A Foundational Visual Encoder for Video Understanding

    Authors: Long Zhao, Nitesh B. Gundavarapu, Liangzhe Yuan, Hao Zhou, Shen Yan, Jennifer J. Sun, Luke Friedman, Rui Qian, Tobias Weyand, Yue Zhao, Rachel Hornung, Florian Schroff, Ming-Hsuan Yang, David A. Ross, Huisheng Wang, Hartwig Adam, Mikhail Sirotenko, Ting Liu, Boqing Gong

    Abstract: We introduce VideoPrism, a general-purpose video encoder that tackles diverse video understanding tasks with a single frozen model. We pretrain VideoPrism on a heterogeneous corpus containing 36M high-quality video-caption pairs and 582M video clips with noisy parallel text (e.g., ASR transcripts). The pretraining approach improves upon masked autoencoding by global-local distillation of semantic…

    Submitted 15 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Accepted to ICML 2024. v2: added retrieval results on MSRVTT (1K-A), more data analyses, and ablation studies

  13. arXiv:2401.06129

    cs.CV

    Distilling Vision-Language Models on Millions of Videos

    Authors: Yue Zhao, Long Zhao, Xingyi Zhou, Jialin Wu, Chun-Te Chu, Hui Miao, Florian Schroff, Hartwig Adam, Ting Liu, Boqing Gong, Philipp Krähenbühl, Liangzhe Yuan

    Abstract: Recent advances in vision-language models are largely attributed to the abundance of image-text data. We aim to replicate this success for video-language models, but there simply is not enough human-curated video-text data available. We thus resort to fine-tuning a video-language model from a strong image-language baseline with synthesized instructional data. The resulting video model by video-i…

    Submitted 15 April, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

    Comments: CVPR 2024. Project page: https://zhaoyue-zephyrus.github.io/video-instruction-tuning

  14. arXiv:2401.01952

    cs.CV cs.AI cs.CL

    Instruct-Imagen: Image Generation with Multi-modal Instruction

    Authors: Hexiang Hu, Kelvin C. K. Chan, Yu-Chuan Su, Wenhu Chen, Yandong Li, Kihyuk Sohn, Yang Zhao, Xue Ben, Boqing Gong, William Cohen, Ming-Wei Chang, Xuhui Jia

    Abstract: This paper presents instruct-imagen, a model that tackles heterogeneous image generation tasks and generalizes across unseen tasks. We introduce *multi-modal instruction* for image generation, a task representation articulating a range of generation intents with precision. It uses natural language to amalgamate disparate modalities (e.g., text, edge, style, subject, etc.), such that abundant gener…

    Submitted 3 January, 2024; originally announced January 2024.

    Comments: 20 pages, 18 figures

  15. arXiv:2312.15770

    cs.CV cs.AI

    A Recipe for Scaling up Text-to-Video Generation with Text-free Videos

    Authors: Xiang Wang, Shiwei Zhang, Hangjie Yuan, Zhiwu Qing, Biao Gong, Yingya Zhang, Yujun Shen, Changxin Gao, Nong Sang

    Abstract: Diffusion-based text-to-video generation has witnessed impressive progress in the past year yet still falls behind text-to-image generation. One of the key reasons is the limited scale of publicly available data (e.g., 10M video-text pairs in WebVid10M vs. 5B image-text pairs in LAION), considering the high cost of video captioning. Instead, it could be far easier to collect unlabeled clips from v…

    Submitted 25 December, 2023; originally announced December 2023.

    Comments: Project page: https://tf-t2v.github.io/

  16. arXiv:2311.17002

    cs.CV

    Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following

    Authors: Yutong Feng, Biao Gong, Di Chen, Yujun Shen, Yu Liu, Jingren Zhou

    Abstract: Existing text-to-image (T2I) diffusion models usually struggle to interpret complex prompts, especially those with quantity, object-attribute binding, and multi-subject descriptions. In this work, we introduce a semantic panel as the middleware in decoding texts to images, helping the generator better follow instructions. The panel is obtained through arranging the visual concepts parsed…

    Submitted 9 April, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

  17. SeamlessNeRF: Stitching Part NeRFs with Gradient Propagation

    Authors: Bingchen Gong, Yuehao Wang, Xiaoguang Han, Qi Dou

    Abstract: Neural Radiance Fields (NeRFs) have emerged as promising digital mediums of 3D objects and scenes, sparking a surge in research to extend the editing capabilities in this domain. The task of seamless editing and merging of multiple NeRFs, resembling the "Poisson blending" in 2D image editing, remains a critical operation that is under-explored by existing work. To fill this gap, we propose Seaml…

    Submitted 30 October, 2023; originally announced November 2023.

    Comments: To appear in SIGGRAPH Asia 2023. Project website is accessible at https://sites.google.com/view/seamlessnerf

  18. arXiv:2311.15841

    cs.CV

    Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation

    Authors: Siteng Huang, Biao Gong, Yutong Feng, Xi Chen, Yuqian Fu, Yu Liu, Donglin Wang

    Abstract: This study focuses on a novel task in text-to-image (T2I) generation, namely action customization. The objective of this task is to learn the co-existing action from limited data and generalize it to unseen humans or even animals. Experimental results show that existing subject-driven customization methods fail to learn the representative characteristics of actions and struggle in decoupling actio…

    Submitted 10 May, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: CVPR 2024

  19. arXiv:2311.15773

    cs.CV

    Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation

    Authors: Biao Gong, Siteng Huang, Yutong Feng, Shiwei Zhang, Yuyuan Li, Yu Liu

    Abstract: Diffusion models have recently achieved remarkable progress in generating realistic images. However, challenges remain in accurately understanding and synthesizing the layout requirements in the textual prompts. To align the generated image with layout instructions, we present a training-free layout calibration system SimM that intervenes in the generative process on the fly during inference time.…

    Submitted 25 March, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

  20. arXiv:2311.06386

    cs.CV cs.LG

    Towards A Unified Neural Architecture for Visual Recognition and Reasoning

    Authors: Calvin Luo, Boqing Gong, Ting Chen, Chen Sun

    Abstract: Recognition and reasoning are two pillars of visual understanding. However, these tasks have an imbalance in focus; whereas recent advances in neural networks have shown strong empirical performance in visual recognition, there has been comparatively much less success in visual reasoning. Intuitively, unifying these two tasks under a singular framework is desirable, as they are mutually depen…

    Submitted 10 November, 2023; originally announced November 2023.

  21. Next-to-next-to-leading-order QCD corrections to double $J/ψ$ production at the $B$ factories

    Authors: Xu-Dong Huang, Bin Gong, Rui-Chang Niu, Huai-Min Yu, Jian-Xiong Wang

    Abstract: In this paper, we study the next-to-next-to-leading-order (NNLO) QCD corrections for the process $e^+e^- \to J/ψ+J/ψ$ at the $B$ factories. When the NNLO corrections are included, the cross section turns negative owing to the poor convergence of the perturbative expansion. Consequently, to obtain a reasonable estimation for the cross section, the square of the amplitude up to NNLO is used. In addition, the c…

    Submitted 11 February, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

    Comments: 15 pages, 3 figures, matches published version

    Journal ref: JHEP 02 (2024) 055

  22. arXiv:2310.05737

    cs.CV cs.AI cs.MM

    Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation

    Authors: Lijun Yu, José Lezama, Nitesh B. Gundavarapu, Luca Versari, Kihyuk Sohn, David Minnen, Yong Cheng, Vighnesh Birodkar, Agrim Gupta, Xiuye Gu, Alexander G. Hauptmann, Boqing Gong, Ming-Hsuan Yang, Irfan Essa, David A. Ross, Lu Jiang

    Abstract: While Large Language Models (LLMs) are the dominant models for generative tasks in language, they do not perform as well as diffusion models on image and video generation. To effectively use LLMs for visual generation, one crucial component is the visual tokenizer that maps pixel-space inputs to discrete tokens appropriate for LLM learning. In this paper, we introduce MAGVIT-v2, a video tokenizer…

    Submitted 29 March, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

  23. arXiv:2310.04550

    cs.CV cs.CL cs.LG

    Module-wise Adaptive Distillation for Multimodality Foundation Models

    Authors: Chen Liang, Jiahui Yu, Ming-Hsuan Yang, Matthew Brown, Yin Cui, Tuo Zhao, Boqing Gong, Tianyi Zhou

    Abstract: Pre-trained multimodal foundation models have demonstrated remarkable generalizability but pose challenges for deployment due to their large sizes. One effective approach to reducing their sizes is layerwise distillation, wherein small student models are trained to match the hidden representations of large teacher models at each layer. Motivated by our observation that certain architecture compone…

    Submitted 6 October, 2023; originally announced October 2023.

  24. arXiv:2309.13446

    cs.CV

    Video Timeline Modeling For News Story Understanding

    Authors: Meng Liu, Mingda Zhang, Jialu Liu, Hanjun Dai, Ming-Hsuan Yang, Shuiwang Ji, Zheyun Feng, Boqing Gong

    Abstract: In this paper, we present a novel problem, namely video timeline modeling. Our objective is to create a video-associated timeline from a set of videos related to a specific topic, thereby facilitating the content and structure understanding of the story being told. This problem has significant potential in various real-world applications, for instance, news story summarization. To bootstrap resear…

    Submitted 27 October, 2023; v1 submitted 23 September, 2023; originally announced September 2023.

    Comments: Accepted as a spotlight by NeurIPS 2023, Track on Datasets and Benchmarks

  25. arXiv:2309.13247

    cs.CV

    Multi-modal Domain Adaptation for REG via Relation Transfer

    Authors: Yifan Ding, Liqiang Wang, Boqing Gong

    Abstract: Domain adaptation, which aims to transfer knowledge between domains, has been well studied in many areas such as image classification and object detection. However, for multi-modal tasks, conventional approaches rely on large-scale pre-training. But due to the difficulty of acquiring multi-modal data, large-scale pre-training is often impractical. Therefore, domain adaptation, which can efficientl…

    Submitted 23 September, 2023; originally announced September 2023.

  26. arXiv:2309.09565

    eess.SP

    A Covariance Adaptive Student's t Based Kalman Filter

    Authors: Benyang Gong, Jiacheng He, Gang Wang, Bei Peng

    Abstract: In the classical Kalman filter (KF), the estimated state is a linear combination of the one-step predicted state and the measurement state; their confidence levels change when the prediction mean square error matrix and the covariance matrix of the measurement noise vary. The existing Student's t based Kalman filter (TKF) works similarly to the KF; they both work well with impulse noise, but when it co…

    Submitted 18 September, 2023; originally announced September 2023.

  27. arXiv:2308.13280

    physics.ao-ph cs.AI cs.LG physics.comp-ph

    AtmoRep: A stochastic model of atmosphere dynamics using large scale representation learning

    Authors: Christian Lessig, Ilaria Luise, Bing Gong, Michael Langguth, Scarlet Stadtler, Martin Schultz

    Abstract: The atmosphere affects humans in a multitude of ways, from loss of life due to adverse weather effects to long-term social and economic impacts on societies. Computer simulations of atmospheric dynamics are, therefore, of great importance for the well-being of our and future generations. Here, we propose AtmoRep, a novel, task-independent stochastic computer model of atmospheric dynamics that can…

    Submitted 7 September, 2023; v1 submitted 25 August, 2023; originally announced August 2023.

  28. arXiv:2308.01002

    astro-ph.HE astro-ph.SR

    The innermost jet in the hidden ultra-luminous X-ray source Cygnus X-3

    Authors: Jun Yang, Federico García, Santiago del Palacio, Ralph Spencer, Zsolt Paragi, Noel Castro Segura, Biping Gong, Hongmin Cao, Wen Chen

    Abstract: Cygnus X-3 is a high-mass X-ray binary with a compact object accreting matter from a Wolf-Rayet donor star. Recently, it has been revealed by the Imaging X-ray Polarimetry Explorer (IXPE) as a hidden Galactic ultra-luminous X-ray (ULX) source with a luminosity above the Eddington limit along the direction of a narrow (opening angle ≲32°) funnel. In between the IXPE observations, we observed…

    Submitted 2 August, 2023; originally announced August 2023.

    Comments: 7 pages, 8 figures, accepted for publication in MNRAS Letters

  29. arXiv:2307.14886

    astro-ph.SR astro-ph.HE

    On the Formation of Eccentric Millisecond Pulsars by Accretion-induced Collapse of Massive White Dwarfs

    Authors: D. Wang, B. P. Gong

    Abstract: The millisecond pulsar (MSP) is believed to be an old neutron star (NS) that has been spun up by material accreted from its donor. However, the discovery of eccentric millisecond pulsars (eMSPs) in the Galactic field challenges this scenario, which produces MSP-white dwarf (WD) binaries only in circular orbits. As the orbital periods and companion masses of these eMSPs lie in a narrow range, a reaso…

    Submitted 20 October, 2023; v1 submitted 27 July, 2023; originally announced July 2023.

    Comments: accepted by MNRAS

  30. arXiv:2307.03166

    cs.CV

    VideoGLUE: Video General Understanding Evaluation of Foundation Models

    Authors: Liangzhe Yuan, Nitesh Bharadwaj Gundavarapu, Long Zhao, Hao Zhou, Yin Cui, Lu Jiang, Xuan Yang, Menglin Jia, Tobias Weyand, Luke Friedman, Mikhail Sirotenko, Huisheng Wang, Florian Schroff, Hartwig Adam, Ming-Hsuan Yang, Ting Liu, Boqing Gong

    Abstract: We evaluate existing foundation models' video understanding capabilities using a carefully designed experiment protocol consisting of three hallmark tasks (action recognition, temporal localization, and spatiotemporal localization), eight datasets well received by the community, and four adaptation methods tailoring a foundation model (FM) for a downstream task. Moreover, we propose a scalar VideoG…

    Submitted 1 December, 2023; v1 submitted 6 July, 2023; originally announced July 2023.

    Comments: Fixes some typos and includes the project's open-source page: https://github.com/tensorflow/models/tree/master/official/projects/videoglue

  31. arXiv:2306.09669

    cond-mat.supr-con cond-mat.mtrl-sci cond-mat.str-el

    Exploring charge and spin fluctuations in infinite-layer cuprate SrCuO$_{2}$ from a phonon perspective

    Authors: Xin Du, Pei-Han Sun, Ben-Chao Gong, Jian-Feng Zhang, Zhong-Yi Lu, Kai Liu

    Abstract: The infinite-layer cuprate $A$CuO$_2$ ($A=$ Ca, Sr, Ba) has the simplest crystal structure among numerous cuprate superconductors and can serve as a prototypical system to explore unconventional superconductivity. Based on first-principles electronic structure calculations, we have studied the electronic and magnetic properties of the infinite-layer cuprate SrCuO$_{2}$ from a phonon perspecti…

    Submitted 16 June, 2023; originally announced June 2023.

    Comments: 7 pages, 4 figures, 1 table

  32. arXiv:2306.03515

    cs.LG cs.AI cs.LO

    Logic Diffusion for Knowledge Graph Reasoning

    Authors: Xiaoying Xie, Biao Gong, Yiliang Lv, Zhen Han, Guoshuai Zhao, Xueming Qian

    Abstract: Most recent works focus on answering first-order logical queries to explore knowledge graph reasoning via multi-hop logic predictions. However, existing reasoning models are limited by the circumscribed logical paradigms of training samples, which leads to weak generalization to unseen logic. To address these issues, we propose a plug-in module called Logic Diffusion (LoD) to discover unseen…

    Submitted 6 June, 2023; originally announced June 2023.

    Comments: 10 pages, 6 figures

  33. arXiv:2304.10199

    cs.IR

    Selective and Collaborative Influence Function for Efficient Recommendation Unlearning

    Authors: Yuyuan Li, Chaochao Chen, Xiaolin Zheng, Yizhao Zhang, Biao Gong, Jun Wang

    Abstract: Recent regulations on the Right to be Forgotten have greatly influenced the way recommender systems are run, because users now have the right to withdraw their private data. Besides simply deleting the target data in the database, unlearning the associated data lineage (e.g., the learned personal features and preferences in the model) is also necessary for data withdrawal. Existing unlearning m…

    Submitted 20 April, 2023; originally announced April 2023.

  34. arXiv:2304.07882

    cs.CV

    Federated Learning of Shareable Bases for Personalization-Friendly Image Classification

    Authors: Hong-You Chen, Jike Zhong, Mingda Zhang, Xuhui Jia, Hang Qi, Boqing Gong, Wei-Lun Chao, Li Zhang

    Abstract: Personalized federated learning (PFL) aims to harness the collective wisdom of clients' data while building personalized models tailored to individual clients' data distributions. Existing works offer personalization primarily to clients who participate in the FL process, making it hard to encompass new clients who were absent or newly show up. In this paper, we propose FedBasis, a novel PFL frame…

    Submitted 31 October, 2023; v1 submitted 16 April, 2023; originally announced April 2023.

    Comments: Preprint

  35. arXiv:2304.07429

    cs.CV

    Identity Encoder for Personalized Diffusion

    Authors: Yu-Chuan Su, Kelvin C. K. Chan, Yandong Li, Yang Zhao, Han Zhang, Boqing Gong, Huisheng Wang, Xuhui Jia

    Abstract: Many applications can benefit from personalized image generation models, including image enhancement and video conferencing, to name a few. Existing works achieved personalization by fine-tuning one model for each person. While successful, this approach incurs additional computation and storage overhead for each new identity. Furthermore, it usually expects tens or hundreds of examples per…

    Submitted 14 April, 2023; originally announced April 2023.

  36. arXiv:2304.02720

    eess.IV cs.CR cs.CV

    Domain Generalization with Adversarial Intensity Attack for Medical Image Segmentation

    Authors: Zheyuan Zhang, Bin Wang, Lanhong Yao, Ugur Demir, Debesh Jha, Ismail Baris Turkbey, Boqing Gong, Ulas Bagci

    Abstract: Most statistical learning algorithms rely on an over-simplified assumption: that the training and test data are independent and identically distributed. In real-world scenarios, however, it is common for models to encounter data from new and different domains to which they were not exposed during training. This is often the case in medical imaging applications due to differences in acquisition…

    Submitted 5 April, 2023; originally announced April 2023.

    Comments: Code is available upon publication

  37. arXiv:2304.02642

    cs.CV

    Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models

    Authors: Xuhui Jia, Yang Zhao, Kelvin C. K. Chan, Yandong Li, Han Zhang, Boqing Gong, Tingbo Hou, Huisheng Wang, Yu-Chuan Su

    Abstract: This paper proposes a method for generating images of customized objects specified by users. The method is based on a general framework that bypasses the lengthy optimization required by previous approaches, which often employ a per-object optimization paradigm. Our framework adopts an encoder to capture high-level identifiable semantics of objects, producing an object-specific embedding with only…

    Submitted 5 April, 2023; originally announced April 2023.

  38. arXiv:2303.16341

    cs.CV

    Structured Video-Language Modeling with Temporal Grouping and Spatial Grounding

    Authors: Yuanhao Xiong, Long Zhao, Boqing Gong, Ming-Hsuan Yang, Florian Schroff, Ting Liu, Cho-Jui Hsieh, Liangzhe Yuan

    Abstract: Existing video-language pre-training methods primarily focus on instance-level alignment between video clips and captions via global contrastive learning but neglect rich fine-grained local information in both videos and text, which is of importance to downstream tasks requiring temporal localization and semantic reasoning. A powerful model is expected to be capable of capturing region-object corr…

    Submitted 8 March, 2024; v1 submitted 28 March, 2023; originally announced March 2023.

  39. arXiv:2303.15230

    cs.CV cs.CL cs.LG

    Troika: Multi-Path Cross-Modal Traction for Compositional Zero-Shot Learning

    Authors: Siteng Huang, Biao Gong, Yutong Feng, Min Zhang, Yiliang Lv, Donglin Wang

    Abstract: Recent compositional zero-shot learning (CZSL) methods adapt pre-trained vision-language models (VLMs) by constructing trainable prompts only for composed state-object pairs. Relying on learning the joint representation of seen compositions, these methods ignore the explicit modeling of the state and object, thus limiting the exploitation of pre-trained knowledge and generalization to unseen compo…

    Submitted 25 March, 2024; v1 submitted 27 March, 2023; originally announced March 2023.

    Comments: CVPR 2024

  40. arXiv:2303.08998

    cs.CV

    Unified Visual Relationship Detection with Vision and Language Models

    Authors: Long Zhao, Liangzhe Yuan, Boqing Gong, Yin Cui, Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Ting Liu

    Abstract: This work focuses on training a single visual relationship detector predicting over the union of label spaces from multiple datasets. Merging labels spanning different datasets could be challenging due to inconsistent taxonomies. The issue is exacerbated in visual relationship detection when second-order visual semantics are introduced between pairs of objects. To address this challenge, we propos…

    Submitted 20 August, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: Accepted to ICCV 2023. Code is available at https://github.com/google-research/scenic/tree/main/scenic/projects/univrd

  41. arXiv:2303.08561  [pdf, other]

    cs.SD eess.AS

    Enhancing Unsupervised Audio Representation Learning via Adversarial Sample Generation

    Authors: Yulin Pan, Xiangteng He, Biao Gong, Yuxin Peng, Yiliang Lv

    Abstract: Existing audio analysis methods generally first transform the audio stream into a spectrogram and then feed it into a CNN for further analysis. A standard CNN recognizes specific visual patterns over a feature map, then pools them into a high-level representation, which overlooks the positional information of the recognized patterns. However, unlike in natural images, the semantics of an audio spectrogram are sensitive to…

    Submitted 15 March, 2023; originally announced March 2023.

    Comments: 8 pages, 4 figures

  42. Scanning Only Once: An End-to-end Framework for Fast Temporal Grounding in Long Videos

    Authors: Yulin Pan, Xiangteng He, Biao Gong, Yiliang Lv, Yujun Shen, Yuxin Peng, Deli Zhao

    Abstract: Video temporal grounding aims to pinpoint a video segment that matches the query description. Despite recent advances on short-form videos (e.g., minutes long), temporal grounding in long videos (e.g., hours long) is still at an early stage. To address this challenge, a common practice is to employ a sliding window, which can be inefficient and inflexible due to the limited number…

    Submitted 22 March, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

    Comments: 11 pages, 8 figures

    Journal ref: 2023 IEEE/CVF International Conference on Computer Vision (ICCV)

  43. arXiv:2303.06911  [pdf, other]

    cs.CV

    ViM: Vision Middleware for Unified Downstream Transferring

    Authors: Yutong Feng, Biao Gong, Jianwen Jiang, Yiliang Lv, Yujun Shen, Deli Zhao, Jingren Zhou

    Abstract: Foundation models are pre-trained on massive data and transferred to downstream tasks via fine-tuning. This work presents Vision Middleware (ViM), a new learning paradigm that targets unified transferring from a single foundation model to a variety of downstream tasks. ViM consists of a zoo of lightweight plug-in modules, each of which is independently learned on a midstream dataset with a shared…

    Submitted 13 March, 2023; originally announced March 2023.

  44. arXiv:2302.06891  [pdf, other]

    cs.CV

    UKnow: A Unified Knowledge Protocol for Common-Sense Reasoning and Vision-Language Pre-training

    Authors: Biao Gong, Xiaoying Xie, Yutong Feng, Yiliang Lv, Yujun Shen, Deli Zhao

    Abstract: This work presents a unified knowledge protocol, called UKnow, which facilitates knowledge-based studies from the perspective of data. Particularly focusing on visual and linguistic modalities, we categorize data knowledge into five unit types, namely, in-image, in-text, cross-image, cross-text, and image-text, and set up an efficient pipeline to help construct the multimodal knowledge graph from…

    Submitted 21 March, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

  45. RecolorNeRF: Layer Decomposed Radiance Fields for Efficient Color Editing of 3D Scenes

    Authors: Bingchen Gong, Yuehao Wang, Xiaoguang Han, Qi Dou

    Abstract: Radiance fields have gradually become a main representation of media. Although their appearance editing has been studied, how to achieve view-consistent recoloring efficiently is still underexplored. We present RecolorNeRF, a novel user-friendly color editing approach for neural radiance fields. Our key idea is to decompose the scene into a set of pure-colored layers, forming a palet…

    Submitted 18 September, 2023; v1 submitted 19 January, 2023; originally announced January 2023.

    Comments: To appear in ACM Multimedia 2023. Project website is accessible at https://sites.google.com/view/recolornerf

  46. arXiv:2212.12053  [pdf, other]

    cs.CV cs.AI cs.LG

    On Calibrating Semantic Segmentation Models: Analyses and An Algorithm

    Authors: Dongdong Wang, Boqing Gong, Liqiang Wang

    Abstract: We study the problem of semantic segmentation calibration. Lots of solutions have been proposed to approach model miscalibration of confidence in image classification. However, to date, confidence calibration research on semantic segmentation is still limited. We provide a systematic study on the calibration of semantic segmentation models and propose a simple yet effective approach. First, we fin…

    Submitted 25 March, 2023; v1 submitted 22 December, 2022; originally announced December 2022.

    Comments: Accepted to CVPR 2023 (8 pages, 4 figures)

  47. Next-to-next-to-leading-order QCD corrections to $J/\psi$ plus $\eta_c$ production at the $B$ factories

    Authors: Xu-Dong Huang, Bin Gong, Jian-Xiong Wang

    Abstract: In this paper, we calculate the next-to-next-to-leading-order (NNLO) QCD corrections to $e^+e^- \to J/\psi+\eta_c$ at the $B$ factories. After including the NNLO corrections, the cross section of $e^+e^- \to J/\psi+\eta_c$ is enhanced by about $17\%$, and the perturbative series of the prediction shows the convergent behavior. It is also found that the contribution from bottom quark starts at the $\alpha_s^3$-orde…

    Submitted 7 February, 2023; v1 submitted 7 December, 2022; originally announced December 2022.

    Comments: 13 pages, 3 figures. Published version

    Journal ref: JHEP 02 (2023) 049

  48. arXiv:2211.12764  [pdf, other]

    cs.CV cs.AI cs.CL

    VoP: Text-Video Co-operative Prompt Tuning for Cross-Modal Retrieval

    Authors: Siteng Huang, Biao Gong, Yulin Pan, Jianwen Jiang, Yiliang Lv, Yuyuan Li, Donglin Wang

    Abstract: Many recent studies leverage the pre-trained CLIP for text-video cross-modal retrieval by tuning the backbone with additional heavy modules, which not only brings huge computational burdens with much more parameters, but also leads to the knowledge forgetting from upstream models. In this work, we propose the VoP: Text-Video Co-operative Prompt Tuning for efficient tuning on the text-video retriev…

    Submitted 21 March, 2023; v1 submitted 23 November, 2022; originally announced November 2022.

    Comments: Accepted by CVPR 2023

  49. Intrinsic ferromagnetic axion states and a single pair of Weyl fermions in the stable-state Mn$X_2B_2T_6$-family materials

    Authors: Yan Gao, Weikang Wu, Ben-Chao Gong, Huan-Cheng Yang, Xiang-Feng Zhou, Yong Liu, Shengyuan A. Yang, Kai Liu, Zhong-Yi Lu

    Abstract: The intrinsic ferromagnetic (FM) axion insulators and Weyl semimetals (WSMs) with only a single pair of Weyl points have drawn intensive attention but so far remain rare and elusive in real materials. Here, we propose a new class of Mn$X_2B_2T_6$-B ($X$ = Ge, Sn, or Pb; $B$ = Sb or Bi; $T$ = Se or Te) family that is the stable structural form of this system. W…

    Submitted 17 October, 2022; originally announced October 2022.

    Comments: 6 pages, 5 figures

  50. arXiv:2210.08064  [pdf, other]

    cs.CV cs.RO

    LESS: Label-Efficient Semantic Segmentation for LiDAR Point Clouds

    Authors: Minghua Liu, Yin Zhou, Charles R. Qi, Boqing Gong, Hao Su, Dragomir Anguelov

    Abstract: Semantic segmentation of LiDAR point clouds is an important task in autonomous driving. However, training deep models via conventional supervised methods requires large datasets which are costly to label. It is critical to have label-efficient segmentation approaches to scale up the model to new operational domains or to improve performance on rare cases. While most prior works focus on indoor sce…

    Submitted 14 October, 2022; originally announced October 2022.