
Showing 1–50 of 361 results for author: Zhong, Z

Searching in archive cs.
  1. arXiv:2406.17672 [pdf, other]

    cs.SD eess.AS

    SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond

    Authors: Marco Comunità, Zhi Zhong, Akira Takahashi, Shiqi Yang, Mengjie Zhao, Koichi Saito, Yukara Ikemiya, Takashi Shibuya, Shusuke Takahashi, Yuki Mitsufuji

    Abstract: Recent advances in generative models that iteratively synthesize audio clips have brought great success to text-to-audio synthesis (TTA), but at the cost of slow synthesis speed and heavy computation. Although there have been attempts to accelerate the iterative procedure, high-quality TTA systems remain inefficient due to the hundreds of iterations required in the inference phase and the large amount of mod…

    Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

    Comments: 6 pages, 8 figures, 8 tables. Audio samples: https://zzaudio.github.io/SpecMaskGIT/index.html

  2. arXiv:2406.11933 [pdf, other]

    cs.CV

    Scaling Efficient Masked Autoencoder Learning on Large Remote Sensing Dataset

    Authors: Fengxiang Wang, Hongzhen Wang, Di Wang, Zonghao Guo, Zhenyu Zhong, Long Lan, Jing Zhang, Zhiyuan Liu, Maosong Sun

    Abstract: Masked Image Modeling (MIM) has emerged as a pivotal approach for developing foundational visual models in the field of remote sensing (RS). However, current RS datasets are limited in volume and diversity, which significantly constrains the capacity of MIM methods to learn generalizable representations. In this study, we introduce RS-4M, a large-scale dataset designed to enable highly ef…

    Submitted 17 June, 2024; originally announced June 2024.

  3. arXiv:2406.10710 [pdf, other]

    cs.AI cs.CL

    SyntheT2C: Generating Synthetic Data for Fine-Tuning Large Language Models on the Text2Cypher Task

    Authors: Ziije Zhong, Linqing Zhong, Zhaoze Sun, Qingyun Jin, Zengchang Qin, Xiaofan Zhang

    Abstract: Integrating Large Language Models (LLMs) with existing Knowledge Graph (KG) databases presents a promising avenue for enhancing LLMs' efficacy and mitigating their "hallucinations". Given that most KGs reside in graph databases accessible solely through specialized query languages (e.g., Cypher), there exists a critical need to bridge the divide between LLMs and KG databases by automating the tran…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 19 pages, 15 figures, 8 tables

  4. arXiv:2406.09732 [pdf, ps, other]

    math.PR cs.GT econ.TH

    Finding pure Nash equilibria in large random games

    Authors: Andrea Collevecchio, Tuan-Minh Nguyen, Ziwen Zhong

    Abstract: Best Response Dynamics (BRD) is a class of strategy updating rules to find Pure Nash Equilibria (PNE) in a game. At each step, a player is randomly picked and they switch to a "best response" strategy based on the strategies chosen by others, so that the new strategy profile maximises their payoff. If no such strategy exists, a different player will be chosen randomly. When no player wants to ch…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 19 pages, 5 figures, 1 table

    MSC Class: 91A10; 91A06; 60K35; 60K37

  5. arXiv:2406.06813 [pdf, other]

    cs.CV

    Stable Neighbor Denoising for Source-free Domain Adaptive Segmentation

    Authors: Dong Zhao, Shuang Wang, Qi Zang, Licheng Jiao, Nicu Sebe, Zhun Zhong

    Abstract: We study source-free unsupervised domain adaptation (SFUDA) for semantic segmentation, which aims to adapt a source-trained model to the target domain without accessing the source data. Many works have been proposed to address this challenging problem, among which uncertainty-based self-training is a predominant approach. However, without comprehensive denoising mechanisms, they still largely fall…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 2024 Conference on Computer Vision and Pattern Recognition

    Journal ref: (2024 Conference on Computer Vision and Pattern Recognition)

  6. arXiv:2406.01302 [pdf]

    cs.CV

    Pulmonary Embolism Mortality Prediction Using Multimodal Learning Based on Computed Tomography Angiography and Clinical Data

    Authors: Zhusi Zhong, Helen Zhang, Fayez H. Fayad, Andrew C. Lancaster, John Sollee, Shreyas Kulkarni, Cheng Ting Lin, Jie Li, Xinbo Gao, Scott Collins, Colin Greineder, Sun H. Ahn, Harrison X. Bai, Zhicheng Jiao, Michael K. Atalay

    Abstract: Purpose: Pulmonary embolism (PE) is a significant cause of mortality in the United States. The objective of this study is to implement deep learning (DL) models using Computed Tomography Pulmonary Angiography (CTPA), clinical data, and PE Severity Index (PESI) scores to predict PE mortality. Materials and Methods: 918 patients (median age 64 years, range 13-99 years, 52% female) with 3,978 CTPAs w…

    Submitted 5 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  7. arXiv:2406.00806 [pdf, other]

    cs.LG

    Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection

    Authors: Chentao Cao, Zhun Zhong, Zhanke Zhou, Yang Liu, Tongliang Liu, Bo Han

    Abstract: Detecting out-of-distribution (OOD) samples is essential when deploying machine learning models in open-world scenarios. Zero-shot OOD detection, requiring no training on in-distribution (ID) data, has been possible with the advent of vision-language models like CLIP. Existing methods build a text-based classifier with only closed-set labels. However, this largely restricts the inherent capability…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  8. arXiv:2406.00456 [pdf, other]

    cs.LG cs.AI cs.CL

    Mix-of-Granularity: Optimize the Chunking Granularity for Retrieval-Augmented Generation

    Authors: Zijie Zhong, Hanwen Liu, Xiaoya Cui, Xiaofan Zhang, Zengchang Qin

    Abstract: Integrating information from different reference data sources is a major challenge for Retrieval-Augmented Generation (RAG) systems because each knowledge source adopts a unique data structure and follows different conventions. Retrieving from multiple knowledge sources with one fixed strategy usually leads to under-exploitation of information. To mitigate this drawback, inspired by Mix-of-Expert,…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 17 pages, 6 figures and 8 tables

  9. arXiv:2405.18503 [pdf, other]

    cs.SD cs.LG eess.AS

    SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation

    Authors: Koichi Saito, Dongjun Kim, Takashi Shibuya, Chieh-Hsin Lai, Zhi Zhong, Yuhta Takida, Yuki Mitsufuji

    Abstract: Sound content is an indispensable element for multimedia works such as video games, music, and films. Recent high-quality diffusion-based sound generation models can serve as valuable tools for the creators. However, despite producing high-quality sounds, these models often suffer from slow inference speeds. This drawback burdens creators, who typically refine their sounds through trial and error…

    Submitted 10 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: Audio samples: https://koichi-saito-sony.github.io/soundctm/. Codes: https://github.com/sony/soundctm. Checkpoints: https://huggingface.co/Sony/soundctm

  10. arXiv:2405.17267 [pdf, other]

    cs.LG cs.CV

    FedHPL: Efficient Heterogeneous Federated Learning with Prompt Tuning and Logit Distillation

    Authors: Yuting Ma, Lechao Cheng, Yaxiong Wang, Zhun Zhong, Xiaohua Xu, Meng Wang

    Abstract: Federated learning (FL) is a popular privacy-preserving paradigm that enables distributed clients to collaboratively train models with a central server while keeping raw data locally. In practice, distinct model architectures, varying data distributions, and limited resources across local clients inevitably cause model performance degradation and a slowdown in convergence speed. However, existing…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 35 pages

  11. arXiv:2405.15556 [pdf, other]

    cs.LG cs.CL cs.CR

    Certifiably Robust RAG against Retrieval Corruption

    Authors: Chong Xiang, Tong Wu, Zexuan Zhong, David Wagner, Danqi Chen, Prateek Mittal

    Abstract: Retrieval-augmented generation (RAG) has been shown to be vulnerable to retrieval corruption attacks: an attacker can inject malicious passages into retrieval results to induce inaccurate responses. In this paper, we propose RobustRAG as the first defense framework against retrieval corruption attacks. The key insight of RobustRAG is an isolate-then-aggregate strategy: we get LLM responses from each pas…

    Submitted 24 May, 2024; originally announced May 2024.

  12. arXiv:2405.14905 [pdf, other]

    eess.IV cs.AI cs.CL

    Structural Entities Extraction and Patient Indications Incorporation for Chest X-ray Report Generation

    Authors: Kang Liu, Zhuoqi Ma, Xiaolu Kang, Zhusi Zhong, Zhicheng Jiao, Grayson Baird, Harrison Bai, Qiguang Miao

    Abstract: The automated generation of imaging reports proves invaluable in alleviating the workload of radiologists. A clinically applicable report generation algorithm should demonstrate its effectiveness in producing reports that accurately describe radiology findings and attend to patient-specific indications. In this paper, we introduce a novel method, Structural Entities extraction a…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: The code is available at https://github.com/mk-runner/SEI-Temp or https://github.com/mk-runner/SEI

  13. arXiv:2405.14598 [pdf, other]

    cs.CV cs.LG cs.MM cs.SD eess.AS

    Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation

    Authors: Shiqi Yang, Zhi Zhong, Mengjie Zhao, Shusuke Takahashi, Masato Ishii, Takashi Shibuya, Yuki Mitsufuji

    Abstract: In recent years, with the realistic generation results and a wide range of personalized applications, diffusion-based generative models have gained significant attention in both visual and audio generation areas. Compared to the considerable advancements of text2image or text2audio generation, research in audio2visual or visual2audio generation has been relatively slow. The recent audio-visual generation method…

    Submitted 24 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: 10 pages

  14. arXiv:2405.14113 [pdf, other]

    eess.IV cs.CV

    Multi-modality Regional Alignment Network for Covid X-Ray Survival Prediction and Report Generation

    Authors: Zhusi Zhong, Jie Li, John Sollee, Scott Collins, Harrison Bai, Paul Zhang, Terrence Healey, Michael Atalay, Xinbo Gao, Zhicheng Jiao

    Abstract: In response to the worldwide COVID-19 pandemic, advanced automated technologies have emerged as valuable tools to aid healthcare professionals in managing an increased workload by improving radiology report generation and prognostic analysis. This study proposes Multi-modality Regional Alignment Network (MRANet), an explainable model for radiology report generation and survival prediction that foc…

    Submitted 22 May, 2024; originally announced May 2024.

  15. arXiv:2405.13770 [pdf, other]

    cs.RO

    Expansion-GRR: Efficient Generation of Smooth Global Redundancy Resolution Roadmaps

    Authors: Zhuoyun Zhong, Zhi Li, Constantinos Chamzas

    Abstract: Global redundancy resolution (GRR) roadmap is a novel concept in robotics that facilitates the mapping from task space paths to configuration space paths in a legible, predictable, and repeatable way. Such roadmaps could find widespread utility in applications such as safe teleoperation, consistent path planning, and factory workcell design. However, the previous methods to compute GRR roadmaps of…

    Submitted 22 May, 2024; originally announced May 2024.

  16. arXiv:2405.12724 [pdf, other]

    cs.CV

    RemoCap: Disentangled Representation Learning for Motion Capture

    Authors: Hongsheng Wang, Lizao Zhang, Zhangnan Zhong, Shuolin Xu, Xinrui Zhou, Shengyu Zhang, Huahao Xu, Fei Wu, Feng Lin

    Abstract: Reconstructing 3D human bodies from realistic motion sequences remains a challenge due to pervasive and complex occlusions. Current methods struggle to capture the dynamics of occluded body parts, leading to model penetration and distorted motion. RemoCap leverages Spatial Disentanglement (SD) and Motion Disentanglement (MD) to overcome these limitations. SD addresses occlusion interference betwee…

    Submitted 21 May, 2024; originally announced May 2024.

  17. arXiv:2405.11238 [pdf, other]

    cs.LG cs.AI

    SimAD: A Simple Dissimilarity-based Approach for Time Series Anomaly Detection

    Authors: Zhijie Zhong, Zhiwen Yu, Xing Xi, Yue Xu, Jiahui Chen, Kaixiang Yang

    Abstract: Despite the prevalence of reconstruction-based deep learning methods, time series anomaly detection remains challenging. Existing approaches often struggle with limited temporal contexts, inadequate representation of normal patterns, and flawed evaluation metrics, hindering their effectiveness in identifying aberrant behavior. To address these issues, we introduce SimAD, a…

    Submitted 18 May, 2024; originally announced May 2024.

    Comments: 18 pages, 12 figures, 7 tables, under review

  18. arXiv:2405.03133 [pdf, other]

    cs.CL cs.LG

    Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training

    Authors: Zexuan Zhong, Mengzhou Xia, Danqi Chen, Mike Lewis

    Abstract: Mixture-of-experts (MoE) models facilitate efficient scaling; however, training the router network introduces the challenge of optimizing a non-differentiable, discrete objective. Recently, a fully-differentiable MoE architecture, SMEAR, was proposed (Muqeeth et al., 2023), which softly merges experts in the parameter space; nevertheless, its effectiveness was only demonstrated in downstream fine-…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: 21 pages, 12 figures

  19. arXiv:2405.02815 [pdf, other]

    cs.CV cs.AI

    Region-specific Risk Quantification for Interpretable Prognosis of COVID-19

    Authors: Zhusi Zhong, Jie Li, Zhuoqi Ma, Scott Collins, Harrison Bai, Paul Zhang, Terrance Healey, Xinbo Gao, Michael K. Atalay, Zhicheng Jiao

    Abstract: The COVID-19 pandemic has strained global public health, necessitating accurate diagnosis and intervention to control disease spread and reduce mortality rates. This paper introduces an interpretable deep survival prediction model designed specifically for improved understanding and trust in COVID-19 prognosis using chest X-ray (CXR) images. By integrating a large-scale pretrained image encoder, R…

    Submitted 5 May, 2024; originally announced May 2024.

  20. arXiv:2404.18392 [pdf, other]

    cs.DC

    Dflow, a Python framework for constructing cloud-native AI-for-Science workflows

    Authors: Xinzijian Liu, Yanbo Han, Zhuoyuan Li, Jiahao Fan, Chengqian Zhang, Jinzhe Zeng, Yifan Shan, Yannan Yuan, Wei-Hong Xu, Yun-Pei Liu, Yuzhi Zhang, Tongqi Wen, Darrin M. York, Zhicheng Zhong, Hang Zheng, Jun Cheng, Linfeng Zhang, Han Wang

    Abstract: In the AI-for-science era, scientific computing scenarios such as concurrent learning and high-throughput computing demand a new generation of infrastructure that supports scalable computing resources and automated workflow management on both cloud and high-performance supercomputers. Here we introduce Dflow, an open-source Python toolkit designed for scientists to construct workflows with simple…

    Submitted 28 April, 2024; originally announced April 2024.

  21. arXiv:2404.16233 [pdf, other]

    cs.LG cs.AI

    AutoGluon-Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models

    Authors: Zhiqiang Tang, Haoyang Fang, Su Zhou, Taojiannan Yang, Zihan Zhong, Tony Hu, Katrin Kirchhoff, George Karypis

    Abstract: AutoGluon-Multimodal (AutoMM) is introduced as an open-source AutoML library designed specifically for multimodal learning. Distinguished by its exceptional ease of use, AutoMM enables fine-tuning of foundation models with just three lines of code. Supporting various modalities including image, text, and tabular data, both independently and in combination, the library offers a comprehensive suite…

    Submitted 30 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

    Comments: Accepted at AutoML 2024 Conference

  22. arXiv:2404.15661 [pdf, other]

    cs.GR cs.CG cs.CV

    CWF: Consolidating Weak Features in High-quality Mesh Simplification

    Authors: Rui Xu, Longdu Liu, Ningna Wang, Shuangmin Chen, Shiqing Xin, Xiaohu Guo, Zichun Zhong, Taku Komura, Wenping Wang, Changhe Tu

    Abstract: In mesh simplification, common requirements like accuracy, triangle quality, and feature alignment are often considered as a trade-off. Existing algorithms concentrate on just one or a few specific aspects of these requirements. For example, the well-known Quadric Error Metrics (QEM) approach prioritizes accuracy and can preserve strong feature lines/points as well but falls short in ensuring high…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 14 pages, 22 figures

  23. arXiv:2404.10378 [pdf, other]

    cs.CV cs.AI cs.CY cs.LG

    Second Edition FRCSyn Challenge at CVPR 2024: Face Recognition Challenge in the Era of Synthetic Data

    Authors: Ivan DeAndres-Tame, Ruben Tolosana, Pietro Melzi, Ruben Vera-Rodriguez, Minchul Kim, Christian Rathgeb, Xiaoming Liu, Aythami Morales, Julian Fierrez, Javier Ortega-Garcia, Zhizhou Zhong, Yuge Huang, Yuxi Mi, Shouhong Ding, Shuigeng Zhou, Shuai He, Lingzhi Fu, Heng Cong, Rongyu Zhang, Zhihong Xiao, Evgeny Smirnov, Anton Pimenov, Aleksei Grigorev, Denis Timoshenko, Kaleb Mesfin Asfaw, et al. (33 additional authors not shown)

    Abstract: Synthetic data is gaining increasing relevance for training machine learning models. This is mainly motivated by several factors such as the lack of real data and intra-class variability, time and errors produced in manual labeling, and in some cases privacy concerns, among others. This paper presents an overview of the 2nd edition of the Face Recognition Challenge in the Era of Synthetic Data…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: text overlap with arXiv:2311.10476

    Journal ref: IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRw 2024)

  24. arXiv:2404.03642 [pdf, other]

    cs.CV

    DiffBody: Human Body Restoration by Imagining with Generative Diffusion Prior

    Authors: Yiming Zhang, Zhe Wang, Xinjie Li, Yunchen Yuan, Chengsong Zhang, Xiao Sun, Zhihang Zhong, Jian Wang

    Abstract: Human body restoration plays a vital role in various applications related to the human body. Despite recent advances in general image restoration using generative models, their performance in human body restoration remains mediocre, often resulting in foreground and background blending, over-smoothing surface textures, missing accessories, and distorted limbs. Addressing these challenges, we propo…

    Submitted 4 April, 2024; originally announced April 2024.

  25. arXiv:2403.19969 [pdf, other]

    cs.CV cs.LG

    Separate, Dynamic and Differentiable (SMART) Pruner for Block/Output Channel Pruning on Computer Vision Tasks

    Authors: Guanhua Ding, Zexi Ye, Zhen Zhong, Gang Li, David Shao

    Abstract: Deep Neural Network (DNN) pruning has emerged as a key strategy to reduce model size, improve inference latency, and lower power consumption on DNN accelerators. Among various pruning techniques, block and output channel pruning have shown significant potential in accelerating hardware performance. However, their accuracy often requires further improvement. In response to this challenge, we introd…

    Submitted 29 March, 2024; originally announced March 2024.

  26. arXiv:2403.19160 [pdf, other]

    cs.CV

    Within the Dynamic Context: Inertia-aware 3D Human Modeling with Pose Sequence

    Authors: Yutong Chen, Yifan Zhan, Zhihang Zhong, Wei Wang, Xiao Sun, Yu Qiao, Yinqiang Zheng

    Abstract: Neural rendering techniques have significantly advanced 3D human body modeling. However, previous approaches often overlook dynamics induced by factors such as motion inertia, leading to challenges in scenarios like abrupt stops after rotation, where the pose remains static while the appearance changes. This limitation arises from reliance on a single pose as conditional input, resulting in ambigu…

    Submitted 28 March, 2024; originally announced March 2024.

  27. arXiv:2403.18814 [pdf, other]

    cs.CV cs.AI cs.CL

    Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

    Authors: Yanwei Li, Yuechen Zhang, Chengyao Wang, Zhisheng Zhong, Yixin Chen, Ruihang Chu, Shaoteng Liu, Jiaya Jia

    Abstract: In this work, we introduce Mini-Gemini, a simple and effective framework enhancing multi-modality Vision Language Models (VLMs). Despite the advancements in VLMs facilitating basic visual dialog and reasoning, a performance gap persists compared to advanced models like GPT-4 and Gemini. We try to narrow the gap by mining the potential of VLMs for better performance and any-to-any workflow from thr…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Code and models are available at https://github.com/dvlab-research/MiniGemini

  28. arXiv:2403.17839 [pdf, other]

    cs.CV cs.AI

    ReMamber: Referring Image Segmentation with Mamba Twister

    Authors: Yuhuan Yang, Chaofan Ma, Jiangchao Yao, Zhun Zhong, Ya Zhang, Yanfeng Wang

    Abstract: Referring Image Segmentation (RIS) leveraging transformers has achieved great success on the interpretation of complex visual-language tasks. However, the quadratic computation cost makes it resource-consuming in capturing long-range visual-language dependencies. Fortunately, Mamba addresses this with efficient linear complexity in processing. However, directly applying Mamba to multi-modal intera…

    Submitted 26 March, 2024; originally announced March 2024.

  29. arXiv:2403.12457 [pdf, other]

    cs.CV

    Privacy-Preserving Face Recognition Using Trainable Feature Subtraction

    Authors: Yuxi Mi, Zhizhou Zhong, Yuge Huang, Jiazhen Ji, Jianqing Xu, Jun Wang, Shaoming Wang, Shouhong Ding, Shuigeng Zhou

    Abstract: The widespread adoption of face recognition has led to increasing privacy concerns, as unauthorized access to face images can expose sensitive personal information. This paper explores face image protection against viewing and recovery attacks. Inspired by image compression, we propose creating a visually uninformative face image through feature subtraction between an original face and its model-p…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  30. arXiv:2403.10497 [pdf, ps, other]

    eess.SY cs.LG

    Data-Driven Distributionally Robust Safety Verification Using Barrier Certificates and Conditional Mean Embeddings

    Authors: Oliver Schön, Zhengang Zhong, Sadegh Soudjani

    Abstract: Algorithmic verification of realistic systems to satisfy safety and other temporal requirements has suffered from poor scalability of the employed formal approaches. To design systems with rigorous guarantees, many approaches still rely on exact models of the underlying systems. Since this assumption can rarely be met in practice, models have to be inferred from measurement data or are bypassed co…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: 7 pages, 2 figures, accepted to American Control Conference (ACC) 2024

  31. arXiv:2403.07369 [pdf, other]

    cs.CV

    Textual Knowledge Matters: Cross-Modality Co-Teaching for Generalized Visual Class Discovery

    Authors: Haiyang Zheng, Nan Pu, Wenjing Li, Nicu Sebe, Zhun Zhong

    Abstract: In this paper, we study the problem of Generalized Category Discovery (GCD), which aims to cluster unlabeled data from both known and unknown categories using the knowledge of labeled data from known categories. Current GCD methods rely on only visual cues, which however neglect the multi-modality perceptive nature of human cognitive processes in discovering novel visual categories. To address thi…

    Submitted 12 March, 2024; originally announced March 2024.

  32. arXiv:2403.07347 [pdf, other]

    cs.CV

    Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture

    Authors: Fei Wang, Dan Guo, Kun Li, Zhun Zhong, Meng Wang

    Abstract: Video Motion Magnification (VMM) aims to reveal subtle and imperceptible motion information of objects in the macroscopic world. Prior methods directly model the motion field from the Eulerian perspective by Representation Learning that separates shape and texture or Multi-domain Learning from phase fluctuations. Inspired by the frequency spectrum, we observe that the low-frequency components with…

    Submitted 24 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR2024

  33. arXiv:2403.06424 [pdf, other]

    stat.ML cs.CV cs.LG

    Bridging Domains with Approximately Shared Features

    Authors: Ziliang Samuel Zhong, Xiang Pan, Qi Lei

    Abstract: Multi-source domain adaptation aims to reduce performance degradation when applying machine learning models to unseen domains. A fundamental challenge is devising the optimal strategy for feature selection. Existing literature is somewhat paradoxical: some advocate for learning invariant features from source domains, while others favor more diverse features. To address the challenge, we propose a…

    Submitted 11 March, 2024; originally announced March 2024.

  34. arXiv:2403.05075 [pdf, other]

    cs.LG q-bio.BM

    Benchmarking Large Language Models for Molecule Prediction Tasks

    Authors: Zhiqiang Zhong, Kuangyu Zhou, Davide Mottin

    Abstract: Large Language Models (LLMs) stand at the forefront of a number of Natural Language Processing (NLP) tasks. Despite the widespread adoption of LLMs in NLP, much of their potential in broader fields remains largely unexplored, and significant limitations persist in their design and implementation. Notably, LLMs struggle with structured data, such as graphs, and often falter when tasked with answeri…

    Submitted 8 March, 2024; originally announced March 2024.

  35. arXiv:2403.04272 [pdf, other]

    cs.CV

    Active Generalized Category Discovery

    Authors: Shijie Ma, Fei Zhu, Zhun Zhong, Xu-Yao Zhang, Cheng-Lin Liu

    Abstract: Generalized Category Discovery (GCD) is a pragmatic and challenging open-world task, which endeavors to cluster unlabeled samples from both novel and old classes, leveraging some labeled data of old classes. Given that knowledge learned from old classes is not fully transferable to new classes, and that novel categories are fully unlabeled, GCD inherently faces intractable problems, including imba…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  36. arXiv:2403.03187 [pdf, other]

    cs.CL cs.AI cs.LG

    Reliable, Adaptable, and Attributable Language Models with Retrieval

    Authors: Akari Asai, Zexuan Zhong, Danqi Chen, Pang Wei Koh, Luke Zettlemoyer, Hannaneh Hajishirzi, Wen-tau Yih

    Abstract: Parametric language models (LMs), which are trained on vast amounts of web data, exhibit remarkable flexibility and capability. However, they still face practical challenges such as hallucinations, difficulty in adapting to new data distributions, and a lack of verifiability. In this position paper, we advocate for retrieval-augmented LMs to replace parametric LMs as the next generation of LMs. By…

    Submitted 5 March, 2024; originally announced March 2024.

  37. arXiv:2402.17235 [pdf, other]

    cs.LG

    Stochastic Gradient Succeeds for Bandits

    Authors: Jincheng Mei, Zixin Zhong, Bo Dai, Alekh Agarwal, Csaba Szepesvari, Dale Schuurmans

    Abstract: We show that the stochastic gradient bandit algorithm converges to a globally optimal policy at an O(1/t) rate, even with a constant step size. Remarkably, global convergence of the stochastic gradient bandit algorithm has not been previously established, even though it is an old algorithm known to be applicable to bandits. The new result is achieved by establishing two nove…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: 39 pages; Correction for a previous version published at ICML 2023 conference

  38. arXiv:2402.13418 [pdf, other]

    cs.LG q-bio.BM

    Efficiently Predicting Mutational Effect on Homologous Proteins by Evolution Encoding

    Authors: Zhiqiang Zhong, Davide Mottin

    Abstract: Predicting protein properties is paramount for biological and medical advancements. Current protein engineering mutates on a typical protein, called the wild-type, to construct a family of homologous proteins and study their properties. Yet, existing methods easily neglect subtle mutations, failing to capture the effect on the protein properties. To this end, we propose EvolMPNN, Evolution-aware M…

    Submitted 25 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

  39. arXiv:2402.13414 [pdf, other]

    cs.LG cs.CL

    Harnessing Large Language Models as Post-hoc Correctors

    Authors: Zhiqiang Zhong, Kuangyu Zhou, Davide Mottin

    Abstract: As Machine Learning (ML) models grow in size and demand higher-quality training data, the expenses associated with re-training and fine-tuning these models are escalating rapidly. Inspired by recent impressive achievements of Large Language Models (LLMs) in different fields, this paper delves into the question: can LLMs efficiently improve an ML model's performance at a minimal cost? We show that, throu…

    Submitted 25 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

  40. arXiv:2402.07763 [pdf, other]

    math.OC cs.LG math.NA

    Multi-level Optimal Control with Neural Surrogate Models

    Authors: Dante Kalise, Estefanía Loayza-Romero, Kirsten A. Morris, Zhengang Zhong

    Abstract: Optimal actuator and control design is studied as a multi-level optimisation problem, where the actuator design is evaluated based on the performance of the associated optimal closed loop. The evaluation of the optimal closed loop for a given actuator realisation is a computationally demanding task, for which the use of a neural network surrogate is proposed. The use of neural network surrogates t…

    Submitted 12 February, 2024; originally announced February 2024.

  41. arXiv:2402.05135 [pdf, other]

    cs.AI cs.CL cs.IR

    CADReN: Contextual Anchor-Driven Relational Network for Controllable Cross-Graphs Node Importance Estimation

    Authors: Zijie Zhong, Yunhui Zhang, Ziyi Chang, Zengchang Qin

    Abstract: Node Importance Estimation (NIE) is crucial for integrating external information into Large Language Models through Retrieval-Augmented Generation. Traditional methods, focusing on static, single-graph characteristics, lack adaptability to new graphs and user-specific requirements. CADReN, our proposed method, addresses these limitations by introducing a Contextual Anchor (CA) mechanism. This appr…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: 8 pages, 6 figures

    MSC Class: 68T07

  42. arXiv:2401.17868

    cs.CV cs.LG

    Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model

    Authors: Zihan Zhong, Zhiqiang Tang, Tong He, Haoyang Fang, Chun Yuan

    Abstract: The Segment Anything Model (SAM) stands as a foundational framework for image segmentation. While it exhibits remarkable zero-shot generalization in typical scenarios, its advantage diminishes when applied to specialized domains like medical imagery and remote sensing. To address this limitation, this paper introduces Conv-LoRA, a simple yet effective parameter-efficient fine-tuning approach. By i…

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: Accepted at ICLR 2024 Conference

  43. arXiv:2401.13837

    cs.CV

    Democratizing Fine-grained Visual Recognition with Large Language Models

    Authors: Mingxuan Liu, Subhankar Roy, Wenjing Li, Zhun Zhong, Nicu Sebe, Elisa Ricci

    Abstract: Identifying subordinate-level categories from images is a longstanding task in computer vision and is referred to as fine-grained visual recognition (FGVR). It has tremendous significance in real-world applications since an average layperson does not excel at differentiating species of birds or mushrooms due to subtle differences among the species. A major bottleneck in developing FGVR systems is…

    Submitted 10 March, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

    Comments: Accepted as a conference paper at ICLR 2024; Project page: https://projfiner.github.io/

  44. arXiv:2401.13325

    cs.CV

    Memory Consistency Guided Divide-and-Conquer Learning for Generalized Category Discovery

    Authors: Yuanpeng Tu, Zhun Zhong, Yuxi Li, Hengshuang Zhao

    Abstract: Generalized category discovery (GCD) aims at addressing a more realistic and challenging setting of semi-supervised learning, where only part of the category labels are assigned to certain training samples. Previous methods generally employ naive contrastive learning or unsupervised clustering scheme for all the samples. Nevertheless, they usually ignore the inherent critical information within th…

    Submitted 31 January, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

  45. arXiv:2401.11874

    cs.CV

    Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis

    Authors: Jiawei Wang, Kai Hu, Zhuoyao Zhong, Lei Sun, Qiang Huo

    Abstract: Document structure analysis (aka document layout analysis) is crucial for understanding the physical layout and logical structure of documents, with applications in information retrieval, document summarization, knowledge extraction, etc. In this paper, we concentrate on Hierarchical Document Structure Analysis (HDSA) to explore hierarchical relationships within structured documents created using…

    Submitted 28 March, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

    Comments: Submitted to Pattern Recognition

  46. arXiv:2401.10090

    cs.CV

    Cross-Modality Perturbation Synergy Attack for Person Re-identification

    Authors: Yunpeng Gong, Zhun Zhong, Zhiming Luo, Yansong Qu, Rongrong Ji, Min Jiang

    Abstract: In recent years, there has been significant research focusing on addressing security concerns in single-modal person re-identification (ReID) systems that are based on RGB images. However, the safety of cross-modality scenarios, which are more commonly encountered in practical applications involving images captured by infrared cameras, has not received adequate attention. The main challenge in cro…

    Submitted 18 January, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

  47. arXiv:2401.09793

    cs.LG

    PatchAD: A Lightweight Patch-based MLP-Mixer for Time Series Anomaly Detection

    Authors: Zhijie Zhong, Zhiwen Yu, Yiyuan Yang, Weizheng Wang, Kaixiang Yang

    Abstract: Anomaly detection in time series analysis is a pivotal task, yet it poses the challenge of discerning normal and abnormal patterns in label-deficient scenarios. Prior studies have largely employed reconstruction-based approaches, which limit the models' representational capacities. Moreover, existing deep learning-based methods are not sufficiently lightweight. Addressing these issues, we p…

    Submitted 28 May, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

    Comments: 22 pages, 11 figures, 14 tables, Under review

  48. arXiv:2401.09232

    cs.CV

    Dynamic Relation Transformer for Contextual Text Block Detection

    Authors: Jiawei Wang, Shunchi Zhang, Kai Hu, Chixiang Ma, Zhuoyao Zhong, Lei Sun, Qiang Huo

    Abstract: Contextual Text Block Detection (CTBD) is the task of identifying coherent text blocks within the complexity of natural scenes. Previous methodologies have treated CTBD as either a visual relation extraction challenge within computer vision or as a sequence modeling problem from the perspective of natural language processing. We introduce a new framework that frames CTBD as a graph generation prob…

    Submitted 17 January, 2024; originally announced January 2024.

  49. arXiv:2401.09220

    cs.CL

    UniVIE: A Unified Label Space Approach to Visual Information Extraction from Form-like Documents

    Authors: Kai Hu, Jiawei Wang, Weihong Lin, Zhuoyao Zhong, Lei Sun, Qiang Huo

    Abstract: Existing methods for Visual Information Extraction (VIE) from form-like documents typically fragment the process into separate subtasks, such as key information extraction, key-value pair extraction, and choice group extraction. However, these approaches often overlook the hierarchical structure of form documents, including hierarchical key-value pairs and hierarchical choice groups. To address th…

    Submitted 17 January, 2024; originally announced January 2024.

  50. arXiv:2312.16486

    cs.CV cs.AI

    PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion

    Authors: Guansong Lu, Yuanfan Guo, Jianhua Han, Minzhe Niu, Yihan Zeng, Songcen Xu, Zeyi Huang, Zhao Zhong, Wei Zhang, Hang Xu

    Abstract: Current large-scale diffusion models represent a giant leap forward in conditional image synthesis, capable of interpreting diverse cues like text, human poses, and edges. However, their reliance on substantial computational resources and extensive data collection remains a bottleneck. On the other hand, the integration of existing diffusion models, each specialized for different controls and oper…

    Submitted 28 December, 2023; v1 submitted 27 December, 2023; originally announced December 2023.

    Comments: 16 pages, 16 figures