Showing 1–50 of 265 results for author: Feng, R

  1. arXiv:2407.03314  [pdf, other]

    cs.CV cs.CL cs.DB

    BACON: Supercharge Your VLM with Bag-of-Concept Graph to Mitigate Hallucinations

    Authors: Zhantao Yang, Ruili Feng, Keyu Yan, Huangji Wang, Zhicai Wang, Shangwen Zhu, Han Zhang, Jie Xiao, Pingyu Wu, Kai Zhu, Jixuan Chen, Chen-Wei Xie, Chaojie Mao, Yue Yang, Hongyang Zhang, Yu Liu, Fan Cheng

    Abstract: This paper presents Bag-of-Concept Graph (BACON) to gift models with limited linguistic abilities to taste the privilege of Vision Language Models (VLMs) and boost downstream tasks such as detection, visual question answering (VQA), and image generation. Since the visual scenes in physical worlds are structured with complex relations between objects, BACON breaks down annotations into basic minimu…

    Submitted 3 July, 2024; originally announced July 2024.

  2. arXiv:2406.10517  [pdf, other]

    cs.IR cs.AI cs.LG

    ADSNet: Cross-Domain LTV Prediction with an Adaptive Siamese Network in Advertising

    Authors: Ruize Wang, Hui Xu, Ying Cheng, Qi He, Xing Zhou, Rui Feng, Wei Xu, Lei Huang, Jie Jiang

    Abstract: Advertising platforms have evolved in estimating Lifetime Value (LTV) to better align with advertisers' true performance metric. However, the sparsity of real-world LTV data presents a significant challenge to LTV predictive models (i.e., pLTV), severely limiting their capabilities. Therefore, we propose to utilize external data, in addition to the internal data of advertising platform, to expan…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Accepted to KDD 2024

  3. arXiv:2406.07410  [pdf, other]

    eess.AS

    Clever Hans Effect Found in Automatic Detection of Alzheimer's Disease through Speech

    Authors: Yin-Long Liu, Rui Feng, Jia-Hong Yuan, Zhen-Hua Ling

    Abstract: We uncover an underlying bias present in the audio recordings produced from the picture description task of the Pitt corpus, the largest publicly accessible database for Alzheimer's Disease (AD) detection research. Even by solely utilizing the silent segments of these audio recordings, we achieve nearly 100% accuracy in AD detection. However, applying the same methods to other datasets and prepro…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  4. arXiv:2406.07006  [pdf, other]

    cs.CV

    MIPI 2024 Challenge on Few-shot RAW Image Denoising: Methods and Results

    Authors: Xin Jin, Chunle Guo, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Ruoqi Li, Chang Liu, Ziyi Wang, Yao Du, Jingjing Yang, Long Bao, Heng Sun, Xiangyu Kong, Xiaoxia Xing, Jinlong Wu, Yuanyang Xue, Hyunhee Park, Sejun Song, Changho Kim, Jingfan Tan, et al. (17 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Few-shot RAW Image Denoising Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  5. arXiv:2406.01597  [pdf, other]

    cs.CV cs.GR

    End-to-End Rate-Distortion Optimized 3D Gaussian Representation

    Authors: Henan Wang, Hanxin Zhu, Tianyu He, Runsen Feng, Jiajun Deng, Jiang Bian, Zhibo Chen

    Abstract: 3D Gaussian Splatting (3DGS) has become an emerging technique with remarkable potential in 3D representation and image rendering. However, the substantial storage overhead of 3DGS significantly impedes its practical applications. In this work, we formulate the compact 3D Gaussian learning as an end-to-end Rate-Distortion Optimization (RDO) problem and propose RDO-Gaussian that can achieve flexible…

    Submitted 9 April, 2024; originally announced June 2024.

  6. arXiv:2405.16980  [pdf, other]

    cs.CV eess.IV

    DSU-Net: Dynamic Snake U-Net for 2-D Seismic First Break Picking

    Authors: Hongtao Wang, Rongyu Feng, Liangyi Wu, Mutian Liu, Yinuo Cui, Chunxia Zhang, Zhenbo Guo

    Abstract: In seismic exploration, identifying the first break (FB) is a critical component in establishing subsurface velocity models. Various automatic picking techniques based on deep neural networks have been developed to expedite this procedure. The most popular class uses semantic segmentation networks to pick on a shot gather, an approach called 2-dimensional (2-D) picking. Generally, 2-D segmentation-based pi…

    Submitted 27 May, 2024; originally announced May 2024.

  7. arXiv:2405.14735  [pdf]

    physics.optics

    Generalized all-optical complex exponential operator

    Authors: Baiqiao Chen, Qi Jia, Rui Feng, Fangkui Sun, Yongyin Cao, Jian Wang, Weiqiang Ding

    Abstract: Euler's formula, an extraordinary mathematical formula, establishes a vital link between complex-valued operations and trigonometric functions, finding widespread application in various fields. With the end of Moore's Law, electronic computing methods are encountering developmental bottlenecks. With its enviable potential, optical computing has successfully achieved high-speed operation of designe…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 17 pages, 4 figures, 1 table

  8. arXiv:2405.09786  [pdf, other]

    cs.LG cs.CR

    IBD-PSC: Input-level Backdoor Detection via Parameter-oriented Scaling Consistency

    Authors: Linshan Hou, Ruili Feng, Zhongyun Hua, Wei Luo, Leo Yu Zhang, Yiming Li

    Abstract: Deep neural networks (DNNs) are vulnerable to backdoor attacks, where adversaries can maliciously trigger model misclassifications by implanting a hidden backdoor during model training. This paper proposes a simple yet effective input-level backdoor detection (dubbed IBD-PSC) as a `firewall' to filter out malicious testing images. Our method is motivated by an intriguing phenomenon, i.e., paramete…

    Submitted 2 June, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

    Comments: Accepted to ICML 2024, 31 pages

  9. arXiv:2405.04867  [pdf, other]

    eess.IV cs.CV

    MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results

    Authors: Yaqi Wu, Zhihao Fan, Xiaofeng Chu, Jimmy S. Ren, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangcheng Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Senyan Xu, Zhijing Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha, Jun Cao, Cheng Li, Shu Chen, Liang Ma, Shiyang Zhou, Haijin Zeng, Kai Feng, et al. (24 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: MIPI@CVPR2024. Website: https://mipi-challenge.org/MIPI2024/

  10. arXiv:2404.19534  [pdf, other]

    cs.CV

    MIPI 2024 Challenge on Nighttime Flare Removal: Methods and Results

    Authors: Yuekun Dai, Dafeng Zhang, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Peiqing Yang, Zhezhu Jin, Guanqun Liu, Chen Change Loy, Lize Zhang, Shuai Liu, Chaoyu Feng, Luyang Wang, Shuan Chen, Guangqi Shao, Xiaotao Wang, Lei Lei, Qirui Yang, Qihua Cheng, Zhiqiang Xu, Yihao Liu, Huanjing Yue, Jingyu Yang, et al. (38 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 27 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Nighttime Flare Removal Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  11. arXiv:2404.17433  [pdf, other]

    cs.CV

    PromptCIR: Blind Compressed Image Restoration with Prompt Learning

    Authors: Bingchen Li, Xin Li, Yiting Lu, Ruoyu Feng, Mengxi Guo, Shijie Zhao, Li Zhang, Zhibo Chen

    Abstract: Blind Compressed Image Restoration (CIR) has garnered significant attention due to its practical applications. It aims to mitigate compression artifacts caused by unknown quality factors, particularly with JPEG codecs. Existing works on blind CIR often seek assistance from a quality factor prediction network to facilitate their network to restore compressed images. However, the predicted numerical…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: Winner of NTIRE 2024 Blind Compressed Image Enhancement Challenge

  12. arXiv:2404.09599  [pdf, other]

    cs.CR

    Enhancing Code Vulnerability Detection via Vulnerability-Preserving Data Augmentation

    Authors: Shangqing Liu, Wei Ma, Jian Wang, Xiaofei Xie, Ruitao Feng, Yang Liu

    Abstract: Source code vulnerability detection aims to identify inherent vulnerabilities to safeguard software systems from potential attacks. Many prior studies overlook diverse vulnerability characteristics, simplifying the problem into a binary (0-1) classification task, for example, determining whether code is vulnerable or not. This poses a challenge for a single deep learning-based model to effectively lea…

    Submitted 15 April, 2024; originally announced April 2024.

  13. arXiv:2404.05169  [pdf, other]

    cs.CV

    QMix: Quality-aware Learning with Mixed Noise for Robust Retinal Disease Diagnosis

    Authors: Junlin Hou, Jilan Xu, Rui Feng, Hao Chen

    Abstract: Due to the complexity of medical image acquisition and the difficulty of annotation, medical image datasets inevitably contain noise. Noisy data with wrong labels affects the robustness and generalization ability of deep neural networks. Previous noise learning methods mainly considered noise arising from images being mislabeled, i.e. label noise, assuming that all mislabeled images are of high im…

    Submitted 7 April, 2024; originally announced April 2024.

  14. arXiv:2404.02710  [pdf, other]

    cs.CL eess.AS

    ART: The Alternating Reading Task Corpus for Speech Entrainment and Imitation

    Authors: Zheng Yuan, Dorina de Jong, Štefan Beňuš, Noël Nguyen, Ruitao Feng, Róbert Sabo, Luciano Fadiga, Alessandro D'Ausilio

    Abstract: We introduce the Alternating Reading Task (ART) Corpus, a collection of dyadic sentence reading for studying the entrainment and imitation behaviour in speech communication. The ART corpus features three experimental conditions - solo reading, alternating reading, and deliberate imitation - as well as three sub-corpora encompassing French-, Italian-, and Slovak-accented English. This design allows…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: 15 pages, 2 figures, 7 tables, accepted at LREC-COLING 2024 conference

  15. FT2Ra: A Fine-Tuning-Inspired Approach to Retrieval-Augmented Code Completion

    Authors: Qi Guo, Xiaohong Li, Xiaofei Xie, Shangqing Liu, Ze Tang, Ruitao Feng, Junjie Wang, Jidong Ge, Lei Bu

    Abstract: The rise of code pre-trained models has significantly enhanced various coding tasks, such as code completion, and tools like GitHub Copilot. However, the substantial size of these models, especially large models, poses a significant challenge when it comes to fine-tuning them for specific downstream tasks. As an alternative approach, retrieval-based methods have emerged as a promising solution, au…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: ISSTA 2024

  16. arXiv:2404.00964  [pdf, other]

    cs.CV

    S2RC-GCN: A Spatial-Spectral Reliable Contrastive Graph Convolutional Network for Complex Land Cover Classification Using Hyperspectral Images

    Authors: Renxiang Guan, Zihao Li, Chujia Song, Guo Yu, Xianju Li, Ruyi Feng

    Abstract: Spatial correlations between different ground objects are an important feature of mining land cover research. Graph Convolutional Networks (GCNs) can effectively capture such spatial feature representations and have demonstrated promising results in performing hyperspectral imagery (HSI) classification tasks of complex land. However, the existing GCN-based HSI classification methods are prone to i…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: Accepted to IJCNN 2024 (International Joint Conference on Neural Networks)

  17. arXiv:2403.13228  [pdf, ps, other]

    math.RA math.CA

    Hilbert's Irreducibility Theorem for Linear Differential Operators

    Authors: Ruyong Feng, Zewang Guo, Wei Lu

    Abstract: We prove a differential analogue of Hilbert's irreducibility theorem. Let $\mathcal{L}$ be a linear differential operator with coefficients in $C(\mathbb{X})(x)$ that is irreducible over $\overline{C(\mathbb{X})}(x)$, where $\mathbb{X}$ is an irreducible affine algebraic variety over an algebraically closed field $C$ of characteristic zero. We show that the set of $c\in \mathbb{X}(C)$ such that th…

    Submitted 19 March, 2024; originally announced March 2024.

    MSC Class: 16S32; 68W30

  18. arXiv:2403.11953  [pdf, other]

    eess.IV cs.CV

    Advancing COVID-19 Detection in 3D CT Scans

    Authors: Qingqiu Li, Runtian Yuan, Junlin Hou, Jilan Xu, Yuejie Zhang, Rui Feng, Hao Chen

    Abstract: To make a more accurate diagnosis of COVID-19, we propose a straightforward yet effective model. Firstly, we analyse the characteristics of 3D CT scans and remove the non-lung parts, facilitating the model to focus on lesion-related areas and reducing computational cost. We use ResNeSt50 as the strong feature extractor, initializing it with pretrained weights which have COVID-19-specific prior kno…

    Submitted 18 March, 2024; originally announced March 2024.

  19. arXiv:2403.11498  [pdf, other]

    eess.IV cs.CV

    Domain Adaptation Using Pseudo Labels for COVID-19 Detection

    Authors: Runtian Yuan, Qingqiu Li, Junlin Hou, Jilan Xu, Yuejie Zhang, Rui Feng, Hao Chen

    Abstract: In response to the need for rapid and accurate COVID-19 diagnosis during the global pandemic, we present a two-stage framework that leverages pseudo labels for domain adaptation to enhance the detection of COVID-19 from CT scans. By utilizing annotated data from one domain and non-annotated data from another, the model overcomes the challenge of data scarcity and variability, common in emergent he…

    Submitted 18 March, 2024; originally announced March 2024.

  20. arXiv:2403.09294  [pdf, other]

    cs.CV cs.CL

    Anatomical Structure-Guided Medical Vision-Language Pre-training

    Authors: Qingqiu Li, Xiaohan Yan, Jilan Xu, Runtian Yuan, Yuejie Zhang, Rui Feng, Quanli Shen, Xiaobo Zhang, Shujun Wang

    Abstract: Learning medical visual representations through vision-language pre-training has reached remarkable progress. Despite the promising performance, it still faces challenges, i.e., local alignment lacks interpretability and clinical relevance, and the insufficient internal and external representation learning of image-report pairs. To address these issues, we propose an Anatomical Structure-Guided (A…

    Submitted 14 March, 2024; originally announced March 2024.

  21. arXiv:2402.19387  [pdf, other]

    eess.IV cs.CV

    SeD: Semantic-Aware Discriminator for Image Super-Resolution

    Authors: Bingchen Li, Xin Li, Hanxin Zhu, Yeying Jin, Ruoyu Feng, Zhizheng Zhang, Zhibo Chen

    Abstract: Generative Adversarial Networks (GANs) have been widely used to recover vivid textures in image super-resolution (SR) tasks. In particular, one discriminator is utilized to enable the SR network to learn the distribution of real-world high-quality images in an adversarial training manner. However, the distribution learning is overly coarse-grained, which is susceptible to virtual textures and caus…

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: CVPR2024

  22. arXiv:2402.18180  [pdf, other]

    cs.CY

    Human Simulacra: Benchmarking the Personification of Large Language Models

    Authors: Qiuejie Xie, Qiming Feng, Tianqi Zhang, Qingqiu Li, Linyi Yang, Yuejie Zhang, Rui Feng, Liang He, Shang Gao, Yue Zhang

    Abstract: Large language models (LLMs) are recognized as systems that closely mimic aspects of human intelligence. This capability has attracted attention from the social science community, who see the potential in leveraging LLMs to replace human participants in experiments, thereby reducing research costs and complexity. In this paper, we introduce a framework for large language models personification, in…

    Submitted 9 June, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

  23. arXiv:2402.14983  [pdf, other]

    cs.LG cs.CR q-fin.RM

    Privacy-Enhancing Collaborative Information Sharing through Federated Learning -- A Case of the Insurance Industry

    Authors: Panyi Dong, Zhiyu Quan, Brandon Edwards, Shih-han Wang, Runhuan Feng, Tianyang Wang, Patrick Foley, Prashant Shah

    Abstract: The report demonstrates the benefits (in terms of improved claims loss modeling) of harnessing the value of Federated Learning (FL) to learn a single model across multiple insurance industry datasets without requiring the datasets themselves to be shared from one company to another. The application of FL addresses two of the most pressing concerns: limited data volume and data variety, which are c…

    Submitted 22 February, 2024; originally announced February 2024.

  24. arXiv:2402.04684  [pdf, ps, other]

    math.CO cs.SC

    Parallel Summation in P-Recursive Extensions

    Authors: Shaoshi Chen, Ruyong Feng, Manuel Kauers, Xiuyun Li

    Abstract: We propose investigating a summation analog of the paradigm for parallel integration. We make some first steps towards an indefinite summation method applicable to summands that rationally depend on the summation index and a P-recursive sequence and its shifts. There is a distinction between so-called normal and so-called special polynomials. Under the assumption that the corresponding difference…

    Submitted 7 June, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

  25. arXiv:2401.13959  [pdf, other]

    eess.IV cs.CV

    Conditional Neural Video Coding with Spatial-Temporal Super-Resolution

    Authors: Henan Wang, Xiaohan Pan, Runsen Feng, Zongyu Guo, Zhibo Chen

    Abstract: This document is an expanded version of a one-page abstract originally presented at the 2024 Data Compression Conference. It describes our proposed method for the video track of the Challenge on Learned Image Compression (CLIC) 2024. Our scheme follows the typical hybrid coding framework with some novel techniques. Firstly, we adopt Spynet network to produce accurate motion vectors for motion esti…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: Accepted by the 2024 Data Compression Conference (DCC) for presentation as a poster

  26. arXiv:2401.06166  [pdf]

    q-bio.BM cs.AI cs.LG

    AdaMR: Adaptable Molecular Representation for Unified Pre-training Strategy

    Authors: Yan Ding, Hao Cheng, Ziliang Ye, Ruyi Feng, Wei Tian, Peng Xie, Juan Zhang, Zhongze Gu

    Abstract: We propose Adjustable Molecular Representation (AdaMR), a new large-scale uniform pre-training strategy for small-molecule drugs, as a novel unified pre-training strategy. AdaMR utilizes a granularity-adjustable molecular encoding strategy, which is accomplished through a pre-training job termed molecular canonicalization, setting it apart from recent large-scale molecular models. This adaptabilit…

    Submitted 27 April, 2024; v1 submitted 28 December, 2023; originally announced January 2024.

  27. arXiv:2401.02686  [pdf, other]

    cs.CR cs.LG cs.SE

    Beyond Fidelity: Explaining Vulnerability Localization of Learning-based Detectors

    Authors: Baijun Cheng, Shengming Zhao, Kailong Wang, Meizhen Wang, Guangdong Bai, Ruitao Feng, Yao Guo, Lei Ma, Haoyu Wang

    Abstract: Vulnerability detectors based on deep learning (DL) models have proven their effectiveness in recent years. However, the shroud of opacity surrounding the decision-making process of these detectors makes it difficult for security analysts to comprehend. To address this, various explanation approaches have been proposed to explain the predictions by highlighting important features, which have been…

    Submitted 21 February, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

    Comments: Accepted by TOSEM

  28. arXiv:2401.00789  [pdf, other]

    cs.CV

    Retrieval-Augmented Egocentric Video Captioning

    Authors: Jilan Xu, Yifei Huang, Junlin Hou, Guo Chen, Yuejie Zhang, Rui Feng, Weidi Xie

    Abstract: Understanding human actions from videos of first-person view poses significant challenges. Most prior approaches explore representation learning on egocentric videos only, while overlooking the potential benefit of exploiting existing large-scale third-person videos. In this paper, (1) we develop EgoInstructor, a retrieval-augmented multimodal captioning model that automatically retrieves semantic…

    Submitted 19 June, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

    Comments: CVPR 2024. Project page is available at: https://jazzcharles.github.io/Egoinstructor/

  29. arXiv:2312.15674  [pdf, other]

    cs.MA

    Multi-Task Multi-Agent Shared Layers are Universal Cognition of Multi-Agent Coordination

    Authors: Jiawei Wang, Jian Zhao, Zhengtao Cao, Ruili Feng, Rongjun Qin, Yang Yu

    Abstract: Multi-agent reinforcement learning shines as the pinnacle of multi-agent systems, conquering intricate real-world challenges, fostering collaboration and coordination among agents, and unleashing the potential for intelligent decision-making across domains. However, training a multi-agent reinforcement learning network is a formidable endeavor, demanding substantial computational resources to inte…

    Submitted 25 December, 2023; originally announced December 2023.

  30. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  31. arXiv:2312.11521  [pdf, other]

    cs.CL cs.AI

    Large Language Models are Complex Table Parsers

    Authors: Bowen Zhao, Changkai Ji, Yuejie Zhang, Wen He, Yingwen Wang, Qing Wang, Rui Feng, Xiaobo Zhang

    Abstract: With the Generative Pre-trained Transformer 3.5 (GPT-3.5) exhibiting remarkable reasoning and comprehension abilities in Natural Language Processing (NLP), most Question Answering (QA) research has primarily centered around general QA tasks based on GPT, neglecting the specific challenges posed by Complex Table QA. In this paper, we propose to incorporate GPT-3.5 to address such challenges, in whi…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: EMNLP 2023 Main

  32. arXiv:2312.06068  [pdf, other]

    cs.CV cs.AI

    Contrastive Multi-view Subspace Clustering of Hyperspectral Images based on Graph Convolutional Networks

    Authors: Renxiang Guan, Zihao Li, Xianju Li, Chang Tang, Ruyi Feng

    Abstract: High-dimensional and complex spectral structures make the clustering of hyperspectral images (HSI) a challenging task. Subspace clustering is an effective approach for addressing this problem. However, current subspace clustering algorithms are primarily designed for a single view and do not fully exploit the spatial or textural feature information in HSI. In this study, contrastive multi-view sub…

    Submitted 10 December, 2023; originally announced December 2023.

  33. arXiv:2312.02684  [pdf, other]

    cs.CV cs.LG cs.RO

    DeepPointMap: Advancing LiDAR SLAM with Unified Neural Descriptors

    Authors: Xiaze Zhang, Ziheng Ding, Qi Jing, Yuejie Zhang, Wenchao Ding, Rui Feng

    Abstract: Point clouds have shown significant potential in various domains, including Simultaneous Localization and Mapping (SLAM). However, existing approaches either rely on dense point clouds to achieve high localization accuracy or use generalized descriptors to reduce map size. Unfortunately, these two aspects seem to conflict with each other. To address this limitation, we propose a unified architectu…

    Submitted 5 December, 2023; originally announced December 2023.

  34. arXiv:2312.01454  [pdf, other]

    cs.DB cs.AI cs.CL cs.LG

    D-Bot: Database Diagnosis System using Large Language Models

    Authors: Xuanhe Zhou, Guoliang Li, Zhaoyan Sun, Zhiyuan Liu, Weize Chen, Jianming Wu, Jiesi Liu, Ruohang Feng, Guoyang Zeng

    Abstract: Database administrators (DBAs) play an important role in managing, maintaining and optimizing database systems. However, it is hard and tedious for DBAs to manage a large number of databases and give timely responses (waiting for hours is intolerable in many online cases). In addition, existing empirical methods only support limited diagnosis scenarios, which are also labor-intensive to update the…

    Submitted 5 December, 2023; v1 submitted 3 December, 2023; originally announced December 2023.

  35. arXiv:2312.00568  [pdf, ps, other]

    eess.SP

    A WINNER+ Based 3-D Non-Stationary Wideband MIMO Channel Model

    Authors: Ji Bian, Jian Sun, Cheng-Xiang Wang, Rui Feng, Jie Huang, Yang Yang, Minggao Zhang

    Abstract: In this paper, a three-dimensional (3-D) non-stationary wideband multiple-input multiple-output (MIMO) channel model based on the WINNER+ channel model is proposed. The angular distributions of clusters in both the horizontal and vertical planes are jointly considered. The receiver and clusters can be moving, which makes the model more general. Parameters including number of clusters, powers, dela…

    Submitted 1 December, 2023; originally announced December 2023.

  36. arXiv:2311.18834  [pdf, other]

    cs.CV

    ART$\boldsymbol{\cdot}$V: Auto-Regressive Text-to-Video Generation with Diffusion Models

    Authors: Wenming Weng, Ruoyu Feng, Yanhui Wang, Qi Dai, Chunyu Wang, Dacheng Yin, Zhiyuan Zhao, Kai Qiu, Jianmin Bao, Yuhui Yuan, Chong Luo, Yueyi Zhang, Zhiwei Xiong

    Abstract: We present ART$\boldsymbol{\cdot}$V, an efficient framework for auto-regressive video generation with diffusion models. Unlike existing methods that generate entire videos in one-shot, ART$\boldsymbol{\cdot}$V generates a single frame at a time, conditioned on the previous ones. The framework offers three distinct advantages. First, it only learns simple continual motions between adjacent frames,…

    Submitted 30 November, 2023; originally announced November 2023.

    Comments: 24 pages, 21 figures. Project page at https://warranweng.github.io/art.v

  37. arXiv:2311.18829  [pdf, other]

    cs.CV

    MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation

    Authors: Yanhui Wang, Jianmin Bao, Wenming Weng, Ruoyu Feng, Dacheng Yin, Tao Yang, Jingxu Zhang, Qi Dai, Zhiyuan Zhao, Chunyu Wang, Kai Qiu, Yuhui Yuan, Chuanxin Tang, Xiaoyan Sun, Chong Luo, Baining Guo

    Abstract: We present MicroCinema, a straightforward yet effective framework for high-quality and coherent text-to-video generation. Unlike existing approaches that align text prompts with video directly, MicroCinema introduces a Divide-and-Conquer strategy which divides the text-to-video into a two-stage process: text-to-image generation and image&text-to-video generation. This strategy offers two signific…

    Submitted 29 December, 2023; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: Project page: https://wangyanhui666.github.io/MicroCinema.github.io/

  38. arXiv:2311.14934  [pdf, other]

    cs.LG

    Robust Graph Neural Networks via Unbiased Aggregation

    Authors: Ruiqi Feng, Zhichao Hou, Tyler Derr, Xiaorui Liu

    Abstract: The adversarial robustness of Graph Neural Networks (GNNs) has been questioned due to the false sense of security uncovered by strong adaptive attacks despite the existence of numerous defenses. In this work, we delve into the robustness analysis of representative robust GNNs and provide a unified robust estimation point of view to understand their robustness and limitations. Our novel analysis of…

    Submitted 25 November, 2023; originally announced November 2023.

  39. arXiv:2311.12892  [pdf]

    eess.IV cs.CV cs.LG physics.med-ph

    IMJENSE: Scan-specific Implicit Representation for Joint Coil Sensitivity and Image Estimation in Parallel MRI

    Authors: Ruimin Feng, Qing Wu, Jie Feng, Huajun She, Chunlei Liu, Yuyao Zhang, Hongjiang Wei

    Abstract: Parallel imaging is a commonly used technique to accelerate magnetic resonance imaging (MRI) data acquisition. Mathematically, parallel MRI reconstruction can be formulated as an inverse problem relating the sparsely sampled k-space measurements to the desired MRI image. Despite the success of many existing reconstruction algorithms, it remains a challenge to reliably reconstruct a high-quality im…

    Submitted 21 November, 2023; originally announced November 2023.

  40. arXiv:2311.05897  [pdf, ps, other]

    cs.SC math.CA math.DS

    Stability Problems on D-finite Functions

    Authors: Shaoshi Chen, Ruyong Feng, Zewang Guo, Wei Lu

    Abstract: This paper continues the studies of symbolic integration by focusing on the stability problems on D-finite functions. We introduce the notion of stability index in order to investigate the order growth of the differential operators satisfied by iterated integrals of D-finite functions and determine bounds and exact formulas for stability indices of several special classes of differential operators.…

    Submitted 10 November, 2023; originally announced November 2023.

    Comments: 9 pages

    MSC Class: 12H05; 37P15; 33F10 ACM Class: I.1.2

    Journal ref: Proceedings of ISSAC'23, 2023

  41. arXiv:2311.00399  [pdf, other]

    cs.CV cs.CL

    Enhanced Knowledge Injection for Radiology Report Generation

    Authors: Qingqiu Li, Jilan Xu, Runtian Yuan, Mohan Chen, Yuejie Zhang, Rui Feng, Xiaobo Zhang, Shang Gao

    Abstract: Automatic generation of radiology reports holds crucial clinical value, as it can alleviate substantial workload on radiologists and remind less experienced ones of potential anomalies. Despite the remarkable performance of various image captioning methods in the natural image field, generating accurate reports for medical images still faces challenges, i.e., disparities in visual and textual data…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: Accepted by BIBM 2023

  42. arXiv:2310.15200  [pdf, other]

    cs.CV

    Open-Set Image Tagging with Multi-Grained Text Supervision

    Authors: Xinyu Huang, Yi-Jie Huang, Youcai Zhang, Weiwei Tian, Rui Feng, Yuejie Zhang, Yanchun Xie, Yaqian Li, Lei Zhang

    Abstract: In this paper, we introduce the Recognize Anything Plus Model (RAM++), an open-set image tagging model effectively leveraging multi-grained text supervision. Previous approaches (e.g., CLIP) primarily utilize global text supervision paired with images, leading to sub-optimal performance in recognizing multiple individual semantic tags. In contrast, RAM++ seamlessly integrates individual tag superv…

    Submitted 16 November, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: Homepage: https://github.com/xinyu1205/recognize-anything

  43. arXiv:2310.09625  [pdf, other]

    eess.IV cs.CV

    JSMoCo: Joint Coil Sensitivity and Motion Correction in Parallel MRI with a Self-Calibrating Score-Based Diffusion Model

    Authors: Lixuan Chen, Xuanyu Tian, Jiangjie Wu, Ruimin Feng, Guoyan Lao, Yuyao Zhang, Hongjiang Wei

    Abstract: Magnetic Resonance Imaging (MRI) stands as a powerful modality in clinical diagnosis. However, it is known that MRI faces challenges such as long acquisition time and vulnerability to motion-induced artifacts. Despite the success of many existing motion correction algorithms, there has been limited research focused on correcting motion artifacts on the estimated coil sensitivity maps for fast MRI…

    Submitted 14 October, 2023; originally announced October 2023.

    Comments: 10 pages, 8 figures, journal

  44. arXiv:2309.16496  [pdf, other]

    cs.CV

    CCEdit: Creative and Controllable Video Editing via Diffusion Models

    Authors: Ruoyu Feng, Wenming Weng, Yanhui Wang, Yuhui Yuan, Jianmin Bao, Chong Luo, Zhibo Chen, Baining Guo

    Abstract: In this paper, we present CCEdit, a versatile generative video editing framework based on diffusion models. Our approach employs a novel trident network structure that separates structure and appearance control, ensuring precise and creative editing capabilities. Utilizing the foundational ControlNet architecture, we maintain the structural integrity of the video during editing. The incorporation…

    Submitted 6 April, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

  45. arXiv:2309.12677  [pdf]

    cs.AI physics.data-an

    TrTr: A Versatile Pre-Trained Large Traffic Model based on Transformer for Capturing Trajectory Diversity in Vehicle Population

    Authors: Ruyi Feng, Zhibin Li, Bowen Liu, Yan Ding

    Abstract: Understanding trajectory diversity is a fundamental aspect of addressing practical traffic tasks. However, capturing the diversity of trajectories presents challenges, particularly with traditional machine learning and recurrent neural networks due to the requirement of large-scale parameters. The emerging Transformer technology, renowned for its parallel computation capabilities enabling the util…

    Submitted 30 November, 2023; v1 submitted 22 September, 2023; originally announced September 2023.

    Comments: 16 pages, 6 figures, work in update

  46. arXiv:2309.06255  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM

    Enhancing multimodal cooperation via sample-level modality valuation

    Authors: Yake Wei, Ruoxuan Feng, Zihe Wang, Di Hu

    Abstract: One primary topic of multimodal learning is to jointly incorporate heterogeneous information from different modalities. However most models often suffer from unsatisfactory multimodal cooperation which cannot jointly utilize all modalities well. Some methods are proposed to identify and enhance the worse learnt modality but they are often hard to provide the fine-grained observation of multimodal…

    Submitted 13 June, 2024; v1 submitted 12 September, 2023; originally announced September 2023.

    Comments: Accepted by CVPR 2024

  47. arXiv:2309.00585  [pdf, other]

    cs.LG cond-mat.soft

    PolyGET: Accelerating Polymer Simulations by Accurate and Generalizable Forcefield with Equivariant Transformer

    Authors: Rui Feng, Huan Tran, Aubrey Toland, Binghong Chen, Qi Zhu, Rampi Ramprasad, Chao Zhang

    Abstract: Polymer simulation with both accuracy and efficiency is a challenging task. Machine learning (ML) forcefields have been developed to achieve both the accuracy of ab initio methods and the efficiency of empirical force fields. However, existing ML force fields are usually limited to single-molecule settings, and their simulations are not robust enough. In this paper, we present PolyGET, a new frame…

    Submitted 1 September, 2023; originally announced September 2023.

  48. arXiv:2308.14759  [pdf, other]

    physics.chem-ph cs.AI cs.LG q-bio.BM

    May the Force be with You: Unified Force-Centric Pre-Training for 3D Molecular Conformations

    Authors: Rui Feng, Qi Zhu, Huan Tran, Binghong Chen, Aubrey Toland, Rampi Ramprasad, Chao Zhang

    Abstract: Recent works have shown the promise of learning pre-trained models for 3D molecular representation. However, existing pre-training models focus predominantly on equilibrium data and largely overlook off-equilibrium conformations. It is challenging to extend these methods to off-equilibrium data because their training objective relies on assumptions of conformations being the local energy minima. W…

    Submitted 23 August, 2023; originally announced August 2023.

  49. arXiv:2307.16687  [pdf, other]

    cs.CV

    DiffPose: SpatioTemporal Diffusion Model for Video-Based Human Pose Estimation

    Authors: Runyang Feng, Yixing Gao, Tze Ho Elden Tse, Xueqing Ma, Hyung Jin Chang

    Abstract: Denoising diffusion probabilistic models that were initially proposed for realistic image generation have recently shown success in various perception tasks (e.g., object detection and image segmentation) and are increasingly gaining attention in computer vision. However, extending such models to multi-frame human pose estimation is non-trivial due to the presence of the additional temporal dimens…

    Submitted 5 August, 2023; v1 submitted 31 July, 2023; originally announced July 2023.

    Comments: This paper is accepted to ICCV 2023

  50. arXiv:2307.16331  [pdf, other]

    cs.LG cs.CR

    Theoretically Principled Trade-off for Stateful Defenses against Query-Based Black-Box Attacks

    Authors: Ashish Hooda, Neal Mangaokar, Ryan Feng, Kassem Fawaz, Somesh Jha, Atul Prakash

    Abstract: Adversarial examples threaten the integrity of machine learning systems with alarming success rates even under constrained black-box conditions. Stateful defenses have emerged as an effective countermeasure, detecting potential attacks by maintaining a buffer of recent queries and detecting new queries that are too similar. However, these defenses fundamentally pose a trade-off between attack dete…

    Submitted 30 July, 2023; originally announced July 2023.

    Comments: 2nd AdvML Frontiers Workshop at ICML 2023