Skip to main content

Showing 1–50 of 1,647 results for author: Liu, Q

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.01492  [pdf, other

    cs.CL cs.AI

    RegMix: Data Mixture as Regression for Language Model Pre-training

    Authors: Qian Liu, Xiaosen Zheng, Niklas Muennighoff, Guangtao Zeng, Longxu Dou, Tianyu Pang, **g Jiang, Min Lin

    Abstract: The data mixture for large language model pre-training significantly impacts performance, yet how to determine an effective mixture remains unclear. We propose RegMix to automatically identify a high-performing data mixture by formulating it as a regression task. RegMix involves training a set of small models with diverse data mixtures and fitting a regression model to predict their performance gi… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2407.00944  [pdf

    cs.CV

    Diffusion Transformer Model With Compact Prior for Low-dose PET Reconstruction

    Authors: Bin Huang, Xubiao Liu, Lei Fang, Qiegen Liu, Bingxuan Li

    Abstract: Positron emission tomography (PET) is an advanced medical imaging technique that plays a crucial role in non-invasive clinical diagnosis. However, while reducing radiation exposure through low-dose PET scans is beneficial for patient safety, it often results in insufficient statistical data. This scarcity of data poses significant challenges for accurately reconstructing high-quality images, which… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

  3. arXiv:2407.00596  [pdf, other

    eess.IV cs.CV

    HATs: Hierarchical Adaptive Taxonomy Segmentation for Panoramic Pathology Image Analysis

    Authors: Ruining Deng, Quan Liu, Can Cui, Tianyuan Yao, Juming Xiong, Shunxing Bao, Hao Li, Mengmeng Yin, Yu Wang, Shilin Zhao, Yucheng Tang, Haichun Yang, Yuankai Huo

    Abstract: Panoramic image segmentation in computational pathology presents a remarkable challenge due to the morphologically complex and variably scaled anatomy. For instance, the intricate organization in kidney pathology spans multiple layers, from regions like the cortex and medulla to functional units such as glomeruli, tubules, and vessels, down to various cell types. In this paper, we propose a novel… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

  4. arXiv:2407.00033  [pdf, other

    q-bio.NC cs.AI

    Uncovering cognitive taskonomy through transfer learning in masked autoencoder-based fMRI reconstruction

    Authors: Youzhi Qu, Junfeng Xia, Xinyao Jian, Wendu Li, Kaining Peng, Zhichao Liang, Haiyan Wu, Quanying Liu

    Abstract: Data reconstruction is a widely used pre-training task to learn the generalized features for many downstream tasks. Although reconstruction tasks have been applied to neural signal completion and denoising, neural signal reconstruction is less studied. Here, we employ the masked autoencoder (MAE) model to reconstruct functional magnetic resonance imaging (fMRI) data, and utilize a transfer learnin… ▽ More

    Submitted 24 May, 2024; originally announced July 2024.

  5. arXiv:2406.19540  [pdf, other

    cs.CV

    Weighted Circle Fusion: Ensembling Circle Representation from Different Object Detection Results

    Authors: Jialin Yue, Tianyuan Yao, Ruining Deng, Quan Liu, Juming Xiong, Haichun Yang, Yuankai Huo

    Abstract: Recently, the use of circle representation has emerged as a method to improve the identification of spherical objects (such as glomeruli, cells, and nuclei) in medical imaging studies. In traditional bounding box-based object detection, combining results from multiple models improves accuracy, especially when real-time processing isn't crucial. Unfortunately, this widely adopted strategy is not re… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  6. arXiv:2406.17777  [pdf, other

    cs.CV

    Text-Animator: Controllable Visual Text Video Generation

    Authors: Lin Liu, Quande Liu, Shengju Qian, Yuan Zhou, Wengang Zhou, Houqiang Li, Lingxi Xie, Qi Tian

    Abstract: Video generation is a challenging yet pivotal task in various industries, such as gaming, e-commerce, and advertising. One significant unresolved aspect within T2V is the effective visualization of text within generated videos. Despite the progress achieved in Text-to-Video~(T2V) generation, current methods still cannot effectively visualize texts in videos directly, as they mainly focus on summar… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Project Page: https://laulampaul.github.io/text-animator.html

  7. arXiv:2406.16330  [pdf, other

    cs.CL cs.AI

    Pruning via Merging: Compressing LLMs via Manifold Alignment Based Layer Merging

    Authors: Deyuan Liu, Zhanyue Qin, Hairu Wang, Zhao Yang, Zecheng Wang, Fangying Rong, Qingbin Liu, Yanchao Hao, Xi Chen, Cunhang Fan, Zhao Lv, Zhiying Tu, Dianhui Chu, Bo Li, Dianbo Sui

    Abstract: While large language models (LLMs) excel in many domains, their complexity and scale challenge deployment in resource-limited environments. Current compression techniques, such as parameter pruning, often fail to effectively utilize the knowledge from pruned parameters. To address these challenges, we propose Manifold-Based Knowledge Alignment and Layer Merging Compression (MKA), a novel approach… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  8. arXiv:2406.16253  [pdf, other

    cs.CL

    LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing

    Authors: Jiangshu Du, Yibo Wang, Wenting Zhao, Zhongfen Deng, Shuaiqi Liu, Renze Lou, Henry Peng Zou, Pranav Narayanan Venkit, Nan Zhang, Mukund Srinath, Haoran Ranran Zhang, Vipul Gupta, Yinghui Li, Tao Li, Fei Wang, Qin Liu, Tianlin Liu, Pengzhi Gao, Congying Xia, Chen Xing, Jiayang Cheng, Zhaowei Wang, Ying Su, Raj Sanjay Shah, Ruohao Guo , et al. (15 additional authors not shown)

    Abstract: This work is motivated by two key trends. On one hand, large language models (LLMs) have shown remarkable versatility in various generative tasks such as writing, drawing, and question answering, significantly reducing the time required for many routine tasks. On the other hand, researchers, whose work is not only time-consuming but also highly expertise-demanding, face increasing challenges as th… ▽ More

    Submitted 25 June, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

  9. arXiv:2406.16144  [pdf, other

    cs.CL

    Chain-of-Probe: Examing the Necessity and Accuracy of CoT Step-by-Step

    Authors: Zezhong Wang, Xingshan Zeng, Weiwen Liu, Yufei Wang, Liangyou Li, Yasheng Wang, Lifeng Shang, Xin Jiang, Qun Liu, Kam-Fai Wong

    Abstract: Current research found the issue of Early Answering in large language models (LLMs), where the models already have an answer before generating the Chain-of-Thought (CoT). This phenomenon suggests a potential lack of necessary dependency between the predicted answer and the reasoning process. Consequently, two important questions arise: (1) Is CoT still necessary if the model already has an answer?… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  10. arXiv:2406.15877  [pdf, other

    cs.SE cs.AI cs.CL

    BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

    Authors: Terry Yue Zhuo, Minh Chien Vu, Jenny Chim, Han Hu, Wenhao Yu, Ratnadira Widyasari, Imam Nur Bani Yusuf, Haolan Zhan, Junda He, Indraneil Paul, Simon Brunner, Chen Gong, Thong Hoang, Armel Randy Zebaze, Xiaoheng Hong, Wen-Ding Li, Jean Kaddour, Ming Xu, Zhihan Zhang, Prateek Yadav, Naman Jain, Alex Gu, Zhoujun Cheng, Jiawei Liu, Qian Liu , et al. (8 additional authors not shown)

    Abstract: Automated software engineering has been greatly empowered by the recent advances in Large Language Models (LLMs) for programming. While current benchmarks have shown that LLMs can perform various software engineering tasks like human developers, the majority of their evaluations are limited to short and self-contained algorithmic tasks. Solving challenging and practical programming tasks requires… ▽ More

    Submitted 26 June, 2024; v1 submitted 22 June, 2024; originally announced June 2024.

    Comments: 44 pages, 14 figures, 7 tables, built with love by the BigCode community :)

  11. arXiv:2406.15766  [pdf, ps, other

    cs.LG

    Continual Learning with Diffusion-based Generative Replay for Industrial Streaming Data

    Authors: Jiayi He, Jiao Chen, Qianmiao Liu, Suyan Dai, Jianhua Tang, Dongpo Liu

    Abstract: The Industrial Internet of Things (IIoT) integrates interconnected sensors and devices to support industrial applications, but its dynamic environments pose challenges related to data drift. Considering the limited resources and the need to effectively adapt models to new data distributions, this paper introduces a Continual Learning (CL) approach, i.e., Distillation-based Self-Guidance (DSG), to… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: 2024 IEEE/CIC International Conference on Communications in China (ICCC)

  12. arXiv:2406.15471  [pdf, other

    cs.CL cs.AI cs.LG

    Improving Large Models with Small models: Lower Costs and Better Performance

    Authors: Dong Chen, Shuo Zhang, Yueting Zhuang, Siliang Tang, Qidong Liu, Hua Wang, Mingliang Xu

    Abstract: Pretrained large models (PLMs), such as ChatGPT, have demonstrated remarkable performance across diverse tasks. However, the significant computational requirements of PLMs have discouraged most product teams from running or fine-tuning them. In such cases, to harness the exceptional performance of PLMs, one must rely on expensive APIs, thereby exacerbating the economic burden. Despite the overall… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 11 pages

  13. arXiv:2406.15177  [pdf, other

    cs.MM

    EmpathyEar: An Open-source Avatar Multimodal Empathetic Chatbot

    Authors: Hao Fei, Han Zhang, Bin Wang, Lizi Liao, Qian Liu, Erik Cambria

    Abstract: This paper introduces EmpathyEar, a pioneering open-source, avatar-based multimodal empathetic chatbot, to fill the gap in traditional text-only empathetic response generation (ERG) systems. Leveraging the advancements of a large language model, combined with multimodal encoders and generators, EmpathyEar supports user inputs in any combination of text, sound, and vision, and produces multimodal e… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: ACL 2024 Demonstration Paper

  14. arXiv:2406.14848  [pdf, other

    cs.CL cs.IR

    Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models

    Authors: Qi Liu, Bo Wang, Nan Wang, Jiaxin Mao

    Abstract: Recent studies have demonstrated the effectiveness of using large language language models (LLMs) in passage ranking. The listwise approaches, such as RankGPT, have become new state-of-the-art in this task. However, the efficiency of RankGPT models is limited by the maximum context length and relatively high latency of LLM inference. To address these issues, in this paper, we propose PE-Rank, leve… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  15. arXiv:2406.14393  [pdf, other

    cs.LG cs.CL

    Jailbreaking as a Reward Misspecification Problem

    Authors: Zhihui Xie, Jiahui Gao, Lei Li, Zhenguo Li, Qi Liu, Lingpeng Kong

    Abstract: The widespread adoption of large language models (LLMs) has raised concerns about their safety and reliability, particularly regarding their vulnerability to adversarial attacks. In this paper, we propose a novel perspective that attributes this vulnerability to reward misspecification during the alignment process. We introduce a metric ReGap to quantify the extent of reward misspecification and d… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  16. arXiv:2406.13498  [pdf, other

    cs.CV

    Semantic Enhanced Few-shot Object Detection

    Authors: Zheng Wang, Yingjie Gao, Qingjie Liu, Yunhong Wang

    Abstract: Few-shot object detection~(FSOD), which aims to detect novel objects with limited annotated instances, has made significant progress in recent years. However, existing methods still suffer from biased representations, especially for novel classes in extremely low-shot scenarios. During fine-tuning, a novel class may exploit knowledge from similar base classes to construct its own feature distribut… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Accepted by ICIP 2024

  17. arXiv:2406.12921  [pdf, other

    cs.LG

    WindowMixer: Intra-Window and Inter-Window Modeling for Time Series Forecasting

    Authors: Quangao Liu, Ruiqi Li, Maowei Jiang, Wei Yang, Chen Liang, LongLong Pang, Zhuozhang Zou

    Abstract: Time series forecasting (TSF) is crucial in fields like economic forecasting, weather prediction, traffic flow analysis, and public health surveillance. Real-world time series data often include noise, outliers, and missing values, making accurate forecasting challenging. Traditional methods model point-to-point relationships, which limits their ability to capture complex temporal patterns and inc… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  18. arXiv:2406.12726  [pdf, other

    cs.SD cs.AI eess.AS

    ED-sKWS: Early-Decision Spiking Neural Networks for Rapid,and Energy-Efficient Keyword Spotting

    Authors: Zeyang Song, Qianhui Liu, Qu Yang, Yizhou Peng, Haizhou Li

    Abstract: Keyword Spotting (KWS) is essential in edge computing requiring rapid and energy-efficient responses. Spiking Neural Networks (SNNs) are well-suited for KWS for their efficiency and temporal capacity for speech. To further reduce the latency and energy consumption, this study introduces ED-sKWS, an SNN-based KWS model with an early-decision mechanism that can stop speech processing and output the… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH2024

  19. arXiv:2406.12359  [pdf, other

    cs.LG cs.AI

    Memory Sequence Length of Data Sampling Impacts the Adaptation of Meta-Reinforcement Learning Agents

    Authors: Menglong Zhang, Fuyuan Qian, Quanying Liu

    Abstract: Fast adaptation to new tasks is extremely important for embodied agents in the real world. Meta-reinforcement learning (meta-RL) has emerged as an effective method to enable fast adaptation in unknown environments. Compared to on-policy meta-RL algorithms, off-policy algorithms rely heavily on efficient data sampling strategies to extract and represent the historical trajectories. However, little… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  20. arXiv:2406.11678  [pdf, other

    cs.IR cs.CL

    TourRank: Utilizing Large Language Models for Documents Ranking with a Tournament-Inspired Strategy

    Authors: Yiqun Chen, Qi Liu, Yi Zhang, Weiwei Sun, Daiting Shi, Jiaxin Mao, Dawei Yin

    Abstract: Large Language Models (LLMs) are increasingly employed in zero-shot documents ranking, yielding commendable results. However, several significant challenges still persist in LLMs for ranking: (1) LLMs are constrained by limited input length, precluding them from processing a large number of documents simultaneously; (2) The output document sequence is influenced by the input order of documents, re… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  21. arXiv:2406.11008  [pdf, other

    quant-ph cs.CR

    Unclonable Secret Sharing

    Authors: Prabhanjan Ananth, Vipul Goyal, Jiahui Liu, Qipeng Liu

    Abstract: Unclonable cryptography utilizes the principles of quantum mechanics to addresses cryptographic tasks that are impossible classically. We introduce a novel unclonable primitive in the context of secret sharing, called unclonable secret sharing (USS). In a USS scheme, there are $n$ shareholders, each holding a share of a classical secret represented as a quantum state. They can recover the secret o… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  22. arXiv:2406.10744   

    cs.CV

    Technique Report of CVPR 2024 PBDL Challenges

    Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Jose Alvarez, Coert van Gemeren, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Sheng** Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou , et al. (77 additional authors not shown)

    Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a… ▽ More

    Submitted 27 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: The author list and contents need to be verified by all authors

  23. arXiv:2406.10393  [pdf, other

    cs.CL

    EWEK-QA: Enhanced Web and Efficient Knowledge Graph Retrieval for Citation-based Question Answering Systems

    Authors: Mohammad Dehghan, Mohammad Ali Alomrani, Sunyam Bagga, David Alfonso-Hermelo, Khalil Bibi, Abbas Ghaddar, Yingxue Zhang, Xiaoguang Li, Jianye Hao, Qun Liu, Jimmy Lin, Boxing Chen, Prasanna Parthasarathi, Mahdi Biparva, Mehdi Rezagholizadeh

    Abstract: The emerging citation-based QA systems are gaining more attention especially in generative AI search applications. The importance of extracted knowledge provided to these systems is vital from both accuracy (completeness of information) and efficiency (extracting the information in a timely manner). In this regard, citation-based QA systems are suffering from two shortcomings. First, they usually… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  24. arXiv:2406.10278  [pdf, other

    cs.CL cs.AI

    Prompt-Based Length Controlled Generation with Multiple Control Types

    Authors: Renlong Jie, Xiaojun Meng, Lifeng Shang, Xin Jiang, Qun Liu

    Abstract: Large language models (LLMs) have attracted great attention given their strong performance on a wide range of NLP tasks. In practice, users often expect generated texts to fall within a specific length range, making length controlled generation an important topic, especially for GPT-style models. Existing length control methods mostly focus on a simple control type of "equal to" a target length. D… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by ACL 2024 findings. arXiv admin note: text overlap with arXiv:2308.12030

  25. arXiv:2406.09973  [pdf, other

    cs.CV

    InstructRL4Pix: Training Diffusion for Image Editing by Reinforcement Learning

    Authors: Tiancheng Li, **xiu Liu, Huajun Chen, Qi Liu

    Abstract: Instruction-based image editing has made a great process in using natural human language to manipulate the visual content of images. However, existing models are limited by the quality of the dataset and cannot accurately localize editing regions in images with complex object relationships. In this paper, we propose Reinforcement Learning Guided Image Editing Method(InstructRL4Pix) to train a diff… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  26. arXiv:2406.09958  [pdf, other

    cs.LG

    H-Fac: Memory-Efficient Optimization with Factorized Hamiltonian Descent

    Authors: Son Nguyen, Lizhang Chen, Bo Liu, Qiang Liu

    Abstract: In this study, we introduce a novel adaptive optimizer, H-Fac, which incorporates a factorized approach to momentum and scaling parameters. Our algorithm demonstrates competitive performances on both ResNets and Vision Transformers, while achieving sublinear memory costs through the use of rank-1 parameterizations for moment estimators. We develop our algorithms based on principles derived from Ha… ▽ More

    Submitted 17 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: 21 pages, 4 figures

  27. arXiv:2406.09760  [pdf, other

    cs.CL cs.LG

    Bootstrap** Language Models with DPO Implicit Rewards

    Authors: Changyu Chen, Zichen Liu, Chao Du, Tianyu Pang, Qian Liu, Arunesh Sinha, Pradeep Varakantham, Min Lin

    Abstract: Human alignment in large language models (LLMs) is an active area of research. A recent groundbreaking work, direct preference optimization (DPO), has greatly simplified the process from past work in reinforcement learning from human feedback (RLHF) by bypassing the reward learning stage in RLHF. DPO, after training, provides an implicit reward model. In this work, we make a novel observation that… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  28. arXiv:2406.09613  [pdf, other

    cs.CV

    ImageNet3D: Towards General-Purpose Object-Level 3D Understanding

    Authors: Wufei Ma, Guanning Zeng, Guofeng Zhang, Qihao Liu, Letian Zhang, Adam Kortylewski, Yaoyao Liu, Alan Yuille

    Abstract: A vision model with general-purpose object-level 3D understanding should be capable of inferring both 2D (e.g., class name and bounding box) and 3D information (e.g., 3D location and 3D viewpoint) for arbitrary rigid objects in natural images. This is a challenging task, as it involves inferring 3D information from 2D signals and most importantly, generalizing to rigid objects from unseen categori… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  29. arXiv:2406.09601  [pdf, other

    cs.CV

    Turns Out I'm Not Real: Towards Robust Detection of AI-Generated Videos

    Authors: Qingyuan Liu, Pengyuan Shi, Yun-Yun Tsai, Chengzhi Mao, Junfeng Yang

    Abstract: The impressive achievements of generative models in creating high-quality videos have raised concerns about digital integrity and privacy vulnerabilities. Recent works to combat Deepfakes videos have developed detectors that are highly accurate at identifying GAN-generated samples. However, the robustness of these detectors on diffusion-generated videos generated from video creation tools (e.g., S… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  30. arXiv:2406.09416  [pdf, other

    cs.CV

    Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models

    Authors: Qihao Liu, Zhanpeng Zeng, Ju He, Qihang Yu, Xiaohui Shen, Liang-Chieh Chen

    Abstract: This paper presents innovative enhancements to diffusion models by integrating a novel multi-resolution network and time-dependent layer normalization. Diffusion models have gained prominence for their effectiveness in high-fidelity image generation. While conventional approaches rely on convolutional U-Net architectures, recent Transformer-based designs have demonstrated superior performance and… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Introducing DiMR, a new diffusion backbone that surpasses all existing image generation models of various sizes on ImageNet 256 with only 505M parameters. Project page: https://qihao067.github.io/projects/DiMR

  31. arXiv:2406.09411  [pdf, other

    cs.CV cs.AI cs.CL

    MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

    Authors: Fei Wang, Xingyu Fu, James Y. Huang, Zekun Li, Qin Liu, Xiaogeng Liu, Mingyu Derek Ma, Nan Xu, Wenxuan Zhou, Kai Zhang, Tianyi Lorena Yan, Wenjie Jacky Mo, Hsiang-Hui Liu, Pan Lu, Chunyuan Li, Chaowei Xiao, Kai-Wei Chang, Dan Roth, Sheng Zhang, Hoifung Poon, Muhao Chen

    Abstract: We introduce MuirBench, a comprehensive benchmark that focuses on robust multi-image understanding capabilities of multimodal LLMs. MuirBench consists of 12 diverse multi-image tasks (e.g., scene understanding, ordering) that involve 10 categories of multi-image relations (e.g., multiview, temporal relations). Comprising 11,264 images and 2,600 multiple-choice questions, MuirBench is created in a… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  32. arXiv:2406.09317  [pdf, other

    eess.IV cs.CV

    Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

    Authors: Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Peixin Zhang, Wei Chen, Yilong Luo, Yifan Chen, Honghe Xia, Tingkun Shi, Qi Zhang, **ming Guo, Xiaolin Chen, **gcheng Wang, Yih Chung Tham , et al. (24 additional authors not shown)

    Abstract: Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. To RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources… ▽ More

    Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  33. arXiv:2406.09136  [pdf, other

    cs.CL cs.LG

    Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs

    Authors: Xuan Zhang, Chao Du, Tianyu Pang, Qian Liu, Wei Gao, Min Lin

    Abstract: The recent development of chain-of-thought (CoT) decoding has enabled large language models (LLMs) to generate explicit logical reasoning paths for complex problem-solving. However, research indicates that these paths are not always deliberate and optimal. The tree-of-thought (ToT) method employs tree-searching to extensively explore the reasoning space and find better reasoning paths that CoT dec… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  34. arXiv:2406.08478  [pdf, other

    cs.CV cs.CL

    What If We Recaption Billions of Web Images with LLaMA-3?

    Authors: Xianhang Li, Haoqin Tu, Mude Hui, Zeyu Wang, Bingchen Zhao, Junfei Xiao, Sucheng Ren, Jieru Mei, Qing Liu, Huangjie Zheng, Yuyin Zhou, Cihang Xie

    Abstract: Web-crawled image-text pairs are inherently noisy. Prior studies demonstrate that semantically aligning and enriching textual descriptions of these pairs can significantly enhance model training across various vision-language tasks, particularly text-to-image generation. However, large-scale investigations in this area remain predominantly closed-source. Our paper aims to bridge this community eff… ▽ More

    Submitted 18 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: First five authors contributed equally

  35. arXiv:2406.08374  [pdf, other

    cs.CV cs.AI eess.IV

    2.5D Multi-view Averaging Diffusion Model for 3D Medical Image Translation: Application to Low-count PET Reconstruction with CT-less Attenuation Correction

    Authors: Tianqi Chen, Jun Hou, Yinchi Zhou, Huidong Xie, Xiongchao Chen, Qiong Liu, Xueqi Guo, Menghua Xia, James S. Duncan, Chi Liu, Bo Zhou

    Abstract: Positron Emission Tomography (PET) is an important clinical imaging tool but inevitably introduces radiation hazards to patients and healthcare providers. Reducing the tracer injection dose and eliminating the CT acquisition for attenuation correction can reduce the overall radiation dose, but often results in PET with high noise and bias. Thus, it is desirable to develop 3D methods to translate t… ▽ More

    Submitted 15 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: 15 pages, 7 figures

  36. arXiv:2406.08343  [pdf, other

    cs.AR cs.AI cs.ET cs.NE

    Continuous-Time Digital Twin with Analogue Memristive Neural Ordinary Differential Equation Solver

    Authors: Hegan Chen, Jichang Yang, Jia Chen, Songqi Wang, Shaocong Wang, Dingchen Wang, Xinyu Tian, Yifei Yu, Xi Chen, Yinan Lin, Yangu He, Xiaoshan Wu, Yi Li, Xinyuan Zhang, Ning Lin, Meng Xu, Yi Li, Xumeng Zhang, Zhongrui Wang, Han Wang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu

    Abstract: Digital twins, the cornerstone of Industry 4.0, replicate real-world entities through computer models, revolutionising fields such as manufacturing management and industrial automation. Recent advances in machine learning provide data-driven methods for develo** digital twins using discrete-time data and finite-depth models on digital computers. However, this approach fails to capture the underl… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 14 pages, 4 figures

  37. arXiv:2406.07706  [pdf, other

    cs.CV

    Object-level Scene Deocclusion

    Authors: Zhengzhe Liu, Qing Liu, Chirui Chang, Jianming Zhang, Daniil Pakhomov, Haitian Zheng, Zhe Lin, Daniel Cohen-Or, Chi-Wing Fu

    Abstract: Deoccluding the hidden portions of objects in a scene is a formidable task, particularly when addressing real-world scenes. In this paper, we present a new self-supervised PArallel visible-to-COmplete diffusion framework, named PACO, a foundation model for object-level scene deocclusion. Leveraging the rich prior of pre-trained models, we first design the parallel variational autoencoder, which pr… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: SIGGRAPH 2024. A foundation model for category-agnostic object deocclusion

  38. arXiv:2406.07648  [pdf, other

    cs.CV

    M-LRM: Multi-view Large Reconstruction Model

    Authors: Mengfei Li, Xiaoxiao Long, Yixun Liang, Weiyu Li, Yuan Liu, Peng Li, Xiaowei Chi, Xingqun Qi, Wei Xue, Wenhan Luo, Qifeng Liu, Yike Guo

    Abstract: Despite recent advancements in the Large Reconstruction Model (LRM) demonstrating impressive results, when extending its input from single image to multiple images, it exhibits inefficiencies, subpar geometric and texture quality, as well as slower convergence speed than expected. It is attributed to that, LRM formulates 3D reconstruction as a naive images-to-3D translation problem, ignoring the… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  39. arXiv:2406.06891  [pdf, other

    cs.LG cs.AI

    Tokenize features, enhancing tables: the FT-TABPFN model for tabular classification

    Authors: Quangao Liu, Wei Yang, Chen Liang, Longlong Pang, Zhuozhang Zou

    Abstract: Traditional methods for tabular classification usually rely on supervised learning from scratch, which requires extensive training data to determine model parameters. However, a novel approach called Prior-Data Fitted Networks (TabPFN) has changed this paradigm. TabPFN uses a 12-layer transformer trained on large synthetic datasets to learn universal tabular representations. This method enables fa… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

  40. arXiv:2406.06517  [pdf, other

    cs.CV

    Genomics-guided Representation Learning for Pathologic Pan-cancer Tumor Microenvironment Subtype Prediction

    Authors: Fangliangzi Meng, Hongrun Zhang, Ruodan Yan, Guohui Chuai, Chao Li, Qi Liu

    Abstract: The characterization of Tumor MicroEnvironment (TME) is challenging due to its complexity and heterogeneity. Relatively consistent TME characteristics embedded within highly specific tissue features, render them difficult to predict. The capability to accurately classify TME subtypes is of critical significance for clinical tumor diagnosis and precision medicine. Based on the observation that tumo… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

  41. arXiv:2406.06357  [pdf, other

    cs.CL cs.AI

    MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows

    Authors: Xingjian Zhang, Yutong Xie, ** Huang, **ge Ma, Zhaoying Pan, Qijia Liu, Ziyang Xiong, Tolga Ergen, Dongsub Shim, Honglak Lee, Qiaozhu Mei

    Abstract: Scientific innovation relies on detailed workflows, which include critical steps such as analyzing literature, generating ideas, validating these ideas, interpreting results, and inspiring follow-up research. However, scientific publications that document these workflows are extensive and unstructured. This makes it difficult for both human researchers and AI systems to effectively navigate and ex… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:1706.03762 by other authors

  42. arXiv:2406.05352  [pdf, other

    cs.CV

    1st Place Winner of the 2024 Pixel-level Video Understanding in the Wild (CVPR'24 PVUW) Challenge in Video Panoptic Segmentation and Best Long Video Consistency of Video Semantic Segmentation

    Authors: Qingfeng Liu, Mostafa El-Khamy, Kee-Bong Song

    Abstract: The third Pixel-level Video Understanding in the Wild (PVUW CVPR 2024) challenge aims to advance the state of art in video understanding through benchmarking Video Panoptic Segmentation (VPS) and Video Semantic Segmentation (VSS) on challenging videos and scenes introduced in the large-scale Video Panoptic Segmentation in the Wild (VIPSeg) test set and the large-scale Video Scene Parsing in the Wi… ▽ More

    Submitted 8 June, 2024; originally announced June 2024.

  43. arXiv:2406.05013  [pdf, other

    cs.IR cs.CL

    CHIQ: Contextual History Enhancement for Improving Query Rewriting in Conversational Search

    Authors: Fengran Mo, Abbas Ghaddar, Kelong Mao, Mehdi Rezagholizadeh, Boxing Chen, Qun Liu, Jian-Yun Nie

    Abstract: In this paper, we study how open-source large language models (LLMs) can be effectively deployed for improving query rewriting in conversational search, especially for ambiguous queries. We introduce CHIQ, a two-step method that leverages the capabilities of LLMs to resolve ambiguities in the conversation history before query rewriting. This approach contrasts with prior studies that predominantly… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  44. arXiv:2406.04828  [pdf, other

    cs.IR

    QAGCF: Graph Collaborative Filtering for Q&A Recommendation

    Authors: Changshuo Zhang, Teng Shi, Xiao Zhang, Yan** Zheng, Ruobing Xie, Qi Liu, Jun Xu, Ji-Rong Wen

    Abstract: Question and answer (Q&A) platforms usually recommend question-answer pairs to meet users' knowledge acquisition needs, unlike traditional recommendations that recommend only one item. This makes user behaviors more complex, and presents two challenges for Q&A recommendation, including: the collaborative information entanglement, which means user feedback is influenced by either the question or th… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  45. Interpretable Multimodal Out-of-context Detection with Soft Logic Regularization

    Authors: Huanhuan Ma, **ghao Zhang, Qiang Liu, Shu Wu, Liang Wang

    Abstract: The rapid spread of information through mobile devices and media has led to the widespread of false or deceptive news, causing significant concerns in society. Among different types of misinformation, image repurposing, also known as out-of-context misinformation, remains highly prevalent and effective. However, current approaches for detecting out-of-context misinformation often lack interpretabi… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: ICASSP 2024 lecture paper

  46. arXiv:2406.04322  [pdf, other

    cs.CV

    DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data

    Authors: Qihao Liu, Yi Zhang, Song Bai, Adam Kortylewski, Alan Yuille

    Abstract: We present DIRECT-3D, a diffusion-based 3D generative model for creating high-quality 3D assets (represented by Neural Radiance Fields) from text prompts. Unlike recent 3D generative models that rely on clean and well-aligned 3D data, limiting them to single or few-class generation, our model is directly trained on extensive noisy and unaligned `in-the-wild' 3D assets, mitigating the key challenge… ▽ More

    Submitted 6 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted to CVPR 2024. Code: https://github.com/qihao067/direct3d Project page: https://direct-3d.github.io/

  47. arXiv:2406.04321  [pdf, other

    cs.CV cs.LG cs.MM cs.SD

    VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling

    Authors: Zeyue Tian, Zhaoyang Liu, Ruibin Yuan, Jiahao Pan, Xiaoqiang Huang, Qifeng Liu, Xu Tan, Qifeng Chen, Wei Xue, Yike Guo

    Abstract: In this work, we systematically study music generation conditioned solely on the video. First, we present a large-scale dataset comprising 190K video-music pairs, including various genres such as movie trailers, advertisements, and documentaries. Furthermore, we propose VidMuse, a simple framework for generating music aligned with video inputs. VidMuse stands out by producing high-fidelity music t… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: The code and datasets will be available at https://github.com/ZeyueT/VidMuse/

  48. arXiv:2406.04089  [pdf, other

    cs.LG cs.AI

    On Limitation of Transformer for Learning HMMs

    Authors: Jiachen Hu, Qinghua Liu, Chi **

    Abstract: Despite the remarkable success of Transformer-based architectures in various sequential modeling tasks, such as natural language processing, computer vision, and robotics, their ability to learn basic sequential models, like Hidden Markov Models (HMMs), is still unclear. This paper investigates the performance of Transformers in learning HMMs and their variants through extensive experimentation an… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

  49. arXiv:2406.02134  [pdf, other

    cs.CL

    The current status of large language models in summarizing radiology report impressions

    Authors: Danqing Hu, Shanyuan Zhang, Qing Liu, Xiaofeng Zhu, Bing Liu

    Abstract: Large language models (LLMs) like ChatGPT show excellent capabilities in various natural language processing tasks, especially for text generation. The effectiveness of LLMs in summarizing radiology report impressions remains unclear. In this study, we explore the capability of eight LLMs on the radiology report impression summarization. Three types of radiology reports, i.e., CT, PET-CT, and Ultr… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  50. arXiv:2406.02110  [pdf, other

    cs.CL cs.AI

    UniOQA: A Unified Framework for Knowledge Graph Question Answering with Large Language Models

    Authors: Zhuoyang Li, Liran Deng, Hui Liu, Qiaoqiao Liu, Junzhao Du

    Abstract: OwnThink stands as the most extensive Chinese open-domain knowledge graph introduced in recent times. Despite prior attempts in question answering over OwnThink (OQA), existing studies have faced limitations in model representation capabilities, posing challenges in further enhancing overall accuracy in question answering. In this paper, we introduce UniOQA, a unified framework that integrates two… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 10 pages, 5 figures