Showing 1–50 of 238 results for author: Ren, S

Searching in archive cs.
  1. arXiv:2406.16116  [pdf, ps, other]

    cs.NE

    A First Running Time Analysis of the Strength Pareto Evolutionary Algorithm 2 (SPEA2)

    Authors: Shengjie Ren, Chao Bian, Miqing Li, Chao Qian

    Abstract: Evolutionary algorithms (EAs) have emerged as a predominant approach for addressing multi-objective optimization problems. However, the theoretical foundation of multi-objective EAs (MOEAs), particularly the fundamental aspects like running time analysis, remains largely underexplored. Existing theoretical studies mainly focus on basic MOEAs, with little attention given to practical MOEAs. In this…

    Submitted 23 June, 2024; originally announced June 2024.

  2. Learning Flexible Time-windowed Granger Causality Integrating Heterogeneous Interventional Time Series Data

    Authors: Ziyi Zhang, Shaogang Ren, Xiaoning Qian, Nick Duffield

    Abstract: Granger causality, commonly used for inferring causal structures from time series data, has been adopted in widespread applications across various fields due to its intuitive explainability and high compatibility with emerging deep neural network prediction models. To alleviate challenges in better deciphering causal structures unambiguously from time series, the use of interventional data has bec…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by ACM SIGKDD 2024

  3. arXiv:2406.08478  [pdf, other]

    cs.CV cs.CL

    What If We Recaption Billions of Web Images with LLaMA-3?

    Authors: Xianhang Li, Haoqin Tu, Mude Hui, Zeyu Wang, Bingchen Zhao, Junfei Xiao, Sucheng Ren, Jieru Mei, Qing Liu, Huangjie Zheng, Yuyin Zhou, Cihang Xie

    Abstract: Web-crawled image-text pairs are inherently noisy. Prior studies demonstrate that semantically aligning and enriching textual descriptions of these pairs can significantly enhance model training across various vision-language tasks, particularly text-to-image generation. However, large-scale investigations in this area remain predominantly closed-source. Our paper aims to bridge this community eff…

    Submitted 18 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: First five authors contributed equally

  4. arXiv:2406.07537  [pdf, other]

    cs.CV

    Autoregressive Pretraining with Mamba in Vision

    Authors: Sucheng Ren, Xianhang Li, Haoqin Tu, Feng Wang, Fangxun Shu, Lei Zhang, Jieru Mei, Linjie Yang, Peng Wang, Heng Wang, Alan Yuille, Cihang Xie

    Abstract: The vision community has started to build with the recently developed state space model, Mamba, as the new backbone for a range of tasks. This paper shows that Mamba's visual capability can be significantly enhanced through autoregressive pretraining, a direction not previously explored. Efficiency-wise, the autoregressive nature can well capitalize on the Mamba's unidirectional recurrent structur…

    Submitted 11 June, 2024; originally announced June 2024.

  5. arXiv:2406.05565  [pdf, other]

    cs.CV

    Medical Vision Generalist: Unifying Medical Imaging Tasks in Context

    Authors: Sucheng Ren, Xiaoke Huang, Xianhang Li, Junfei Xiao, Jieru Mei, Zeyu Wang, Alan Yuille, Yuyin Zhou

    Abstract: This study presents Medical Vision Generalist (MVG), the first foundation model capable of handling various medical imaging tasks -- such as cross-modal synthesis, image segmentation, denoising, and inpainting -- within a unified image-to-image generation framework. Specifically, MVG employs an in-context generation strategy that standardizes the handling of inputs and outputs as images. By treati…

    Submitted 8 June, 2024; originally announced June 2024.

  6. arXiv:2406.02790  [pdf, other]

    cs.LG cs.CY

    Building Socially-Equitable Public Models

    Authors: Yejia Liu, Jianyi Yang, Pengfei Li, Tongxin Li, Shaolei Ren

    Abstract: Public models offer predictions to a variety of downstream tasks and have played a crucial role in various AI applications, showcasing their proficiency in accurate predictions. However, the exclusive emphasis on prediction accuracy may not align with the diverse end objectives of downstream agents. Recognizing the public model's predictions as a service, we advocate for integrating the objectives…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by the ICML 2024

  7. arXiv:2406.02658  [pdf, other]

    cs.NE

    Maintaining Diversity Provably Helps in Evolutionary Multimodal Optimization

    Authors: Shengjie Ren, Zhijia Qiu, Chao Bian, Miqing Li, Chao Qian

    Abstract: In the real world, there exist a class of optimization problems that multiple (local) optimal solutions in the solution space correspond to a single point in the objective space. In this paper, we theoretically show that for such multimodal problems, a simple method that considers the diversity of solutions in the solution space can benefit the search in evolutionary algorithms (EAs). Specifically…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2406.02118

  8. arXiv:2406.02118  [pdf, other]

    cs.NE

    An Archive Can Bring Provable Speed-ups in Multi-Objective Evolutionary Algorithms

    Authors: Chao Bian, Shengjie Ren, Miqing Li, Chao Qian

    Abstract: In the area of multi-objective evolutionary algorithms (MOEAs), there is a trend of using an archive to store non-dominated solutions generated during the search. This is because 1) MOEAs may easily end up with the final population containing inferior solutions that are dominated by other solutions discarded during the search process and 2) the population that has a commensurable size of the probl…

    Submitted 4 June, 2024; originally announced June 2024.

  9. arXiv:2406.01946  [pdf, other]

    cs.CR cs.CL

    Bileve: Securing Text Provenance in Large Language Models Against Spoofing with Bi-level Signature

    Authors: Tong Zhou, Xuandong Zhao, Xiaolin Xu, Shaolei Ren

    Abstract: Text watermarks for large language models (LLMs) have been commonly used to identify the origins of machine-generated content, which is promising for assessing liability when combating deepfake or harmful content. While existing watermarking techniques typically prioritize robustness against removal attacks, unfortunately, they are vulnerable to spoofing attacks: malicious actors can subtly alter…

    Submitted 17 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  10. arXiv:2405.21075  [pdf, other]

    cs.CV cs.CL

    Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

    Authors: Chaoyou Fu, Yuhan Dai, Yongdong Luo, Lei Li, Shuhuai Ren, Renrui Zhang, Zihan Wang, Chenyu Zhou, Yunhang Shen, Mengdan Zhang, Peixian Chen, Yanwei Li, Shaohui Lin, Sirui Zhao, Ke Li, Tong Xu, Xiawu Zheng, Enhong Chen, Rongrong Ji, Xing Sun

    Abstract: In the quest for artificial general intelligence, Multi-modal Large Language Models (MLLMs) have emerged as a focal point in recent advancements. However, the predominant focus remains on developing their capabilities in static image understanding. The potential of MLLMs in processing sequential visual data is still insufficiently explored, highlighting the absence of a comprehensive, high-quality…

    Submitted 16 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

    Comments: Project Page: https://video-mme.github.io

  11. arXiv:2405.20985  [pdf, other]

    cs.CV

    DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models

    Authors: Linli Yao, Lei Li, Shuhuai Ren, Lean Wang, Yuanxin Liu, Xu Sun, Lu Hou

    Abstract: The visual projector, which bridges the vision and language modalities and facilitates cross-modal alignment, serves as a crucial component in MLLMs. However, measuring the effectiveness of projectors in vision-language alignment remains under-explored, which currently can only be inferred from the performance of MLLMs on downstream tasks. Motivated by the problem, this study examines the projecto…

    Submitted 31 May, 2024; originally announced May 2024.

  12. arXiv:2405.17469  [pdf, other]

    cs.LG cs.AI cs.CY cs.PF

    A Dataset for Research on Water Sustainability

    Authors: Pranjol Sen Gupta, Md Rajib Hossen, Pengfei Li, Shaolei Ren, Mohammad A. Islam

    Abstract: Freshwater scarcity is a global problem that requires collective efforts across all industry sectors. Nevertheless, a lack of access to operational water footprint data bars many applications from exploring optimization opportunities hidden within the temporal and spatial variations. To break this barrier into research in water sustainability, we build a dataset for operation direct water usage in…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Accepted by ACM e-Energy 2024

  13. arXiv:2405.15160  [pdf, other]

    cs.CV

    ARVideo: Autoregressive Pretraining for Self-Supervised Video Representation Learning

    Authors: Sucheng Ren, Hongru Zhu, Chen Wei, Yijiang Li, Alan Yuille, Cihang Xie

    Abstract: This paper presents a new self-supervised video representation learning framework, ARVideo, which autoregressively predicts the next video token in a tailored sequence order. Two key designs are included. First, we organize autoregressive video tokens into clusters that span both spatially and temporally, thereby enabling a richer aggregation of contextual information compared to the standard spat…

    Submitted 23 May, 2024; originally announced May 2024.

  14. arXiv:2405.15034  [pdf, other]

    cs.CG

    NeCGS: Neural Compression for 3D Geometry Sets

    Authors: Siyu Ren, Junhui Hou, Wenping Wang

    Abstract: This paper explores the problem of effectively compressing 3D geometry sets containing diverse categories. We make the first attempt to tackle this fundamental and challenging problem and propose NeCGS, a neural compression paradigm, which can compress hundreds of detailed and diverse 3D mesh models (~684 MB) by about 900 times (0.76 MB) with high accuracy and preservation of detailed geo…

    Submitted 23 May, 2024; originally announced May 2024.

  15. arXiv:2405.14858  [pdf, other]

    cs.CV

    Mamba-R: Vision Mamba ALSO Needs Registers

    Authors: Feng Wang, Jiahao Wang, Sucheng Ren, Guoyizhe Wei, Jieru Mei, Wei Shao, Yuyin Zhou, Alan Yuille, Cihang Xie

    Abstract: Similar to Vision Transformers, this paper identifies artifacts also present within the feature maps of Vision Mamba. These artifacts, corresponding to high-norm tokens emerging in low-information background areas of images, appear much more severe in Vision Mamba -- they exist prevalently even with the tiny-sized model and activate extensively across background regions. To mitigate this issue, we…

    Submitted 23 May, 2024; originally announced May 2024.

  16. arXiv:2405.14854  [pdf, other]

    cs.CV cs.LG

    TerDiT: Ternary Diffusion Models with Transformers

    Authors: Xudong Lu, Aojun Zhou, Ziyi Lin, Qi Liu, Yuhui Xu, Renrui Zhang, Yafei Wen, Shuai Ren, Peng Gao, Junchi Yan, Hongsheng Li

    Abstract: Recent developments in large-scale pre-trained text-to-image diffusion models have significantly improved the generation of high-fidelity images, particularly with the emergence of diffusion models based on transformer architecture (DiTs). Among these diffusion models, diffusion transformers have demonstrated superior image generation capabilities, boosting lower FID scores and higher scalability.…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 18 pages, 13 figures

  17. arXiv:2405.07293  [pdf, other]

    cs.CV cs.AI

    Sparse Sampling is All You Need for Fast Wrong-way Cycling Detection in CCTV Videos

    Authors: Jing Xu, Wentao Shi, Sheng Ren, Pan Gao, Peng Zhou, Jie Qin

    Abstract: In the field of transportation, it is of paramount importance to address and mitigate illegal actions committed by both motor and non-motor vehicles. Among those actions, wrong-way cycling (i.e., riding a bicycle or e-bike in the opposite direction of the designated traffic flow) poses significant risks to both cyclists and other road users. To this end, this paper formulates a problem of detectin…

    Submitted 12 May, 2024; originally announced May 2024.

  18. arXiv:2405.05430  [pdf, other]

    cs.LG

    Towards Invariant Time Series Forecasting in Smart Cities

    Authors: Ziyi Zhang, Shaogang Ren, Xiaoning Qian, Nick Duffield

    Abstract: In the transformative landscape of smart cities, the integration of the cutting-edge web technologies into time series forecasting presents a pivotal opportunity to enhance urban planning, sustainability, and economic growth. The advancement of deep neural networks has significantly improved forecasting performance. However, a notable challenge lies in the ability of these models to generalize wel…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Accepted by ACM WWW Companion 2024

  19. arXiv:2405.04867  [pdf, other]

    eess.IV cs.CV

    MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results

    Authors: Yaqi Wu, Zhihao Fan, Xiaofeng Chu, Jimmy S. Ren, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangcheng Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Senyan Xu, Zhijing Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha, Jun Cao, Cheng Li, Shu Chen, Liang Ma, Shiyang Zhou, Haijin Zeng, Kai Feng, et al. (24 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: MIPI@CVPR2024. Website: https://mipi-challenge.org/MIPI2024/

  20. arXiv:2404.18231  [pdf, other]

    cs.CL cs.AI

    From Persona to Personalization: A Survey on Role-Playing Language Agents

    Authors: Jiangjie Chen, Xintao Wang, Rui Xu, Siyu Yuan, Yikai Zhang, Wei Shi, Jian Xie, Shuang Li, Ruihan Yang, Tinghui Zhu, Aili Chen, Nianqi Li, Lida Chen, Caiyu Hu, Siye Wu, Scott Ren, Ziquan Fu, Yanghua Xiao

    Abstract: Recent advancements in large language models (LLMs) have significantly boosted the rise of Role-Playing Language Agents (RPLAs), i.e., specialized AI systems designed to simulate assigned personas. By harnessing multiple advanced abilities of LLMs, including in-context learning, instruction following, and social intelligence, RPLAs achieve a remarkable sense of human likeness and vivid role-playin…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: Preprint

  21. arXiv:2404.13300  [pdf, other]

    cs.LG

    Capturing Momentum: Tennis Match Analysis Using Machine Learning and Time Series Theory

    Authors: Jingdi Lei, Tianqi Kang, Yuluan Cao, Shiwei Ren

    Abstract: This paper presents an analysis of momentum in tennis matches. Owing to its generalization performance, it can help in constructing a system to predict the results of sports games and to analyze player performance based on technical statistics. We first use hidden Markov models to predict the momentum, which is defined as the performance of players. Then we use XGBoost to prove…

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: 16 pages, 18 figures

  22. arXiv:2404.10763  [pdf, other]

    cs.AI cs.CL cs.CV

    LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?

    Authors: Yuchi Wang, Shuhuai Ren, Rundong Gao, Linli Yao, Qingyan Guo, Kaikai An, Jianhong Bai, Xu Sun

    Abstract: Diffusion models have exhibited remarkable capabilities in text-to-image generation. However, their performance in image-to-text generation, specifically image captioning, has lagged behind Auto-Regressive (AR) models, casting doubt on their applicability for such tasks. In this work, we revisit diffusion models, highlighting their capacity for holistic context modeling and parallel decoding. With…

    Submitted 16 April, 2024; originally announced April 2024.

  23. arXiv:2403.19221  [pdf, other]

    cs.CV cs.AI

    Towards Multimodal Video Paragraph Captioning Models Robust to Missing Modality

    Authors: Sishuo Chen, Lei Li, Shuhuai Ren, Rundong Gao, Yuanxin Liu, Xiaohan Bi, Xu Sun, Lu Hou

    Abstract: Video paragraph captioning (VPC) involves generating detailed narratives for long videos, utilizing supportive modalities such as speech and event boundaries. However, the existing models are constrained by the assumption of constant availability of a single auxiliary modality, which is impractical given the diversity and unpredictable nature of real-world scenarios. To this end, we propose a Miss…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Code available at https://github.com/lancopku/MR-VPC

  24. arXiv:2403.17610  [pdf, other]

    cs.CV

    MMVP: A Multimodal MoCap Dataset with Vision and Pressure Sensors

    Authors: He Zhang, Shenghao Ren, Haolei Yuan, Jianhui Zhao, Fan Li, Shuangpeng Sun, Zhenghao Liang, Tao Yu, Qiu Shen, Xun Cao

    Abstract: Foot contact is an important cue for human motion capture, understanding, and generation. Existing datasets tend to annotate dense foot contact using visual matching with thresholding or incorporating pressure signals. However, these approaches either suffer from low accuracy or are only designed for small-range and slow motion. There is still a lack of a vision-pressure multimodal dataset with la…

    Submitted 29 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: CVPR2024

  25. Masked Multi-Domain Network: Multi-Type and Multi-Scenario Conversion Rate Prediction with a Single Model

    Authors: Wentao Ouyang, Xiuwu Zhang, Chaofeng Guo, Shukui Ren, Yupei Sui, Kun Zhang, Jinmei Luo, Yunfeng Chen, Dongbo Xu, Xiangzheng Liu, Yanlong Du

    Abstract: In real-world advertising systems, conversions have different types in nature and ads can be shown in different display scenarios, both of which highly impact the actual conversion rate (CVR). This results in the multi-type and multi-scenario CVR prediction problem. A desired model for this problem should satisfy the following requirements: 1) Accuracy: the model should achieve fine-grained accura…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: CIKM 2023 (larger figures)

  26. arXiv:2403.13844  [pdf, other]

    cs.LG cs.AI

    Scheduled Knowledge Acquisition on Lightweight Vector Symbolic Architectures for Brain-Computer Interfaces

    Authors: Yejia Liu, Shijin Duan, Xiaolin Xu, Shaolei Ren

    Abstract: Brain-Computer interfaces (BCIs) are typically designed to be lightweight and responsive in real-time to provide users timely feedback. Classical feature engineering is computationally efficient but has low accuracy, whereas the recent neural networks (DNNs) improve accuracy but are computationally expensive and incur high latency. As a promising alternative, the low-dimensional computing (LDC) cl…

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: Accepted as a full paper by the tinyML Research Symposium 2024

  27. arXiv:2403.11586  [pdf, other]

    cs.CV

    DynoSurf: Neural Deformation-based Temporally Consistent Dynamic Surface Reconstruction

    Authors: Yuxin Yao, Siyu Ren, Junhui Hou, Zhi Deng, Juyong Zhang, Wenping Wang

    Abstract: This paper explores the problem of reconstructing temporally consistent surfaces from a 3D point cloud sequence without correspondence. To address this challenging task, we propose DynoSurf, an unsupervised learning framework integrating a template surface representation with a learnable deformation field. Specifically, we design a coarse-to-fine strategy for learning the template surface based on…

    Submitted 18 March, 2024; originally announced March 2024.

  28. arXiv:2403.08251  [pdf, other]

    cs.MA cs.AI cs.CY

    Emergence of Social Norms in Generative Agent Societies: Principles and Architecture

    Authors: Siyue Ren, Zhiyao Cui, Ruiqi Song, Zhen Wang, Shuyue Hu

    Abstract: Social norms play a crucial role in guiding agents towards understanding and adhering to standards of behavior, thus reducing social conflicts within multi-agent systems (MASs). However, current LLM-based (or generative) MASs lack the capability to be normative. In this paper, we propose a novel architecture, named CRSEC, to empower the emergence of social norms within generative MASs. Our archite…

    Submitted 20 May, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: Published as a conference paper at IJCAI 2024

  29. arXiv:2403.05523  [pdf, other]

    cs.CV

    Beyond Finite Data: Towards Data-free Out-of-distribution Generalization via Extrapolation

    Authors: Yijiang Li, Sucheng Ren, Weipeng Deng, Yuzhi Xu, Ying Gao, Edith Ngai, Haohan Wang

    Abstract: Out-of-distribution (OOD) generalization is a favorable yet challenging property for deep neural networks. The core challenges lie in the limited availability of source domains that help models learn an invariant representation from the spurious features. Various domain augmentation have been proposed but largely rely on interpolating existing domains and frequently face difficulties in creating t…

    Submitted 11 March, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

    Comments: Preprint. Paper under review

  30. arXiv:2403.01129  [pdf, other]

    cs.CV

    Dynamic 3D Point Cloud Sequences as 2D Videos

    Authors: Yiming Zeng, Junhui Hou, Qijian Zhang, Siyu Ren, Wenping Wang

    Abstract: Dynamic 3D point cloud sequences serve as one of the most common and practical representation modalities of dynamic real-world environments. However, their unstructured nature in both spatial and temporal domains poses significant challenges to effective and efficient processing. Existing deep point cloud sequence modeling approaches imitate the mature 2D video learning mechanisms by developing co…

    Submitted 21 June, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

    Comments: The manuscript has been accepted by IEEE TPAMI in 2024

  31. arXiv:2403.00476  [pdf, other]

    cs.CV

    TempCompass: Do Video LLMs Really Understand Videos?

    Authors: Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, Lei Li, Sishuo Chen, Xu Sun, Lu Hou

    Abstract: Recently, there is a surge in interest surrounding video large language models (Video LLMs). However, existing benchmarks fail to provide a comprehensive feedback on the temporal perception ability of Video LLMs. On the one hand, most of them are unable to distinguish between different temporal aspects (e.g., speed, direction) and thus cannot reflect the nuanced performance on these specific aspec…

    Submitted 3 June, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

  32. arXiv:2402.15527  [pdf, other]

    cs.CL cs.AI cs.CV

    PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain

    Authors: Liang Chen, Yichi Zhang, Shuhuai Ren, Haozhe Zhao, Zefan Cai, Yuchi Wang, Peiyi Wang, Xiangdi Meng, Tianyu Liu, Baobao Chang

    Abstract: We present PCA-Bench, a multimodal decision-making benchmark for evaluating the integrated capabilities of Multimodal Large Language Models (MLLMs). Departing from previous benchmarks focusing on simplistic tasks and individual model capability, PCA-Bench introduces three complex scenarios: autonomous driving, domestic robotics, and open-world games. Given task instructions and diverse contexts, t…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: Code and Data released at https://github.com/pkunlp-icler/PCA-EVAL. Leaderboard at: https://docs.qq.com/sheet/DVUd4WUpGRHRqUnNV. This article supersedes its workshop version arxiv: 2310.02071. arXiv admin note: text overlap with arXiv:2310.02071

  33. arXiv:2402.09747  [pdf, other]

    eess.IV cs.CV cs.LG

    Less is more: Ensemble Learning for Retinal Disease Recognition Under Limited Resources

    Authors: Jiahao Wang, Hong Peng, Shengchao Chen, Sufen Ren

    Abstract: Retinal optical coherence tomography (OCT) images provide crucial insights into the health of the posterior ocular segment. Therefore, the advancement of automated image analysis methods is imperative to equip clinicians and researchers with quantitative data, thereby facilitating informed decision-making. The application of deep learning (DL)-based approaches has gained extensive traction for exe…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: Ongoing work

  34. arXiv:2402.06262  [pdf, other]

    cs.CL cs.AI

    On the Efficacy of Eviction Policy for Key-Value Constrained Generative Language Model Inference

    Authors: Siyu Ren, Kenny Q. Zhu

    Abstract: Despite the recent success associated with Large Language Models (LLMs), they are notably cost-prohibitive to deploy in resource-constrained environments due to their excessive memory and computational demands. In addition to model parameters, the key-value cache is also stored in GPU memory, growing linearly with batch size and sequence length. As a remedy, recent works have proposed various evic…

    Submitted 17 February, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

  35. arXiv:2402.05940  [pdf]

    cs.LG cs.AI stat.ME

    Causal Relationship Network of Risk Factors Impacting Workday Loss in Underground Coal Mines

    Authors: Shangsi Ren, Cameron A. Beeche, Zhiyi Shi, Maria Acevedo Garcia, Katherine Zychowski, Shuguang Leng, Pedram Roghanchi, Jiantao Pu

    Abstract: This study aims to establish the causal relationship network between various factors leading to workday loss in underground coal mines using a novel causal artificial intelligence (AI) method. The analysis utilizes data obtained from the National Institute for Occupational Safety and Health (NIOSH). A total of 101,010 injury records from 3,982 unique underground coal mines spanning the years from…

    Submitted 24 January, 2024; originally announced February 2024.

    Comments: 5 figures 5 tables

  36. arXiv:2402.02322  [pdf, other]

    cs.LG stat.ML

    Dynamic Incremental Optimization for Best Subset Selection

    Authors: Shaogang Ren, Xiaoning Qian

    Abstract: Best subset selection is considered the 'gold standard' for many sparse learning problems. A variety of optimization techniques have been proposed to attack this non-smooth non-convex problem. In this paper, we investigate the dual forms of a family of $\ell_0$-regularized problems. An efficient primal-dual algorithm is developed based on the primal and dual problem structures. By leveraging the d…

    Submitted 4 June, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2207.02058

  37. arXiv:2402.02277  [pdf, other]

    cs.LG stat.ML

    Causal Bayesian Optimization via Exogenous Distribution Learning

    Authors: Shaogang Ren, Xiaoning Qian

    Abstract: Maximizing a target variable as an operational objective in a structural causal model is an important problem. Existing Causal Bayesian Optimization (CBO) methods either rely on hard interventions that alter the causal structure to maximize the reward; or introduce action nodes to endogenous variables so that the data generation mechanisms are adjusted to achieve the objective. In this paper, a no…

    Submitted 26 May, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

  38. arXiv:2401.12452  [pdf, other]

    cs.CV

    Self-supervised Learning of LiDAR 3D Point Clouds via 2D-3D Neural Calibration

    Authors: Yifan Zhang, Siyu Ren, Junhui Hou, Jinjian Wu, Guangming Shi

    Abstract: This paper introduces a novel self-supervised learning framework for enhancing 3D perception in autonomous driving scenes. Specifically, our approach, named NCLR, focuses on 2D-3D neural calibration, a novel pretext task that estimates the rigid transformation aligning camera and LiDAR coordinate systems. First, we propose the learnable transformation alignment to bridge the domain gap between ima…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: Under review

  39. arXiv:2401.09736  [pdf, other]

    cs.CV

    Measuring the Discrepancy between 3D Geometric Models using Directional Distance Fields

    Authors: Siyu Ren, Junhui Hou, Xiaodong Chen, Hongkai Xiong, Wenping Wang

    Abstract: Qualifying the discrepancy between 3D geometric models, which could be represented with either point clouds or triangle meshes, is a pivotal issue with broad applications. Existing methods mainly focus on directly establishing the correspondence between two models and then aggregating point-wise distance between corresponding points, resulting in them being either inefficient or ineffective. In th…

    Submitted 18 January, 2024; originally announced January 2024.

  40. arXiv:2401.07261  [pdf, other]

    cs.CR

    LookAhead: Preventing DeFi Attacks via Unveiling Adversarial Contracts

    Authors: Shoupeng Ren, Tianyu Tu, Jian Liu, Di Wu, Kui Ren

    Abstract: DeFi incidents stemming from various smart contract vulnerabilities have culminated in financial damages exceeding 3 billion USD. The attacks causing such incidents commonly commence with the deployment of adversarial contracts, subsequently leveraging these contracts to execute adversarial transactions that exploit vulnerabilities in victim contracts. Existing defense mechanisms leverage heuristi…

    Submitted 2 February, 2024; v1 submitted 14 January, 2024; originally announced January 2024.

    Comments: 14 pages, 11 figures

  41. arXiv:2401.04340  [pdf, other]

    cs.GT cs.PF

    Online Allocation with Replenishable Budgets: Worst Case and Beyond

    Authors: Jianyi Yang, Pengfei Li, Mohammad Jaminur Islam, Shaolei Ren

    Abstract: This paper studies online resource allocation with replenishable budgets, where budgets can be replenished on top of the initial budget and an agent sequentially chooses online allocation decisions without violating the available budget constraint at each round. We propose a novel online algorithm, called OACP (Opportunistic Allocation with Conservative Pricing), that conservatively adjusts dual v…

    Submitted 8 January, 2024; originally announced January 2024.

    Comments: Accepted by ACM SIGMETRICS 2024

  42. arXiv:2401.01493  [pdf, other]

    cs.LG cs.AI cs.CR

    Free Lunch for Federated Remote Sensing Target Fine-Grained Classification: A Parameter-Efficient Framework

    Authors: Shengchao Chen, Ting Shu, Huan Zhao, Jiahao Wang, Sufen Ren, Lina Yang

    Abstract: Remote Sensing Target Fine-grained Classification (TFGC) is of great significance in both military and civilian fields. Due to location differences, growth in data size, and centralized server storage constraints, these data are usually stored under different databases across regions/countries. However, privacy laws and national security concerns constrain researchers from accessing these sensitiv…

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: Under Review, 23 pages, 3 figures, 12 tables

  43. arXiv:2312.08985  [pdf, other]

    cs.CV

    OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers

    Authors: Han Liang, Jiacheng Bao, Ruichi Zhang, Sihan Ren, Yuecheng Xu, Sibei Yang, Xin Chen, Jingyi Yu, Lan Xu

    Abstract: We have recently seen tremendous progress in realistic text-to-motion generation. Yet, the existing methods often fail or produce implausible motions with unseen text inputs, which limits the applications. In this paper, we present OMG, a novel framework, which enables compelling motion generation from zero-shot open-vocabulary text prompts. Our key idea is to carefully tailor the pretrain-then-fi…

    Submitted 19 March, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: accepted by CVPR 2024

  44. arXiv:2312.06726  [pdf, other]

    cs.CV

    Compress & Align: Curating Image-Text Data with Human Knowledge

    Authors: Lei Zhang, Fangxun Shu, Sucheng Ren, Bingchen Zhao, Hao Jiang, Cihang Xie

    Abstract: The massive growth of image-text data through web crawling inherently presents the challenge of variability in data quality. This paper introduces a novel algorithm, rooted in human knowledge, to compress this vast corpus of web-crawled image-text datasets to a compact and high-quality form. Our method unfolds in three major steps. First, we collect an image-text dataset, wherein each image is ass…

    Submitted 12 December, 2023; v1 submitted 11 December, 2023; originally announced December 2023.

  45. arXiv:2312.02147  [pdf, other]

    cs.CV

    Rejuvenating image-GPT as Strong Visual Representation Learners

    Authors: Sucheng Ren, Zeyu Wang, Hongru Zhu, Junfei Xiao, Alan Yuille, Cihang Xie

    Abstract: This paper enhances image-GPT (iGPT), one of the pioneering works that introduce autoregressive pretraining to predict next pixels for visual representation learning. Two simple yet essential changes are made. First, we shift the prediction target from raw pixels to semantic tokens, enabling a higher-level understanding of visual content. Second, we supplement the autoregressive modeling by instru…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: Larger models are coming

  46. arXiv:2312.02051  [pdf, other]

    cs.CV cs.AI cs.CL

    TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

    Authors: Shuhuai Ren, Linli Yao, Shicheng Li, Xu Sun, Lu Hou

    Abstract: This work proposes TimeChat, a time-sensitive multimodal large language model specifically designed for long video understanding. Our model incorporates two key architectural contributions: (1) a timestamp-aware frame encoder that binds visual content with the timestamp of each frame, and (2) a sliding video Q-Former that produces a video token sequence of varying lengths to accommodate videos of…

    Submitted 28 March, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: CVPR 2024 camera-ready version, code is available at https://github.com/RenShuhuai-Andy/TimeChat

  47. arXiv:2311.17404  [pdf, other]

    cs.CV cs.AI cs.CL

    VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models

    Authors: Shicheng Li, Lei Li, Shuhuai Ren, Yuanxin Liu, Yi Liu, Rundong Gao, Xu Sun, Lu Hou

    Abstract: The ability to perceive how objects change over time is a crucial ingredient in human intelligence. However, current benchmarks cannot faithfully reflect the temporal understanding abilities of video-language models (VidLMs) due to the existence of static visual shortcuts. To remedy this issue, we present VITATECS, a diagnostic VIdeo-Text dAtaset for the evaluation of TEmporal Concept underStandin…

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: 23 pages, 6 figures, 18 tables, data is available at https://github.com/lscpku/VITATECS

  48. arXiv:2311.10529  [pdf, other]

    cs.CV

    Enhancing the Reliability of Segment Anything Model for Auto-Prompting Medical Image Segmentation with Uncertainty Rectification

    Authors: Yichi Zhang, Shiyao Hu, Sijie Ren, Chen Jiang, Yuan Cheng, Yuan Qi

    Abstract: The Segment Anything Model (SAM) has recently emerged as a groundbreaking foundation model for prompt-driven image segmentation tasks. However, both the original SAM and its medical variants require slice-by-slice manual prompting of target structures, which directly increases the burden for applications. Despite attempts of auto-prompting to turn SAM into a fully automatic manner, it still exhibit…

    Submitted 18 March, 2024; v1 submitted 17 November, 2023; originally announced November 2023.

  49. arXiv:2311.09278  [pdf, other]

    cs.CL cs.AI

    Symbol-LLM: Towards Foundational Symbol-centric Interface For Large Language Models

    Authors: Fangzhi Xu, Zhiyong Wu, Qiushi Sun, Siyu Ren, Fei Yuan, Shuai Yuan, Qika Lin, Yu Qiao, Jun Liu

    Abstract: Although Large Language Models (LLMs) demonstrate remarkable ability in processing and generating human-like text, they do have limitations when it comes to comprehending and expressing world knowledge that extends beyond the boundaries of natural language (e.g., chemical molecular formula). Injecting a collection of symbolic data directly into the training of LLMs can be problematic, as it disrega…

    Submitted 18 February, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

    Comments: 23 pages, 13 figures

  50. arXiv:2311.03615  [pdf, other]

    cs.LG cs.DC

    CAFE: Carbon-Aware Federated Learning in Geographically Distributed Data Centers

    Authors: Jieming Bian, Lei Wang, Shaolei Ren, Jie Xu

    Abstract: Training large-scale artificial intelligence (AI) models demands significant computational power and energy, leading to increased carbon footprint with potential environmental repercussions. This paper delves into the challenges of training AI models across geographically distributed (geo-distributed) data centers, emphasizing the balance between learning performance and carbon footprint. We consi…

    Submitted 5 February, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

    Comments: Preprint, Experiments Updated