
Showing 1–50 of 126 results for author: Ge, R

Searching in archive cs.
  1. arXiv:2406.15968

    cs.CL cs.LG

    ReCaLL: Membership Inference via Relative Conditional Log-Likelihoods

    Authors: Roy Xie, Junlin Wang, Ruomin Huang, Minxing Zhang, Rong Ge, Jian Pei, Neil Zhenqiang Gong, Bhuwan Dhingra

    Abstract: The rapid scaling of large language models (LLMs) has raised concerns about the transparency and fair use of the pretraining data used for training them. Detecting such content is challenging due to the scale of the data and limited exposure of each instance during training. We propose ReCaLL (Relative Conditional Log-Likelihood), a novel membership inference attack (MIA) to detect LLMs' pretraini…

    Submitted 22 June, 2024; originally announced June 2024.

  2. arXiv:2406.04845

    cs.CL cs.AI cs.DC cs.LG cs.MA

    FedLLM-Bench: Realistic Benchmarks for Federated Learning of Large Language Models

    Authors: Rui Ye, Rui Ge, Xinyu Zhu, Jingyi Chai, Yaxin Du, Yang Liu, Yanfeng Wang, Siheng Chen

    Abstract: Federated learning has enabled multiple parties to collaboratively train large language models without directly sharing their data (FedLLM). Following this training paradigm, the community has put massive efforts from diverse aspects including framework, performance, and privacy. However, an unpleasant fact is that there are currently no realistic datasets and benchmarks for FedLLM and previous wo…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 22 pages

  3. arXiv:2406.04068

    cs.LG math.ST stat.ML

    Reassessing How to Compare and Improve the Calibration of Machine Learning Models

    Authors: Muthu Chidambaram, Rong Ge

    Abstract: A machine learning model is calibrated if its predicted probability for an outcome matches the observed frequency for that outcome conditional on the model prediction. This property has become increasingly important as the impact of machine learning models has continued to spread to various domains. As a result, there are now a dizzying number of recent papers on measuring and improving the calibr…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 20 pages, 7 figures

  4. arXiv:2406.01766

    cs.LG stat.ML

    How Does Gradient Descent Learn Features -- A Local Analysis for Regularized Two-Layer Neural Networks

    Authors: Mo Zhou, Rong Ge

    Abstract: The ability of learning useful features is one of the major advantages of neural networks. Although recent works show that neural network can operate in a neural tangent kernel (NTK) regime that does not allow feature learning, many works also demonstrate the potential for neural networks to go beyond NTK regime and perform feature learning. Recently, a line of work highlighted the feature learnin…

    Submitted 3 June, 2024; originally announced June 2024.

  5. arXiv:2405.10561

    eess.IV cs.CV

    Infrared Image Super-Resolution via Lightweight Information Split Network

    Authors: Shijie Liu, Kang Yan, Feiwei Qin, Changmiao Wang, Ruiquan Ge, Kai Zhang, Jie Huang, Yong Peng, ** Cao

    Abstract: Single image super-resolution (SR) is an established pixel-level vision task aimed at reconstructing a high-resolution image from its degraded low-resolution counterpart. Despite the notable advancements achieved by leveraging deep neural networks for SR, most existing deep learning architectures feature an extensive number of layers, leading to high computational complexity and substantial memory…

    Submitted 27 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

  6. Shared Virtual Memory: Its Design and Performance Implications for Diverse Applications

    Authors: Bennett Cooper, Thomas R. W. Scogland, Rong Ge

    Abstract: Discrete GPU accelerators, while providing massive computing power for supercomputers and data centers, have their separate memory domain. Explicit memory management across device and host domains in programming is tedious and error-prone. To improve programming portability and productivity, Unified Memory (UM) integrates GPU memory into the host virtual memory systems, and provides transparent da…

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: To be published in ICS '24

  7. arXiv:2405.04434

    cs.CL cs.AI

    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding, et al. (132 additional authors not shown)

    Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference…

    Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  8. arXiv:2405.00542

    eess.IV cs.CV

    UWAFA-GAN: Ultra-Wide-Angle Fluorescein Angiography Transformation via Multi-scale Generation and Registration Enhancement

    Authors: Ruiquan Ge, Zhaojie Fang, Pengxue Wei, Zhanghao Chen, Hongyang Jiang, Ahmed Elazab, Wangting Li, Xiang Wan, Shaochong Zhang, Changmiao Wang

    Abstract: Fundus photography, in combination with the ultra-wide-angle fundus (UWF) techniques, becomes an indispensable diagnostic tool in clinical settings by offering a more comprehensive view of the retina. Nonetheless, UWF fluorescein angiography (UWF-FA) necessitates the administration of a fluorescent dye via injection into the patient's hand or elbow unlike UWF scanning laser ophthalmoscopy (UWF-SLO…

    Submitted 1 May, 2024; originally announced May 2024.

  9. arXiv:2404.19531

    cs.CV

    MoST: Multi-modality Scene Tokenization for Motion Prediction

    Authors: Norman Mu, Jingwei Ji, Zhenpei Yang, Nate Harada, Haotian Tang, Kan Chen, Charles R. Qi, Runzhou Ge, Kratarth Goel, Zoey Yang, Scott Ettinger, Rami Al-Rfou, Dragomir Anguelov, Yin Zhou

    Abstract: Many existing motion prediction approaches rely on symbolic perception outputs to generate agent trajectories, such as bounding boxes, road graph information and traffic lights. This symbolic representation is a high-level abstraction of the real world, which may render the motion prediction model vulnerable to perception errors (e.g., failures in detecting open-vocabulary obstacles) while missing…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  10. arXiv:2404.18007

    cs.LO

    A Formal Model to Prove Instantiation Termination for E-matching-Based Axiomatisations (Extended Version)

    Authors: Rui Ge, Ronald Garcia, Alexander J. Summers

    Abstract: SMT-based program analysis and verification often involve reasoning about program features that have been specified using quantifiers; incorporating quantifiers into SMT-based reasoning is, however, known to be challenging. If quantifier instantiation is not carefully controlled, then runtime and outcomes can be brittle and hard to predict. In particular, uncontrolled quantifier instantiation can…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: extended version of IJCAR 2024 publication

  11. arXiv:2403.12401

    cs.CV

    VQ-NeRV: A Vector Quantized Neural Representation for Videos

    Authors: Yunjie Xu, Xiang Feng, Feiwei Qin, Ruiquan Ge, Yong Peng, Changmiao Wang

    Abstract: Implicit neural representations (INR) excel in encoding videos within neural networks, showcasing promise in computer vision tasks like video compression and denoising. INR-based approaches reconstruct video frames from content-agnostic embeddings, which hampers their efficacy in video frame regression and restricts their generalization ability for video interpolation. To address these deficiencie…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Under Review

  12. arXiv:2403.10547

    math.OC cs.AI cs.DS cs.LG

    Robust Second-Order Nonconvex Optimization and Its Application to Low Rank Matrix Sensing

    Authors: Shuyao Li, Yu Cheng, Ilias Diakonikolas, Jelena Diakonikolas, Rong Ge, Stephen J. Wright

    Abstract: Finding an approximate second-order stationary point (SOSP) is a well-studied and fundamental problem in stochastic nonconvex optimization with many applications in machine learning. However, this problem is poorly understood in the presence of outliers, limiting the use of existing nonconvex algorithms in adversarial settings. In this paper, we study the problem of finding SOSPs in the strong c…

    Submitted 11 March, 2024; originally announced March 2024.

  13. arXiv:2402.17187

    eess.IV cs.CV

    PE-MVCNet: Multi-view and Cross-modal Fusion Network for Pulmonary Embolism Prediction

    Authors: Zhaoxin Guo, Zhipeng Wang, Ruiquan Ge, Jianxun Yu, Feiwei Qin, Yuan Tian, Yuqing Peng, Yonghong Li, Changmiao Wang

    Abstract: The early detection of a pulmonary embolism (PE) is critical for enhancing patient survival rates. Both image-based and non-image-based features are of utmost importance in medical classification tasks. In a clinical setting, physicians tend to rely on the contextual information provided by Electronic Medical Records (EMR) to interpret medical imaging. However, very few models effectively integrat…

    Submitted 17 April, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

  14. arXiv:2402.14180

    cs.LG

    Linear Transformers are Versatile In-Context Learners

    Authors: Max Vladymyrov, Johannes von Oswald, Mark Sandler, Rong Ge

    Abstract: Recent research has demonstrated that transformers, particularly linear attention models, implicitly execute gradient-descent-like algorithms on data provided in-context during their forward inference step. However, their capability in handling more complex problems remains unexplored. In this paper, we prove that any linear transformer maintains an implicit linear model and can be interpreted as…

    Submitted 21 February, 2024; originally announced February 2024.

  15. arXiv:2402.11307

    cs.CV

    ICHPro: Intracerebral Hemorrhage Prognosis Classification Via Joint-attention Fusion-based 3d Cross-modal Network

    Authors: Xinlei Yu, Xinyang Li, Ruiquan Ge, Shibin Wu, Ahmed Elazab, Jichao Zhu, Lingyan Zhang, Gangyong Jia, Taosheng Xu, Xiang Wan, Changmiao Wang

    Abstract: Intracerebral Hemorrhage (ICH) is the deadliest subtype of stroke, necessitating timely and accurate prognostic evaluation to reduce mortality and disability. However, the multi-factorial nature and complexity of ICH make methods based solely on computed tomography (CT) image features inadequate. Despite the capacity of cross-modal networks to fuse additional information, the effective combination…

    Submitted 17 February, 2024; originally announced February 2024.

    Comments: 6 pages, 4 figures, 4 tables, accepted by ISBI

  16. arXiv:2402.11274

    eess.IV cs.CV cs.LG

    TC-DiffRecon: Texture coordination MRI reconstruction method based on diffusion model and modified MF-UNet method

    Authors: Chenyan Zhang, Yifei Chen, Zhenxiong Fan, Yiyu Huang, Wenchao Weng, Ruiquan Ge, Dong Zeng, Changmiao Wang

    Abstract: Recently, diffusion models have gained significant attention as a novel set of deep learning-based generative methods. These models attempt to sample data from a Gaussian distribution that adheres to a target distribution, and have been successfully adapted to the reconstruction of MRI data. However, as an unconditional generative model, the diffusion model typically disrupts image coordination be…

    Submitted 17 February, 2024; originally announced February 2024.

    Comments: 5 pages, 2 figures, accepted by ISBI 2024

    Journal ref: ISBI 2024

  17. arXiv:2402.08948

    cs.LG math.AP

    Mean-Field Analysis for Learning Subspace-Sparse Polynomials with Gaussian Input

    Authors: Ziang Chen, Rong Ge

    Abstract: In this work, we study the mean-field flow for learning subspace-sparse polynomials using stochastic gradient descent and two-layer neural networks, where the input distribution is standard Gaussian and the output only depends on the projection of the input onto a low-dimensional subspace. We propose a basis-free generalization of the merged-staircase property in Abbe et al. (2022) and establish a…

    Submitted 8 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

  18. arXiv:2402.06855

    cs.LG cs.CV

    For Better or For Worse? Learning Minimum Variance Features With Label Augmentation

    Authors: Muthu Chidambaram, Rong Ge

    Abstract: Data augmentation has been pivotal in successfully training deep learning models on classification tasks over the past decade. An important subclass of data augmentation techniques - which includes both label smoothing and Mixup - involves modifying not only the input data but also the input label during model training. In this work, we analyze the role played by the label augmentation aspect of s…

    Submitted 27 May, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

    Comments: 18 pages, 3 figures

  19. arXiv:2401.11859

    eess.IV cs.CV

    LKFormer: Large Kernel Transformer for Infrared Image Super-Resolution

    Authors: Feiwei Qin, Kang Yan, Changmiao Wang, Ruiquan Ge, Yong Peng, Kai Zhang

    Abstract: Given the broad application of infrared technology across diverse fields, there is an increasing emphasis on investigating super-resolution techniques for infrared images within the realm of deep learning. Despite the impressive results of current Transformer-based methods in image super-resolution tasks, their reliance on the self-attentive mechanism intrinsic to the Transformer architecture resu…

    Submitted 24 January, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

    Comments: 14 pages, 4 figures, accepted by Multimedia Tools and Applications

  20. Transfer-Learning-Based Autotuning Using Gaussian Copula

    Authors: Thomas Randall, Jaehoon Koo, Brice Videau, Michael Kruse, Xingfu Wu, Paul Hovland, Mary Hall, Rong Ge, Prasanna Balaprakash

    Abstract: As diverse high-performance computing (HPC) systems are built, many opportunities arise for applications to solve larger problems than ever before. Given the significantly increased complexity of these HPC systems and application tuning, empirical performance tuning, such as autotuning, has emerged as a promising approach in recent years. Despite its effectiveness, autotuning is often a computatio…

    Submitted 9 January, 2024; originally announced January 2024.

    Comments: 13 pages, 5 figures, 7 tables, the definitive version of this work is published in the Proceedings of the ACM International Conference on Supercomputing 2023, available at https://dl.acm.org/doi/10.1145/3577193.3593712

    ACM Class: I.2.4; G.3; D.2.8

    Journal ref: Proceedings of the 37th International Conference on Supercomputing (2023) 37-49

  21. arXiv:2401.02954

    cs.CL cs.AI cs.LG

    DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

    Authors: DeepSeek-AI, Xiao Bi, Deli Chen, Guanting Chen, Shanhuang Chen, Damai Dai, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Zhe Fu, Huazuo Gao, Kaige Gao, Wenjun Gao, Ruiqi Ge, Kang Guan, Daya Guo, Jianzhong Guo, Guangbo Hao, Zhewen Hao, Ying He, Wenjie Hu, Panpan Huang, Erhang Li, et al. (63 additional authors not shown)

    Abstract: The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling law described in previous literature presents varying conclusions, which casts a dark cloud over scaling LLMs. We delve into the study of scaling laws and present our distinctive findings that facilitate scaling of large scale models in two commonly used open-source configurations, 7B…

    Submitted 5 January, 2024; originally announced January 2024.

  22. arXiv:2312.07743

    cs.LG cs.CL cs.DC

    FULL-W2V: Fully Exploiting Data Reuse for W2V on GPU-Accelerated Systems

    Authors: Thomas Randall, Tyler Allen, Rong Ge

    Abstract: Word2Vec remains one of the highly-impactful innovations in the field of Natural Language Processing (NLP) that represents latent grammatical and syntactical information in human text with dense vectors in a low dimension. Word2Vec has high computational cost due to the algorithm's inherent sequentiality, intensive memory accesses, and the large vocabularies it represents. While prior studies have…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: 12 pages, 7 figures, 7 tables, the definitive version of this work is published in the Proceedings of the ACM International Conference on Supercomputing 2021, available at https://doi.org/10.1145/3447818.3460373

    ACM Class: I.2.7; D.1.3; G.4

    Journal ref: Proceedings of the ACM International Conference on Supercomputing (2021) 455-466

  23. arXiv:2311.15328

    eess.IV cs.CV

    BS-Diff: Effective Bone Suppression Using Conditional Diffusion Models from Chest X-Ray Images

    Authors: Zhanghao Chen, Yifei Sun, Wenjian Qin, Ruiquan Ge, Cheng Pan, Wenming Deng, Zhou Liu, Wenwen Min, Ahmed Elazab, Xiang Wan, Changmiao Wang

    Abstract: Chest X-rays (CXRs) are commonly utilized as a low-dose modality for lung screening. Nonetheless, the efficacy of CXRs is somewhat impeded, given that approximately 75% of the lung area overlaps with bone, which in turn hampers the detection and diagnosis of diseases. As a remedial measure, bone suppression techniques have been introduced. The current dual-energy subtraction imaging technique in t…

    Submitted 28 February, 2024; v1 submitted 26 November, 2023; originally announced November 2023.

    Comments: 5 pages, 2 figures, accepted by IEEE ISBI 2024

  24. arXiv:2311.07033

    eess.IV cs.CV

    TTMFN: Two-stream Transformer-based Multimodal Fusion Network for Survival Prediction

    Authors: Ruiquan Ge, Xiangyang Hu, Rungen Huang, Gangyong Jia, Yaqi Wang, Renshu Gu, Changmiao Wang, Elazab Ahmed, Linyan Wang, Juan Ye, Ye Li

    Abstract: Survival prediction plays a crucial role in assisting clinicians with the development of cancer treatment protocols. Recent evidence shows that multimodal data can help in the diagnosis of cancer disease and improve survival prediction. Currently, deep learning-based approaches have experienced increasing success in survival prediction by integrating pathological images and gene expression data. H…

    Submitted 12 November, 2023; originally announced November 2023.

  25. arXiv:2311.04772

    eess.IV cs.CV

    GCS-ICHNet: Assessment of Intracerebral Hemorrhage Prognosis using Self-Attention with Domain Knowledge Integration

    Authors: Xuhao Shan, Xinyang Li, Ruiquan Ge, Shibin Wu, Ahmed Elazab, Jichao Zhu, Lingyan Zhang, Gangyong Jia, Qingying Xiao, Xiang Wan, Changmiao Wang

    Abstract: Intracerebral Hemorrhage (ICH) is a severe condition resulting from damaged brain blood vessel ruptures, often leading to complications and fatalities. Timely and accurate prognosis and management are essential due to its high mortality rate. However, conventional methods heavily rely on subjective clinician expertise, which can lead to inaccurate diagnoses and delays in treatment. Artificial inte…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: 6 pages, 3 figures, 5 tables, published to BIBM 2023

  26. arXiv:2310.02777

    cs.CL

    The Role of Linguistic Priors in Measuring Compositional Generalization of Vision-Language Models

    Authors: Chenwei Wu, Li Erran Li, Stefano Ermon, Patrick Haffner, Rong Ge, Zaiwei Zhang

    Abstract: Compositionality is a common property in many modalities including natural languages and images, but the compositional generalization of multi-modal models is not well-understood. In this paper, we identify two sources of visual-linguistic compositionality: linguistic priors and the interplay between images and texts. We show that current attempts to improve compositional generalization rely on li…

    Submitted 4 October, 2023; originally announced October 2023.

  27. arXiv:2307.11530

    eess.IV cs.CV

    UWAT-GAN: Fundus Fluorescein Angiography Synthesis via Ultra-wide-angle Transformation Multi-scale GAN

    Authors: Zhaojie Fang, Zhanghao Chen, Pengxue Wei, Wangting Li, Shaochong Zhang, Ahmed Elazab, Gangyong Jia, Ruiquan Ge, Changmiao Wang

    Abstract: Fundus photography is an essential examination for clinical and differential diagnosis of fundus diseases. Recently, Ultra-Wide-angle Fundus (UWF) techniques, UWF Fluorescein Angiography (UWF-FA) and UWF Scanning Laser Ophthalmoscopy (UWF-SLO) have been gradually put into use. However, Fluorescein Angiography (FA) and UWF-FA require injecting sodium fluorescein which may have detrimental influence…

    Submitted 21 July, 2023; originally announced July 2023.

    Comments: 26th International Conference on Medical Image Computing and Computer Assisted Intervention

  28. arXiv:2307.07246

    cs.CV cs.LG

    Knowledge Boosting: Rethinking Medical Contrastive Vision-Language Pre-Training

    Authors: Xiaofei Chen, Yuting He, Cheng Xue, Rongjun Ge, Shuo Li, Guanyu Yang

    Abstract: The foundation models based on pre-training technology have significantly advanced artificial intelligence from theoretical to practical applications. These models have facilitated the feasibility of computer-aided diagnosis for widespread use. Medical contrastive vision-language pre-training, which does not require human annotations, is an effective approach for guiding representation learning us…

    Submitted 17 July, 2023; v1 submitted 14 July, 2023; originally announced July 2023.

    Comments: accepted by MICCAI 2023

  29. arXiv:2306.00740

    cs.LG stat.ML

    On the Limitations of Temperature Scaling for Distributions with Overlaps

    Authors: Muthu Chidambaram, Rong Ge

    Abstract: Despite the impressive generalization capabilities of deep neural networks, they have been repeatedly shown to be overconfident when they are wrong. Fixing this issue is known as model calibration, and has consequently received much attention in the form of modified training schemes and post-training calibration procedures such as temperature scaling. While temperature scaling is frequently used b…

    Submitted 13 February, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: 27 pages, 9 Figures, published in ICLR 2024

  30. arXiv:2305.10633

    cs.LG cs.IT stat.ML

    Smoothing the Landscape Boosts the Signal for SGD: Optimal Sample Complexity for Learning Single Index Models

    Authors: Alex Damian, Eshaan Nichani, Rong Ge, Jason D. Lee

    Abstract: We focus on the task of learning a single index model $σ(w^\star \cdot x)$ with respect to the isotropic Gaussian distribution in $d$ dimensions. Prior work has shown that the sample complexity of learning $w^\star$ is governed by the information exponent $k^\star$ of the link function $σ$, which is defined as the index of the first nonzero Hermite coefficient of $σ$. Ben Arous et al. (2021) showe…

    Submitted 17 May, 2023; originally announced May 2023.

  31. arXiv:2304.03834

    cs.CV

    WOMD-LiDAR: Raw Sensor Dataset Benchmark for Motion Forecasting

    Authors: Kan Chen, Runzhou Ge, Hang Qiu, Rami Al-Rfou, Charles R. Qi, Xuanyu Zhou, Zoey Yang, Scott Ettinger, Pei Sun, Zhaoqi Leng, Mustafa Baniodeh, Ivan Bogun, Weiyue Wang, Mingxing Tan, Dragomir Anguelov

    Abstract: Widely adopted motion forecasting datasets substitute the observed sensory inputs with higher-level abstractions such as 3D boxes and polylines. These sparse shapes are inferred through annotating the original scenes with perception systems' predictions. Such intermediate representations tie the quality of the motion forecasting models to the performance of computer vision models. Moreover, the hu…

    Submitted 18 February, 2024; v1 submitted 7 April, 2023; originally announced April 2023.

    Comments: ICRA 2024 camera ready version. Dataset website: https://waymo.com/open/data/motion/

  32. arXiv:2304.01063

    cs.LG math.OC

    Depth Separation with Multilayer Mean-Field Networks

    Authors: Yunwei Ren, Mo Zhou, Rong Ge

    Abstract: Depth separation -- why a deeper network is more powerful than a shallower one -- has been a major problem in deep learning theory. Previous results often focus on representation power. For example, arXiv:1904.06984 constructed a function that is easy to approximate using a 3-layer network but not approximable by any 2-layer network. In this paper, we show that this separation is in fact algorithm…

    Submitted 3 April, 2023; originally announced April 2023.

    Comments: ICLR 2023

  33. arXiv:2303.08117

    cs.CL cs.LG

    Do Transformers Parse while Predicting the Masked Word?

    Authors: Haoyu Zhao, Abhishek Panigrahi, Rong Ge, Sanjeev Arora

    Abstract: Pre-trained language models have been shown to encode linguistic structures, e.g. dependency and constituency parse trees, in their embeddings while being trained on unsupervised loss functions like masked language modeling. Some doubts have been raised whether the models actually are doing parsing or only some computation weakly correlated with it. We study questions: (a) Is it possible to explic…

    Submitted 15 October, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

    Comments: Accepted to EMNLP 2023, 30 pages

  34. arXiv:2303.00874

    cs.CV cs.AI

    Geometric Visual Similarity Learning in 3D Medical Image Self-supervised Pre-training

    Authors: Yuting He, Guanyu Yang, Rongjun Ge, Yang Chen, Jean-Louis Coatrieux, Boyu Wang, Shuo Li

    Abstract: Learning inter-image similarity is crucial for 3D medical images self-supervised pre-training, due to their sharing of numerous same semantic regions. However, the lack of the semantic prior in metrics and the semantic-independent variation in 3D medical images make it challenging to get a reliable measurement for the inter-image similarity, hindering the learning of consistent representation for…

    Submitted 1 March, 2023; originally announced March 2023.

    Comments: Accepted by CVPR 2023

    Journal ref: IEEE/CVF Conference on Computer Vision and Pattern Recognition 2023

  35. arXiv:2302.12715

    cs.LG cs.AI

    Hiding Data Helps: On the Benefits of Masking for Sparse Coding

    Authors: Muthu Chidambaram, Chenwei Wu, Yu Cheng, Rong Ge

    Abstract: Sparse coding, which refers to modeling a signal as sparse linear combinations of the elements of a learned dictionary, has proven to be a successful (and interpretable) approach in applications such as signal processing, computer vision, and medical imaging. While this success has spurred much work on provable guarantees for dictionary recovery when the learned dictionary is the same size as the…

    Submitted 1 June, 2023; v1 submitted 24 February, 2023; originally announced February 2023.

    Comments: 16 pages, 1 figure, ICML 2023

  36. arXiv:2302.00257

    cs.LG stat.ML

    Implicit Regularization Leads to Benign Overfitting for Sparse Linear Regression

    Authors: Mo Zhou, Rong Ge

    Abstract: In deep learning, often the training process finds an interpolator (a solution with 0 training loss), but the test loss is still low. This phenomenon, known as benign overfitting, is a major mystery that received a lot of recent attention. One common mechanism for benign overfitting is implicit regularization, where the training process leads to additional properties for the interpolator, often ch…

    Submitted 25 May, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

    Comments: ICML 2023 camera ready version

  37. arXiv:2210.13512

    cs.LG cs.AI cs.CV math.OC stat.ML

    Provably Learning Diverse Features in Multi-View Data with Midpoint Mixup

    Authors: Muthu Chidambaram, Xiang Wang, Chenwei Wu, Rong Ge

    Abstract: Mixup is a data augmentation technique that relies on training using random convex combinations of data points and their labels. In recent years, Mixup has become a standard primitive used in the training of state-of-the-art image classification models due to its demonstrated benefits over empirical risk minimization with regards to generalization and robustness. In this work, we try to explain so…

    Submitted 1 June, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

    Comments: 37 pages, 2 figures, ICML 2023

  38. arXiv:2210.03294

    cs.LG math.OC stat.ML

    Understanding Edge-of-Stability Training Dynamics with a Minimalist Example

    Authors: Xingyu Zhu, Zixuan Wang, Xiang Wang, Mo Zhou, Rong Ge

    Abstract: Recently, researchers observed that gradient descent for deep neural networks operates in an ``edge-of-stability'' (EoS) regime: the sharpness (maximum eigenvalue of the Hessian) is often larger than stability threshold $2/η$ (where $η$ is the step size). Despite this, the loss oscillates and converges in the long run, and the sharpness at the end is just slightly below $2/η$. While many other wel…

    Submitted 21 February, 2023; v1 submitted 6 October, 2022; originally announced October 2022.

    Comments: 53 pages, 19 figures

    ACM Class: I.2.6

  39. arXiv:2210.01019

    stat.ML cs.LG

    Plateau in Monotonic Linear Interpolation -- A "Biased" View of Loss Landscape for Deep Networks

    Authors: Xiang Wang, Annie N. Wang, Mo Zhou, Rong Ge

    Abstract: Monotonic linear interpolation (MLI) - on the line connecting a random initialization with the minimizer it converges to, the loss and accuracy are monotonic - is a phenomenon that is commonly observed in the training of neural networks. Such a phenomenon may seem to suggest that optimization of neural networks is easy. In this paper, we show that the MLI property is not necessarily related to the…

    Submitted 14 February, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

    Comments: ICLR 2023

  40. arXiv:2207.08301

    cs.RO

    Vision-based Relative Detection and Tracking for Teams of Micro Aerial Vehicles

    Authors: Rundong Ge, Moonyoung Lee, Vivek Radhakrishnan, Yang Zhou, Guanrui Li, Giuseppe Loianno

    Abstract: In this paper, we address the vision-based detection and tracking problems of multiple aerial vehicles using a single camera and Inertial Measurement Unit (IMU) as well as the corresponding perception consensus problem (i.e., uniqueness and identical IDs across all observing agents). We design several vision-based decentralized Bayesian multi-tracking filtering strategies to resolve the associatio…

    Submitted 17 July, 2022; originally announced July 2022.

  41. arXiv:2207.06965  [pdf, other]

    cs.RO cs.CV

    AutoMerge: A Framework for Map Assembling and Smoothing in City-scale Environments

    Authors: Peng Yin, Haowen Lai, Shiqi Zhao, Ruohai Ge, Ji Zhang, Howie Choset, Sebastian Scherer

    Abstract: We present AutoMerge, a LiDAR data processing framework for assembling a large number of map segments into a complete map. Traditional large-scale map merging methods are fragile to incorrect data associations, and are primarily limited to working only offline. AutoMerge utilizes multi-perspective fusion and adaptive loop closure detection for accurate data associations, and it uses incremental me…

    Submitted 26 June, 2023; v1 submitted 14 July, 2022; originally announced July 2022.

    Comments: 19 pages, 20 figures, IEEE Transactions on Robotics (T-RO) 2023

  42. arXiv:2206.08524  [pdf, other]

    cs.CV

    CDNet: Contrastive Disentangled Network for Fine-Grained Image Categorization of Ocular B-Scan Ultrasound

    Authors: Ruilong Dan, Yunxiang Li, Yijie Wang, Gangyong Jia, Ruiquan Ge, Juan Ye, Qun Jin, Yaqi Wang

    Abstract: Precise and rapid categorization of images in the B-scan ultrasound modality is vital for diagnosing ocular diseases. Nevertheless, distinguishing various diseases in ultrasound still challenges experienced ophthalmologists. Thus a novel contrastive disentangled network (CDNet) is developed in this work, aiming to tackle the fine-grained image categorization (FGIC) challenges of ocular abnormaliti…

    Submitted 16 June, 2022; originally announced June 2022.

  43. arXiv:2205.10737  [pdf, other]

    cs.RO

    ALITA: A Large-scale Incremental Dataset for Long-term Autonomy

    Authors: Peng Yin, Shiqi Zhao, Ruohai Ge, Ivan Cisneros, Ruijie Fu, Ji Zhang, Howie Choset, Sebastian Scherer

    Abstract: For long-term autonomy, most place recognition methods are mainly evaluated on simplified scenarios or simulated datasets, which cannot provide solid evidence to evaluate the readiness for current Simultaneous Localization and Mapping (SLAM). In this paper, we present a long-term place recognition dataset for use in mobile localization under large-scale dynamic environments. This dataset includes…

    Submitted 9 September, 2022; v1 submitted 22 May, 2022; originally announced May 2022.

    Comments: 6 pages, 5 figures, Submitted for IJRR dataset paper

  44. arXiv:2205.08717  [pdf, other]

    cs.LG cs.DS

    A Regression Approach to Learning-Augmented Online Algorithms

    Authors: Keerti Anand, Rong Ge, Amit Kumar, Debmalya Panigrahi

    Abstract: The emerging field of learning-augmented online algorithms uses ML techniques to predict future input parameters and thereby improve the performance of online algorithms. Since these parameters are, in general, real-valued functions, a natural approach is to use regression techniques to make these predictions. We introduce this approach in this paper, and explore it in the context of a general onl…

    Submitted 24 May, 2022; v1 submitted 18 May, 2022; originally announced May 2022.

  45. arXiv:2205.08715  [pdf, other]

    cs.LG cs.DS

    Customizing ML Predictions for Online Algorithms

    Authors: Keerti Anand, Rong Ge, Debmalya Panigrahi

    Abstract: A popular line of recent research incorporates ML advice in the design of online algorithms to improve their performance in typical instances. These papers treat the ML algorithm as a black-box, and redesign online algorithms to take advantage of ML predictions. In this paper, we ask the complementary question: can we redesign ML algorithms to provide better predictions for online algorithms? We e…

    Submitted 18 May, 2022; originally announced May 2022.

  46. arXiv:2205.03921  [pdf, ps, other]

    cs.LG cs.DS

    Online Algorithms with Multiple Predictions

    Authors: Keerti Anand, Rong Ge, Amit Kumar, Debmalya Panigrahi

    Abstract: This paper studies online algorithms augmented with multiple machine-learned predictions. While online algorithms augmented with a single prediction have been extensively studied in recent years, the literature for the multiple predictions setting is sparse. In this paper, we give a generic algorithmic framework for online covering problems with multiple predictions that obtains an online solution…

    Submitted 12 July, 2022; v1 submitted 8 May, 2022; originally announced May 2022.

    Comments: ICML 2022

  47. arXiv:2204.03163  [pdf, other]

    eess.IV cs.CV

    Low-Dose CT Denoising via Sinogram Inner-Structure Transformer

    Authors: Liutao Yang, Zhongnian Li, Rongjun Ge, Junyong Zhao, Haipeng Si, Daoqiang Zhang

    Abstract: Low-Dose Computed Tomography (LDCT) technique, which reduces the radiation harm to human bodies, is now attracting increasing interest in the medical imaging field. As the image quality is degraded by low dose radiation, LDCT exams require specialized reconstruction methods or denoising algorithms. However, most of the recent effective methods overlook the inner-structure of the original projectio…

    Submitted 18 April, 2022; v1 submitted 6 April, 2022; originally announced April 2022.

  48. arXiv:2203.03539  [pdf, other]

    cs.CL cs.LG stat.ML

    Understanding The Robustness of Self-supervised Learning Through Topic Modeling

    Authors: Zeping Luo, Shiyou Wu, Cindy Weng, Mo Zhou, Rong Ge

    Abstract: Self-supervised learning has significantly improved the performance of many NLP tasks. However, how can self-supervised learning discover useful representations, and why is it better than traditional approaches such as probabilistic models are still largely unknown. In this paper, we focus on the context of topic modeling and highlight a key advantage of self-supervised learning - when applied to…

    Submitted 27 February, 2023; v1 submitted 2 February, 2022; originally announced March 2022.

    Comments: Accepted at ICLR 2023. Camera ready version

  49. arXiv:2112.09205  [pdf, other]

    cs.CV

    AFDetV2: Rethinking the Necessity of the Second Stage for Object Detection from Point Clouds

    Authors: Yihan Hu, Zhuangzhuang Ding, Runzhou Ge, Wenxin Shao, Li Huang, Kun Li, Qiang Liu

    Abstract: There have been two streams in the 3D detection from point clouds: single-stage methods and two-stage methods. While the former is more computationally efficient, the latter usually provides better detection accuracy. By carefully examining the two-stage approaches, we have found that if appropriately designed, the first stage can produce accurate box regression. In this scenario, the second stage…

    Submitted 18 July, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: AAAI 2022; 1st Place Solution for the Real-time 3D Detection and the Most Efficient Model of the Waymo Open Dataset Challenges 2021 (http://cvpr2021.wad.vision/)

  50. arXiv:2110.07647  [pdf, other]

    cs.LG cs.AI

    Towards Understanding the Data Dependency of Mixup-style Training

    Authors: Muthu Chidambaram, Xiang Wang, Yuzheng Hu, Chenwei Wu, Rong Ge

    Abstract: In the Mixup training paradigm, a model is trained using convex combinations of data points and their associated labels. Despite seeing very few true data points during training, models trained using Mixup seem to still minimize the original empirical risk and exhibit better generalization and robustness on various tasks when compared to standard training. In this paper, we investigate how these b…

    Submitted 19 February, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

    Comments: 26 pages, 14 figures, Accepted to ICLR 2022 (Spotlight)