Showing 1–50 of 803 results for author: Xingyu

  1. arXiv:2407.08931  [pdf, other]

    cs.CV

    Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection

    Authors: Xingyu Peng, Yan Bai, Chen Gao, Lirong Yang, Fei Xia, Beipeng Mu, Xiaofei Wang, Si Liu

    Abstract: Open-Vocabulary Detection (OVD) is the task of detecting all interesting objects in a given scene without predefined object classes. Extensive work has been done to deal with the OVD for 2D RGB images, but the exploration of 3D OVD is still limited. Intuitively, lidar point clouds provide 3D information, both object level and scene level, to generate trustful detection results. However, previous l…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: accepted by ECCV 2024

  2. arXiv:2407.06042  [pdf, ps, other]

    eess.SP cs.IT

    Near-Optimal MIMO Detection Using Gradient-Based MCMC in Discrete Spaces

    Authors: Xingyu Zhou, Le Liang, Jing Zhang, Chao-Kai Wen, Shi Jin

    Abstract: The discrete nature of transmitted symbols poses challenges for achieving optimal detection in multiple-input multiple-output (MIMO) systems associated with a large number of antennas. Recently, the combination of two powerful machine learning methods, Markov chain Monte Carlo (MCMC) sampling and gradient descent, has emerged as a highly efficient solution to address this issue. However, existing…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  3. arXiv:2407.05286  [pdf, other]

    cs.LG math.OC

    Stability and Generalization for Stochastic Recursive Momentum-based Algorithms for (Strongly-)Convex One to $K$-Level Stochastic Optimizations

    Authors: Xiaokang Pan, Xingyu Li, ** Liu, Tao Sun, Kai Sun, Lixing Chen, Zhe Qu

    Abstract: STOchastic Recursive Momentum (STORM)-based algorithms have been widely developed to solve one to $K$-level ($K \geq 3$) stochastic optimization problems. Specifically, they use estimators to mitigate the biased gradient issue and achieve near-optimal convergence results. However, there is relatively little work on understanding their generalization performance, particularly evident during the tra…

    Submitted 7 July, 2024; originally announced July 2024.

  4. arXiv:2407.05232  [pdf, other]

    cs.LG

    PAPM: A Physics-aware Proxy Model for Process Systems

    Authors: Pengwei Liu, Zhongkai Hao, Xingyu Ren, Hangjie Yuan, Jiayang Ren, Dong Ni

    Abstract: In the context of proxy modeling for process systems, traditional data-driven deep learning approaches frequently encounter significant challenges, such as substantial training costs induced by large amounts of data, and limited generalization capabilities. As a promising alternative, physics-aware models incorporate partial physics knowledge to ameliorate these challenges. Although demonstrating…

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: ICML 2024

  5. arXiv:2407.04480  [pdf, other]

    cs.LG math.OC

    LoCo: Low-Bit Communication Adaptor for Large-scale Model Training

    Authors: Xingyu Xie, Zhijie Lin, Kim-Chuan Toh, Pan Zhou

    Abstract: To efficiently train large-scale models, low-bit gradient communication compresses full-precision gradients on local GPU nodes into low-precision ones for higher gradient synchronization efficiency among GPU nodes. However, it often degrades training quality due to compression information loss. To address this, we propose the Low-bit Communication Adaptor (LoCo), which compensates gradients on loc…

    Submitted 5 July, 2024; originally announced July 2024.

  6. arXiv:2407.03233  [pdf, other]

    math.OC

    Asynchronous Parallel Policy Gradient Methods for the Linear Quadratic Regulator

    Authors: Xingyu Sha, Feiran Zhao, Keyou You

    Abstract: Learning policies in an asynchronous parallel way is essential to the numerous successes of RL for solving large-scale problems. However, their convergence performance is still not rigorously evaluated. To this end, we adopt the asynchronous parallel zero-order policy gradient (AZOPG) method to solve the continuous-time linear quadratic regulation problem. Specifically, as in the celebrated A3C al…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: This article was submitted to IEEE TAC on Jan. 10, 2024

  7. arXiv:2407.03096  [pdf, ps, other]

    quant-ph cond-mat.stat-mech

    Collective advantages in qubit reset: effect of coherent qubits

    Authors: Yue Liu, Chenlong Huang, Xingyu Zhang, Dahai He

    Abstract: The Landauer principle sets a lower bound on the thermodynamic cost of qubit reset, which is only attainable for the quasistatic process. In this Letter, we explore the collective advantage of qubit reset of coherent qubits in three aspects. First, for the quasistatic process, the thermodynamic cost of collective reset is remarkably lower than parallel reset because of the reduced Hilbert space di…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 6 pages, 3 figures

  8. arXiv:2407.02657  [pdf, other]

    cs.LG stat.ME

    Large Scale Hierarchical Industrial Demand Time-Series Forecasting incorporating Sparsity

    Authors: Harshavardhan Kamarthi, Aditya B. Sasanur, Xinjie Tong, Xingyu Zhou, James Peters, Joe Czyzyk, B. Aditya Prakash

    Abstract: Hierarchical time-series forecasting (HTSF) is an important problem for many real-world business applications where the goal is to simultaneously forecast multiple time-series that are related to each other via a hierarchical relation. Recent works, however, do not address two important challenges that are typically observed in many demand forecasting applications at large companies. First, many t…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted at KDD 2024

  9. arXiv:2407.01893  [pdf, other]

    cs.HC

    CausalPrism: A Visual Analytics Approach for Subgroup-based Causal Heterogeneity Exploration

    Authors: Jiehui Zhou, Xumeng Wang, Wong Kam-Kwai, Wei Zhang, Xingyu Liu, Juntian Zhang, Minfeng Zhu, Wei Chen

    Abstract: In causal inference, estimating Heterogeneous Treatment Effects (HTEs) from observational data is critical for understanding how different subgroups respond to treatments, with broad applications such as precision medicine and targeted advertising. However, existing work on HTE, subgroup discovery, and causal visualization is insufficient to address two challenges: first, the sheer number of poten…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 12 pages, 7 figures

  10. arXiv:2407.01445  [pdf, other]

    cs.LG cs.CV

    FastCLIP: A Suite of Optimization Techniques to Accelerate CLIP Training with Limited Resources

    Authors: Xiyuan Wei, Fanjiang Ye, Ori Yonay, Xingyu Chen, Baixi Sun, Dingwen Tao, Tianbao Yang

    Abstract: Existing studies of training state-of-the-art Contrastive Language-Image Pretraining (CLIP) models on large-scale data involve hundreds of or even thousands of GPUs due to the requirement of a large batch size. However, such a large amount of resources is not accessible to most people. While advanced compositional optimization techniques for optimizing global contrastive losses have been demonstra…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 23 pages

  11. arXiv:2407.01004  [pdf, other]

    cs.LG stat.ME

    CURLS: Causal Rule Learning for Subgroups with Significant Treatment Effect

    Authors: Jiehui Zhou, Linxiao Yang, Xingyu Liu, Xinyue Gu, Liang Sun, Wei Chen

    Abstract: In causal inference, estimating heterogeneous treatment effects (HTE) is critical for identifying how different subgroups respond to interventions, with broad applications in fields such as precision medicine and personalized advertising. Although HTE estimation methods aim to improve accuracy, how to provide explicit subgroup descriptions remains unclear, hindering data interpretation and strateg…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 12 pages, 3 figures

  12. arXiv:2406.16605  [pdf, other]

    cs.CL cs.AI cs.LG stat.ME

    CLEAR: Can Language Models Really Understand Causal Graphs?

    Authors: Sirui Chen, Mengying Xu, Kun Wang, Xingyu Zeng, Rui Zhao, Shengjie Zhao, Chaochao Lu

    Abstract: Causal reasoning is a cornerstone of how humans interpret the world. To model and reason about causality, causal graphs offer a concise yet effective solution. Given the impressive advancements in language models, a crucial question arises: can they really understand causal graphs? To this end, we pioneer an investigation into language models' understanding of causal graphs. Specifically, we devel…

    Submitted 24 June, 2024; originally announced June 2024.

  13. arXiv:2406.15720  [pdf, other]

    cs.CL

    Scaling Laws for Fact Memorization of Large Language Models

    Authors: Xingyu Lu, Xiaonan Li, Qinyuan Cheng, Kai Ding, Xuanjing Huang, Xipeng Qiu

    Abstract: Fact knowledge memorization is crucial for Large Language Models (LLM) to generate factual and reliable responses. However, the behaviors of LLM fact memorization remain under-explored. In this paper, we analyze the scaling laws for LLM's fact knowledge and LLMs' behaviors of memorizing different types of facts. We find that LLMs' fact knowledge capacity has a linear and negative exponential law r…

    Submitted 21 June, 2024; originally announced June 2024.

  14. arXiv:2406.14401  [pdf, other]

    cs.LG cs.AI

    Fair Streaming Feature Selection

    Authors: Zhangling Duan, Tianci Li, Xingyu Wu, Zhaolong Ling, Jingye Yang, Zhaohong Jia

    Abstract: Streaming feature selection techniques have become essential in processing real-time data streams, as they facilitate the identification of the most relevant attributes from continuously updating information. Despite their performance, current algorithms to streaming feature selection frequently fall short in managing biases and avoiding discrimination that could be perpetuated by sensitive attrib…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 30 pages, 10 figures

  15. arXiv:2406.14359  [pdf, other]

    cs.NE

    Learning to Transfer for Evolutionary Multitasking

    Authors: Sheng-Hao Wu, Yuxiao Huang, Xingyu Wu, Liang Feng, Zhi-Hui Zhan, Kay Chen Tan

    Abstract: Evolutionary multitasking (EMT) is an emerging approach for solving multitask optimization problems (MTOPs) and has garnered considerable research interest. The implicit EMT is a significant research branch that utilizes evolution operators to enable knowledge transfer (KT) between tasks. However, current approaches in implicit EMT face challenges in adaptability, due to the use of a limited numbe…

    Submitted 22 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: Under review

  16. arXiv:2406.13149  [pdf, other]

    cs.CV

    High-Fidelity Facial Albedo Estimation via Texture Quantization

    Authors: Zimin Ran, Xingyu Ren, Xiang An, Kaicheng Yang, Xiangzi Dai, Ziyong Feng, Jia Guo, Linchao Zhu, Jiankang Deng

    Abstract: Recent 3D face reconstruction methods have made significant progress in shape estimation, but high-fidelity facial albedo reconstruction remains challenging. Existing methods depend on expensive light-stage captured data to learn facial albedo maps. However, a lack of diversity in subjects limits their ability to recover high-fidelity results. In this paper, we present a novel facial albedo recons…

    Submitted 18 June, 2024; originally announced June 2024.

  17. arXiv:2406.11243  [pdf, other]

    cs.CL cs.AI

    FamiCom: Further Demystifying Prompts for Language Models with Task-Agnostic Performance Estimation

    Authors: Bangzheng Li, Ben Zhou, Xingyu Fu, Fei Wang, Dan Roth, Muhao Chen

    Abstract: Language models have shown impressive in-context-learning capabilities, which allow them to benefit from input prompts and perform better on downstream end tasks. Existing works investigate the mechanisms behind this observation, and propose label-agnostic prompt metrics that can better estimate end-task performances. One popular approach is using perplexity as a way to measure models' familiarity…

    Submitted 17 June, 2024; originally announced June 2024.

  18. arXiv:2406.10189  [pdf, ps, other]

    math.DG math.MG

    Topological rigidity of small RCD(K,N) spaces with maximal rank

    Authors: Sergio Zamora, Xingyu Zhu

    Abstract: For a polycyclic group $Λ$, rank$(Λ)$ is defined as the number of $\mathbb{Z}$ factors in a polycyclic decomposition of $Λ$. For a finitely generated group $G$, rank$(G)$ is defined as the infimum of rank$(Λ)$ among finite index polycyclic subgroups $Λ\leq G$. For a compact RCD$(K,N)$ space $(X,\mathsf{d}, \mathfrak{m} )$ with diam$(X) \leq \varepsilon (K,N)$, the rank of $π_1(X)$ is at most…

    Submitted 14 June, 2024; originally announced June 2024.

    Report number: MPIM-Bonn-2024

    MSC Class: 53C23; 53C21

  19. arXiv:2406.09411  [pdf, other]

    cs.CV cs.AI cs.CL

    MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

    Authors: Fei Wang, Xingyu Fu, James Y. Huang, Zekun Li, Qin Liu, Xiaogeng Liu, Mingyu Derek Ma, Nan Xu, Wenxuan Zhou, Kai Zhang, Tianyi Lorena Yan, Wenjie Jacky Mo, Hsiang-Hui Liu, Pan Lu, Chunyuan Li, Chaowei Xiao, Kai-Wei Chang, Dan Roth, Sheng Zhang, Hoifung Poon, Muhao Chen

    Abstract: We introduce MuirBench, a comprehensive benchmark that focuses on robust multi-image understanding capabilities of multimodal LLMs. MuirBench consists of 12 diverse multi-image tasks (e.g., scene understanding, ordering) that involve 10 categories of multi-image relations (e.g., multiview, temporal relations). Comprising 11,264 images and 2,600 multiple-choice questions, MuirBench is created in a…

    Submitted 1 July, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: typos corrected, references added, Project Page: https://muirbench.github.io/

  20. arXiv:2406.09403  [pdf, other]

    cs.CV cs.CL

    Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models

    Authors: Yushi Hu, Weijia Shi, Xingyu Fu, Dan Roth, Mari Ostendorf, Luke Zettlemoyer, Noah A Smith, Ranjay Krishna

    Abstract: Humans draw to facilitate reasoning: we draw auxiliary lines when solving geometry problems; we mark and circle when reasoning on maps; we use sketches to amplify our ideas and relieve our limited-capacity working memory. However, such actions are missing in current multimodal language models (LMs). Current chain-of-thought and tool-use paradigms only use text as intermediate reasoning steps. In t…

    Submitted 10 July, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: Project and codes url: https://visualsketchpad.github.io/

  21. arXiv:2406.08948  [pdf, other]

    cond-mat.str-el cond-mat.quant-gas quant-ph

    Validity of the Lieb-Schultz-Mattis Theorem in Long-Range Interacting Systems

    Authors: Yi-Neng Zhou, Xingyu Li

    Abstract: The Lieb-Schultz-Mattis (LSM) theorem asserts that microscopic details of the system can impose non-trivial constraints on the system's low-energy properties. While traditionally applied to short-range interaction systems, where locality ensures a vanishing spectral gap in large system size limit, the impact of long-range interactions on the LSM theorem remains an open question. Long-range interac…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 6 pages, 2 figures

  22. arXiv:2406.07546  [pdf, other]

    cs.CV cs.AI cs.CL

    Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?

    Authors: Xingyu Fu, Muyu He, Yujie Lu, William Yang Wang, Dan Roth

    Abstract: We present a novel task and benchmark for evaluating the ability of text-to-image(T2I) generation models to produce images that fit commonsense in real life, which we call Commonsense-T2I. Given two adversarial text prompts containing an identical set of action words with minor differences, such as "a lightbulb without electricity" v.s. "a lightbulb with electricity", we evaluate whether T2I model…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Text-to-Image Generation, Commonsense, Project Url: https://zeyofu.github.io/CommonsenseT2I/

  23. arXiv:2406.07411  [pdf, other]

    cs.SE cs.CL

    VersiCode: Towards Version-controllable Code Generation

    Authors: Tongtong Wu, Weigang Wu, Xingyu Wang, Kang Xu, Suyu Ma, Bo Jiang, ** Yang, Zhenchang Xing, Yuan-Fang Li, Gholamreza Haffari

    Abstract: Significant research has focused on improving the performance of large language model on code-related tasks due to their practical importance. Although performance is typically evaluated using public benchmark datasets, the existing datasets do not account for the concept of \emph{version}, which is crucial in professional software development. In this paper, we introduce VersiCode, the first comp…

    Submitted 11 June, 2024; originally announced June 2024.

  24. arXiv:2406.03274  [pdf, other]

    eess.AS cs.AI cs.SD

    Enhancing CTC-based speech recognition with diverse modeling units

    Authors: Shiyi Han, Zhihong Lei, Mingbin Xu, Xingyu Na, Zhen Huang

    Abstract: In recent years, the evolution of end-to-end (E2E) automatic speech recognition (ASR) models has been remarkable, largely due to advances in deep learning architectures like transformer. On top of E2E systems, researchers have achieved substantial accuracy improvement by rescoring E2E model's N-best hypotheses with a phoneme-based model. This raises an interesting question about where the improvem…

    Submitted 11 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

  25. arXiv:2406.01003  [pdf, other]

    cs.CV

    Uni-ISP: Unifying the Learning of ISPs from Multiple Cameras

    Authors: Lingen Li, Mingde Yao, Xingyu Meng, Muquan Yu, Tianfan Xue, Jinwei Gu

    Abstract: Modern end-to-end image signal processors (ISPs) can learn complex mappings from RAW/XYZ data to sRGB (or inverse), opening new possibilities in image processing. However, as the diversity of camera models continues to expand, developing and maintaining individual ISPs is not sustainable in the long term, which inherently lacks versatility, hindering the adaptability to multiple camera models. In…

    Submitted 3 June, 2024; originally announced June 2024.

  26. arXiv:2406.00440  [pdf, other]

    cs.CV

    Topo4D: Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture

    Authors: Xuanchen Li, Yuhao Cheng, Xingyu Ren, Haozhe Jia, Di Xu, Wenhan Zhu, Yichao Yan

    Abstract: 4D head capture aims to generate dynamic topological meshes and corresponding texture maps from videos, which is widely utilized in movies and games for its ability to simulate facial muscle movements and recover dynamic textures in pore-squeezing. The industry often adopts the method involving multi-view stereo and non-rigid alignment. However, this approach is prone to errors and heavily reliant…

    Submitted 1 July, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

  27. arXiv:2405.21013  [pdf, other]

    cs.CV

    StrucTexTv3: An Efficient Vision-Language Model for Text-rich Image Perception, Comprehension, and Beyond

    Authors: Pengyuan Lyu, Yulin Li, Hao Zhou, Weihong Ma, Xingyu Wan, Qunyi Xie, Liang Wu, Chengquan Zhang, Kun Yao, Errui Ding, Jingdong Wang

    Abstract: Text-rich images have significant and extensive value, deeply integrated into various aspects of human life. Notably, both visual cues and linguistic symbols in text-rich images play crucial roles in information transmission but are accompanied by diverse challenges. Therefore, the efficient and effective understanding of text-rich images is a crucial litmus test for the capability of Vision-Langu…

    Submitted 4 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

  28. arXiv:2405.19765  [pdf, other]

    cs.CV cs.AI

    Towards Unified Multi-granularity Text Detection with Interactive Attention

    Authors: Xingyu Wan, Chengquan Zhang, Pengyuan Lyu, Sen Fan, Zihan Ni, Kun Yao, Errui Ding, Jingdong Wang

    Abstract: Existing OCR engines or document image analysis systems typically rely on training separate models for text detection in varying scenarios and granularities, leading to significant computational complexity and resource demands. In this paper, we introduce "Detect Any Text" (DAT), an advanced paradigm that seamlessly unifies scene text detection, layout analysis, and document page detection into a…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  29. arXiv:2405.19568  [pdf, other]

    cs.CV

    Organizing Background to Explore Latent Classes for Incremental Few-shot Semantic Segmentation

    Authors: Lianlei Shan, Wenzhang Zhou, Wei Li, Xingyu Ding

    Abstract: The goal of incremental Few-shot Semantic Segmentation (iFSS) is to extend pre-trained segmentation models to new classes via few annotated images without access to old training data. During incrementally learning novel classes, the data distribution of old classes will be destroyed, leading to catastrophic forgetting. Meanwhile, the novel classes have only few samples, making models impossible to…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 10 pages, 5 figures

  30. arXiv:2405.18663  [pdf, other]

    cs.AI

    Lifelong Learning and Selective Forgetting via Contrastive Strategy

    Authors: Lianlei Shan, Wenzhang Zhou, Wei Li, Xingyu Ding

    Abstract: Lifelong learning aims to train a model with good performance for new tasks while retaining the capacity of previous tasks. However, some practical scenarios require the system to forget undesirable knowledge due to privacy issues, which is called selective forgetting. The joint task of the two is dubbed Learning with Selective Forgetting (LSF). In this paper, we propose a new framework based on c…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 10 pages, 5 figures

  31. arXiv:2405.17776  [pdf, other]

    cs.LG

    The Binary Quantized Neural Network for Dense Prediction via Specially Designed Upsampling and Attention

    Authors: Xingyu Ding, Lianlei Shan, Guiqin Zhao, Meiqi Wu, Wenzhang Zhou, Wei Li

    Abstract: Deep learning-based information processing consumes long time and requires huge computing resources, especially for dense prediction tasks which require an output for each pixel, like semantic segmentation and salient object detection. There are mainly two challenges for quantization of dense prediction tasks. Firstly, directly applying the upsampling operation that dense prediction tasks require…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 30 pages, 6 figures

  32. arXiv:2405.16778  [pdf, other]

    cond-mat.supr-con

    Unusual switch from low-temperature T-quadratic resistivity in the underdoped pseudogap phase of cuprate superconductors to low-temperature T-linear resistivity in the overdoped strange-metal phase

    Authors: Xingyu Ma, Minghuan Zeng, Huaiming Guo, Shiping Feng

    Abstract: The transport experiments demonstrate a dramatic switch from the low-temperature T-linear resistivity in the overdoped strange-metal phase to the T-quadratic resistivity in the underdoped pseudogap phase of cuprate superconductors, however, a consensus on the origin of this switch is still lacking. Here the low-temperature resistivity in the underdoped pseudogap phase of cuprate superconductors is…

    Submitted 26 May, 2024; originally announced May 2024.

  33. arXiv:2405.16455  [pdf, other]

    stat.ML cs.LG stat.ME

    On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization

    Authors: Jiancong Xiao, Ziniu Li, Xingyu Xie, Emily Getzen, Cong Fang, Qi Long, Weijie J. Su

    Abstract: Accurately aligning large language models (LLMs) with human preferences is crucial for informing fair, economically sound, and statistically efficient decision-making processes. However, we argue that reinforcement learning from human feedback (RLHF) -- the predominant approach for aligning LLMs with human preferences through a reward model -- suffers from an inherent algorithmic bias due to its K…

    Submitted 26 May, 2024; originally announced May 2024.

  34. arXiv:2405.15914  [pdf, other]

    cs.CV

    ExactDreamer: High-Fidelity Text-to-3D Content Creation via Exact Score Matching

    Authors: Yumin Zhang, Xingyu Miao, Haoran Duan, Bo Wei, Tejal Shah, Yang Long, Rajiv Ranjan

    Abstract: Text-to-3D content creation is a rapidly evolving research area. Given the scarcity of 3D data, current approaches often adapt pre-trained 2D diffusion models for 3D synthesis. Among these approaches, Score Distillation Sampling (SDS) has been widely adopted. However, the issue of over-smoothing poses a significant limitation on the high-fidelity generation of 3D models. To address this challenge,…

    Submitted 24 May, 2024; originally announced May 2024.

  35. arXiv:2405.15682  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    The Road Less Scheduled

    Authors: Aaron Defazio, Xingyu, Yang, Harsh Mehta, Konstantin Mishchenko, Ahmed Khaled, Ashok Cutkosky

    Abstract: Existing learning rate schedules that do not require specification of the optimization stopping step T are greatly out-performed by learning rate schedules that depend on T. We propose an approach that avoids the need for this stopping time by eschewing the use of schedules entirely, while exhibiting state-of-the-art performance compared to schedules across a wide family of problems ranging from c…

    Submitted 30 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  36. arXiv:2405.15289  [pdf, other]

    cs.CV

    Learning Invariant Causal Mechanism from Vision-Language Models

    Authors: Zeen Song, Siyu Zhao, Xingyu Zhang, Jiangmeng Li, Changwen Zheng, Wenwen Qiang

    Abstract: Pre-trained large-scale models have become a major research focus, but their effectiveness is limited in real-world applications due to diverse data distributions. In contrast, humans excel at decision-making across various domains by learning reusable knowledge that remains invariant despite environmental changes in a complex world. Although CLIP, as a successful vision-language pre-trained model…

    Submitted 24 May, 2024; originally announced May 2024.

  37. arXiv:2405.11792  [pdf, other]

    eess.AS

    Source Localization by Multidimensional Steered Response Power Mapping with Sparse Bayesian Learning

    Authors: Wei-Ting Lai, Lachlan Birnie, Xingyu Chen, Amy Bastine, Thushara D. Abhayapala, Prasanga N. Samarasinghe

    Abstract: We propose an advanced Steered Response Power (SRP) method for localizing multiple sources. While conventional SRP performs well in adverse conditions, it still struggles in scenarios with closely neighboring sources, resulting in ambiguous SRP maps. We address this issue by applying sparsity optimization in SRP to obtain high-resolution maps. Our approach represents SRP maps as multidimensiona…

    Submitted 20 May, 2024; originally announced May 2024.

  38. arXiv:2405.11349  [pdf, other]

    cs.LG

    Unlock the Power of Algorithm Features: A Generalization Analysis for Algorithm Selection

    Authors: Xingyu Wu, Yan Zhong, Jibin Wu, Yuxiao Huang, Sheng-hao Wu, Kay Chen Tan

    Abstract: In the algorithm selection research, the discussion surrounding algorithm features has been significantly overshadowed by the emphasis on problem features. Although a few empirical studies have yielded evidence regarding the effectiveness of algorithm features, the potential benefits of incorporating algorithm features into algorithm selection models and their suitability for different scenarios r…

    Submitted 3 June, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

  39. arXiv:2405.11252  [pdf, other]

    cs.CV

    Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching

    Authors: Xingyu Miao, Haoran Duan, Varun Ojha, Jun Song, Tejal Shah, Yang Long, Rajiv Ranjan

    Abstract: In this work, we propose a novel Trajectory Score Matching (TSM) method that aims to solve the pseudo ground truth inconsistency problem caused by the accumulated error in Interval Score Matching (ISM) when using the Denoising Diffusion Implicit Models (DDIM) inversion process. Unlike ISM which adopts the inversion process of DDIM to calculate on a single path, our TSM method leverages the inversi…

    Submitted 18 May, 2024; originally announced May 2024.

  40. arXiv:2405.10022  [pdf, other]

    eess.AS cs.SD

    Monaural speech enhancement on drone via Adapter based transfer learning

    Authors: Xingyu Chen, Hanwen Bi, Wei-Ting Lai, Fei Ma

    Abstract: Monaural Speech enhancement on drones is challenging because the ego-noise from the rotating motors and propellers leads to extremely low signal-to-noise ratios at onboard microphones. Although recent masking-based deep neural network methods excel in monaural speech enhancement, they struggle in the challenging drone noise scenario. Furthermore, existing drone noise datasets are limited, causing…

    Submitted 16 May, 2024; originally announced May 2024.

  41. arXiv:2405.09492  [pdf, other]

    cs.LG

    MGSER-SAM: Memory-Guided Soft Experience Replay with Sharpness-Aware Optimization for Enhanced Continual Learning

    Authors: Xingyu Li, Bo Tang

    Abstract: Deep neural networks suffer from the catastrophic forgetting problem in the field of continual learning (CL). To address this challenge, we propose MGSER-SAM, a novel memory replay-based algorithm specifically engineered to enhance the generalization capabilities of CL models. We first integrate the SAM optimizer, a component designed for optimizing flatness, which seamlessly fits into well-known…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 8 pages, 5 figures

  42. arXiv:2405.09312  [pdf, ps, other]

    cs.LG

    Agnostic Active Learning of Single Index Models with Linear Sample Complexity

    Authors: Aarshvi Gajjar, Wai Ming Tai, Xingyu Xu, Chinmay Hegde, Yi Li, Christopher Musco

    Abstract: We study active learning methods for single index models of the form $F({\mathbf x}) = f(\langle {\mathbf w}, {\mathbf x}\rangle)$, where $f:\mathbb{R} \to \mathbb{R}$ and ${\mathbf x,\mathbf w} \in \mathbb{R}^d$. In addition to their theoretical interest as simple examples of non-linear neural networks, single index models have received significant recent attention due to applications in scientif…

    Submitted 9 July, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

  43. arXiv:2405.08340  [pdf, other]

    cs.CR cs.CV

    Achieving Resolution-Agnostic DNN-based Image Watermarking: A Novel Perspective of Implicit Neural Representation

    Authors: Yuchen Wang, Xingyu Zhu, Guanhui Ye, Shiyao Zhang, Xuetao Wei

    Abstract: DNN-based watermarking methods are rapidly developing and delivering impressive performances. Recent advances achieve resolution-agnostic image watermarking by reducing the variant resolution watermarking problem to a fixed resolution watermarking problem. However, such a reduction process can potentially introduce artifacts and low robustness. To address this issue, we propose the first, to the b…

    Submitted 14 May, 2024; originally announced May 2024.

  44. arXiv:2405.07801  [pdf, other]

    cs.CV

    Deep Learning-Based Object Pose Estimation: A Comprehensive Survey

    Authors: Jian Liu, Wei Sun, Hui Yang, Zhiwen Zeng, Chongpei Liu, Jin Zheng, Xingyu Liu, Hossein Rahmani, Nicu Sebe, Ajmal Mian

    Abstract: Object pose estimation is a fundamental computer vision problem with broad applications in augmented reality and robotics. Over the past decade, deep learning models, due to their superior accuracy and robustness, have increasingly supplanted conventional algorithms reliant on engineered point pair features. Nevertheless, several challenges persist in contemporary methods, including their dependen…

    Submitted 31 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: 27 pages, 7 figures

  45. arXiv:2405.07059  [pdf, other]

    math.NA

    Numerical Analysis of Finite Dimensional Approximations in Finite Temperature DFT

    Authors: Ge Xu, Huajie Chen, Xingyu Gao

    Abstract: In this paper, we study numerical approximations of the ground states in finite temperature density functional theory. We formulate the problem with respect to the density matrices and justify the convergence of the finite dimensional approximations. Moreover, we provide an optimal a priori error estimate under some mild assumptions and present some numerical experiments to support the theory.

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: 20 pages, 6 figures

  46. arXiv:2405.06784  [pdf, other]

    cs.LG

    Open Challenges and Opportunities in Federated Foundation Models Towards Biomedical Healthcare

    Authors: Xingyu Li, Lu Peng, Yuping Wang, Weihua Zhang

    Abstract: This survey explores the transformative impact of foundation models (FMs) in artificial intelligence, focusing on their integration with federated learning (FL) for advancing biomedical research. Foundation models such as ChatGPT, LLaMa, and CLIP, which are trained on vast datasets through methods including unsupervised pretraining, self-supervised learning, instructed fine-tuning, and reinforceme…

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: 42 pages

  47. arXiv:2405.05993  [pdf]

    cs.LG cs.AI

    Precision Rehabilitation for Patients Post-Stroke based on Electronic Health Records and Machine Learning

    Authors: Fengyi Gao, Xingyu Zhang, Sonish Sivarajkumar, Parker Denny, Bayan Aldhahwani, Shyam Visweswaran, Ryan Shi, William Hogan, Allyn Bove, Yanshan Wang

    Abstract: In this study, we utilized statistical analysis and machine learning methods to examine whether rehabilitation exercises can improve patients' post-stroke functional abilities, as well as to forecast the improvement in functional abilities. Our dataset comprises patients' rehabilitation exercises and demographic information recorded in the unstructured electronic health records (EHRs) data and free-text reha…

    Submitted 9 May, 2024; originally announced May 2024.

  48. arXiv:2405.04861  [pdf, other]

    cs.SE

    Insights into Deep Learning Refactoring: Bridging the Gap Between Practices and Expectations

    Authors: SiQi Wang, Xing Hu, Bei Wang, WenXin Yao, Xin Xia, XingYu Wang

    Abstract: With the rapid development of deep learning, the implementation of intricate algorithms and substantial data processing have become standard elements of deep learning projects. As a result, the code has become progressively complex as the software evolves, which is difficult to maintain and understand. Existing studies have investigated the impact of refactoring on software quality within traditio…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: 24 pages, 18 figures

  49. arXiv:2405.04782  [pdf, other]

    cs.CV

    Dual-Image Enhanced CLIP for Zero-Shot Anomaly Detection

    Authors: Zhaoxiang Zhang, Hanqiu Deng, Jinan Bao, Xingyu Li

    Abstract: Image Anomaly Detection has been a challenging task in the Computer Vision field. The advent of Vision-Language models, particularly the rise of CLIP-based frameworks, has opened new avenues for zero-shot anomaly detection. Recent studies have explored the use of CLIP by aligning images with normal and prompt descriptions. However, the exclusive dependence on textual guidance often falls short, highli…

    Submitted 7 May, 2024; originally announced May 2024.

  50. arXiv:2405.03534  [pdf, other]

    cs.RO cs.AI cs.LG cs.NE

    Meta-Evolve: Continuous Robot Evolution for One-to-many Policy Transfer

    Authors: Xingyu Liu, Deepak Pathak, Ding Zhao

    Abstract: We investigate the problem of transferring an expert policy from a source robot to multiple different robots. To solve this problem, we propose a method named $Meta$-$Evolve$ that uses continuous robot evolution to efficiently transfer the policy to each target robot through a set of tree-structured evolutionary robot sequences. The robot evolution tree allows the robot evolution paths to be share…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: ICLR 2024