Showing 1–50 of 100 results for author: Lyu, X

Searching in archive cs.
  1. arXiv:2407.00203  [pdf, other]

    cs.CV

    PathGen-1.6M: 1.6 Million Pathology Image-text Pairs Generation through Multi-agent Collaboration

    Authors: Yuxuan Sun, Yunlong Zhang, Yixuan Si, Chenglu Zhu, Zhongyi Shui, Kai Zhang, Jingxiong Li, Xingheng Lyu, Tao Lin, Lin Yang

    Abstract: Vision Language Models (VLMs) like CLIP have attracted substantial attention in pathology, serving as backbones for applications such as zero-shot image classification and Whole Slide Image (WSI) analysis. Additionally, they can function as vision encoders when combined with large language models (LLMs) to support broader capabilities. Current efforts to train pathology VLMs rely on pathology imag…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: 13 pages, 3 figures

  2. arXiv:2406.19568  [pdf, other]

    cs.CV cs.AI

    What Matters in Detecting AI-Generated Videos like Sora?

    Authors: Chirui Chang, Zhengzhe Liu, Xiaoyang Lyu, Xiaojuan Qi

    Abstract: Recent advancements in diffusion-based video generation have showcased remarkable results, yet the gap between synthetic and real-world videos remains under-explored. In this study, we examine this gap from three fundamental perspectives: appearance, motion, and geometry, comparing real-world videos with those generated by a state-of-the-art AI model, Stable Video Diffusion. To achieve this, we tr…

    Submitted 27 June, 2024; originally announced June 2024.

  3. arXiv:2406.15217  [pdf, other]

    eess.SP cs.IT

    Rate-Splitting Multiple Access for Overloaded Multi-group Multicast: A First Experimental Study

    Authors: Xinze Lyu, Sundar Aditya, Bruno Clerckx

    Abstract: Multi-group multicast (MGM) is an increasingly important form of multi-user wireless communications with several potential applications, such as video streaming, federated learning, safety-critical vehicular communications, etc. Rate-Splitting Multiple Access (RSMA) is a powerful interference management technique that can, in principle, achieve higher data rates and greater fairness for all types…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Submitted to IEEE Transactions on Broadcasting

  4. arXiv:2406.13870  [pdf, other]

    cs.CV

    Splatter a Video: Video Gaussian Representation for Versatile Processing

    Authors: Yang-Tian Sun, Yi-Hua Huang, Lin Ma, Xiaoyang Lyu, Yan-Pei Cao, Xiaojuan Qi

    Abstract: Video representation is a long-standing problem that is crucial for various down-stream tasks, such as tracking, depth prediction, segmentation, view synthesis, and editing. However, current methods either struggle to model complex motions due to the absence of 3D structure or rely on implicit 3D representations that are ill-suited for manipulation tasks. To address these challenges, we introduce a no…

    Submitted 26 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

  5. arXiv:2406.06207  [pdf, other]

    cs.LG cs.CR

    Lurking in the shadows: Unveiling Stealthy Backdoor Attacks against Personalized Federated Learning

    Authors: Xiaoting Lyu, Yufei Han, Wei Wang, Jingkai Liu, Yongsheng Zhu, Guangquan Xu, Jiqiang Liu, Xiangliang Zhang

    Abstract: Federated Learning (FL) is a collaborative machine learning technique where multiple clients work together with a central server to train a global model without sharing their private data. However, the distribution shift across non-IID datasets of clients poses a challenge to this one-model-fits-all method hindering the ability of the global model to effectively adapt to each client's unique local…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted by Usenix Security 2024

  6. arXiv:2405.17984  [pdf, other]

    cs.LG

    Cross-Context Backdoor Attacks against Graph Prompt Learning

    Authors: Xiaoting Lyu, Yufei Han, Wei Wang, Hangwei Qian, Ivor Tsang, Xiangliang Zhang

    Abstract: Graph Prompt Learning (GPL) bridges significant disparities between pretraining and downstream applications to alleviate the knowledge transfer bottleneck in real-world graph learning. While GPL offers superior effectiveness in graph knowledge transfer and computational efficiency, the security risks posed by backdoor poisoning effects embedded in pretrained models remain largely unexplored. Our s…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Accepted by KDD 2024

  7. arXiv:2405.15356  [pdf, other]

    cs.CV

    Alleviating Hallucinations in Large Vision-Language Models through Hallucination-Induced Optimization

    Authors: Beitao Chen, Xinyu Lyu, Lianli Gao, Jingkuan Song, Heng Tao Shen

    Abstract: Although Large Visual Language Models (LVLMs) have demonstrated exceptional abilities in understanding multimodal data, they invariably suffer from hallucinations, leading to a disconnect between the generated text and the corresponding images. Almost all current visual contrastive decoding methods attempt to mitigate these hallucinations by introducing visual uncertainty information that appropri…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 10 pages. arXiv admin note: text overlap with arXiv:2311.16922 by other authors

  8. arXiv:2405.13923  [pdf, other]

    cs.CL

    Why Not Transform Chat Large Language Models to Non-English?

    Authors: Xiang Geng, Ming Zhu, Jiahuan Li, Zhejian Lai, Wei Zou, Shuaijie She, Jiaxin Guo, Xiaofeng Zhao, Yinglu Li, Yuang Li, Chang Su, Yanqing Zhao, Xinglin Lyu, Min Zhang, Jiajun Chen, Hao Yang, Shujian Huang

    Abstract: The scarcity of non-English data limits the development of non-English large language models (LLMs). Transforming English-centric LLMs to non-English has been identified as an effective and resource-efficient method. Previous works start from base LLMs and perform knowledge distillation (KD) with data generated by stronger LLMs, e.g. GPT-4. Compared to base LLMs, chat LLMs are further optimized fo…

    Submitted 31 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  9. arXiv:2405.12710  [pdf, other]

    cs.CV

    Text-Video Retrieval with Global-Local Semantic Consistent Learning

    Authors: Haonan Zhang, Pengpeng Zeng, Lianli Gao, Jingkuan Song, Yihang Duan, Xinyu Lyu, Hengtao Shen

    Abstract: Adapting large-scale image-text pre-training models, e.g., CLIP, to the video domain represents the current state-of-the-art for text-video retrieval. The primary approaches involve transferring text-video pairs to a common embedding space and leveraging cross-modal interactions on specific entities for semantic alignment. Though effective, these paradigms entail prohibitive computational costs, l…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 9 pages

  10. arXiv:2405.01142  [pdf, other]

    cs.LG

    Sharp Bounds for Sequential Federated Learning on Heterogeneous Data

    Authors: Yipeng Li, Xinchen Lyu

    Abstract: There are two paradigms in Federated Learning (FL): parallel FL (PFL), where models are trained in a parallel manner across clients; and sequential FL (SFL), where models are trained in a sequential manner across clients. In contrast to that of PFL, the convergence theory of SFL on heterogeneous data is still lacking. To resolve the theoretical dilemma of SFL, we establish sharp convergence guaran…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2311.03154

  11. arXiv:2404.11401  [pdf, other]

    cs.CV

    RainyScape: Unsupervised Rainy Scene Reconstruction using Decoupled Neural Rendering

    Authors: Xianqiang Lyu, Hui Liu, Junhui Hou

    Abstract: We propose RainyScape, an unsupervised framework for reconstructing clean scenes from a collection of multi-view rainy images. RainyScape consists of two main modules: a neural rendering module and a rain-prediction module that incorporates a predictor network and a learnable latent embedding that captures the rain characteristics of the scene. Specifically, based on the spectral bias property of…

    Submitted 17 April, 2024; originally announced April 2024.

  12. arXiv:2404.00409  [pdf, other]

    cs.CV cs.GR

    3DGSR: Implicit Surface Reconstruction with 3D Gaussian Splatting

    Authors: Xiaoyang Lyu, Yang-Tian Sun, Yi-Hua Huang, Xiuzhe Wu, Ziyi Yang, Yilun Chen, Jiangmiao Pang, Xiaojuan Qi

    Abstract: In this paper, we present an implicit surface reconstruction method with 3D Gaussian Splatting (3DGS), namely 3DGSR, that allows for accurate 3D reconstruction with intricate details while inheriting the high efficiency and rendering quality of 3DGS. The key insight is incorporating an implicit signed distance field (SDF) within 3D Gaussians to enable them to be aligned and jointly optimized. Firs…

    Submitted 30 March, 2024; originally announced April 2024.

  13. arXiv:2403.19314  [pdf, other]

    cs.CV

    Total-Decom: Decomposed 3D Scene Reconstruction with Minimal Interaction

    Authors: Xiaoyang Lyu, Chirui Chang, Peng Dai, Yang-Tian Sun, Xiaojuan Qi

    Abstract: Scene reconstruction from multi-view images is a fundamental problem in computer vision and graphics. Recent neural implicit surface reconstruction methods have achieved high-quality results; however, editing and manipulating the 3D geometry of reconstructed scenes remains challenging due to the absence of naturally decomposed object entities and complex object/background compositions. In this pap…

    Submitted 30 March, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: 8 pages, 7 figures, accepted by CVPR 2024

  14. arXiv:2403.16446  [pdf, other]

    cs.CL

    Towards Automatic Evaluation for LLMs' Clinical Capabilities: Metric, Data, and Algorithm

    Authors: Lei Liu, Xiaoyan Yang, Fangzhou Li, Chenfei Chi, Yue Shen, Shiwei Lyu, Ming Zhang, Xiaowei Ma, Xiangguo Lyu, Liya Ma, Zhiqiang Zhang, Wei Xue, Yiran Huang, Jinjie Gu

    Abstract: Large language models (LLMs) are gaining increasing interests to improve clinical efficiency for medical diagnosis, owing to their unprecedented performance in modelling natural language. Ensuring the safe and reliable clinical applications, the evaluation of LLMs indeed becomes critical for better mitigating the potential risks, e.g., hallucinations. However, current evaluation methods heavily re…

    Submitted 25 March, 2024; originally announced March 2024.

  15. arXiv:2403.10805  [pdf, other]

    cs.SD cs.AI cs.CV cs.GR cs.HC eess.AS

    Speech-driven Personalized Gesture Synthetics: Harnessing Automatic Fuzzy Feature Inference

    Authors: Fan Zhang, Zhaohan Wang, Xin Lyu, Siyuan Zhao, Mengjian Li, Weidong Geng, Naye Ji, Hui Du, Fuxing Gao, Hao Wu, Shunman Li

    Abstract: Speech-driven gesture generation is an emerging field within virtual human creation. However, a significant challenge lies in accurately determining and processing the multitude of input features (such as acoustic, semantic, emotional, personality, and even subtle unknown features). Traditional approaches, reliant on various explicit feature inputs and complex multimodal processing, constrain the…

    Submitted 16 March, 2024; originally announced March 2024.

    Comments: 12 pages

  16. arXiv:2403.05895  [pdf, other]

    cs.CV

    DO3D: Self-supervised Learning of Decomposed Object-aware 3D Motion and Depth from Monocular Videos

    Authors: Xiuzhe Wu, Xiaoyang Lyu, Qihao Huang, Yong Liu, Yang Wu, Ying Shan, Xiaojuan Qi

    Abstract: Although considerable advancements have been attained in self-supervised depth estimation from monocular videos, most existing methods often treat all objects in a video as static entities, which however violates the dynamic nature of real-world scenes and fails to model the geometry and motion of moving objects. In this paper, we propose a self-supervised method to jointly learn 3D motion and dep…

    Submitted 9 March, 2024; originally announced March 2024.

    Comments: 24 pages, 14 figures, Tech Report

  17. arXiv:2403.00028  [pdf, ps, other]

    cs.CR cs.LG

    Lower Bounds for Differential Privacy Under Continual Observation and Online Threshold Queries

    Authors: Edith Cohen, Xin Lyu, Jelani Nelson, Tamás Sarlós, Uri Stemmer

    Abstract: One of the most basic problems for studying the "price of privacy over time" is the so-called private counter problem, introduced by Dwork et al. (2010) and Chan et al. (2010). In this problem, we aim to track the number of events that occur over time, while hiding the existence of every single event. More specifically, in every time step $t \in [T]$ we learn (in an online fashion) that $\Delta_t \geq 0$…

    Submitted 17 April, 2024; v1 submitted 28 February, 2024; originally announced March 2024.

  18. arXiv:2402.15870  [pdf, other]

    cs.CV

    Spec-Gaussian: Anisotropic View-Dependent Appearance for 3D Gaussian Splatting

    Authors: Ziyi Yang, Xinyu Gao, Yangtian Sun, Yihua Huang, Xiaoyang Lyu, Wen Zhou, Shaohui Jiao, Xiaojuan Qi, Xiaogang Jin

    Abstract: The recent advancements in 3D Gaussian splatting (3D-GS) have not only facilitated real-time rendering through modern GPU rasterization pipelines but have also attained state-of-the-art rendering quality. Nevertheless, despite its exceptional rendering quality and performance on standard datasets, 3D-GS frequently encounters difficulties in accurately modeling specular and anisotropic components.…

    Submitted 24 February, 2024; originally announced February 2024.

  19. arXiv:2402.15200  [pdf, other]

    cs.CL

    DeMPT: Decoding-enhanced Multi-phase Prompt Tuning for Making LLMs Be Better Context-aware Translators

    Authors: Xinglin Lyu, Junhui Li, Yanqing Zhao, Min Zhang, Daimeng Wei, Shimin Tao, Hao Yang, Min Zhang

    Abstract: Generally, the decoder-only large language models (LLMs) are adapted to context-aware neural machine translation (NMT) in a concatenating way, where LLMs take the concatenation of the source sentence (i.e., intra-sentence context) and the inter-sentence context as the input, and then to generate the target tokens sequentially. This adaptation strategy, i.e., concatenation mode, considers intra-sen…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: under review

  20. arXiv:2402.15145  [pdf, ps, other]

    cs.LG cs.DS

    The Cost of Parallelizing Boosting

    Authors: Xin Lyu, Hongxun Wu, Junzhao Yang

    Abstract: We study the cost of parallelizing weak-to-strong boosting algorithms for learning, following the recent work of Karbasi and Larsen. Our main results are two-fold: - First, we prove a tight lower bound, showing that even "slight" parallelization of boosting requires an exponential blow-up in the complexity of training. Specifically, let $\gamma$ be the weak learner's advantage over random guessing.…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: appeared in SODA 2024

  21. arXiv:2402.03908  [pdf, other]

    cs.CV

    EscherNet: A Generative Model for Scalable View Synthesis

    Authors: Xin Kong, Shikun Liu, Xiaoyang Lyu, Marwan Taher, Xiaojuan Qi, Andrew J. Davison

    Abstract: We introduce EscherNet, a multi-view conditioned diffusion model for view synthesis. EscherNet learns implicit and generative 3D representations coupled with a specialised camera positional encoding, allowing precise and continuous relative control of the camera transformation between an arbitrary number of reference and target views. EscherNet offers exceptional generality, flexibility, and scala…

    Submitted 19 March, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: CVPR 2024. Project page: https://kxhit.github.io/EscherNet

  22. arXiv:2402.00159  [pdf, other]

    cs.CL

    Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

    Authors: Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, Valentin Hofmann, Ananya Harsh Jha, Sachin Kumar, Li Lucy, Xinxi Lyu, Nathan Lambert, Ian Magnusson, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Abhilasha Ravichander, Kyle Richardson, Zejiang Shen, et al. (11 additional authors not shown)

    Abstract: Information about pretraining corpora used to train the current best-performing language models is seldom discussed: commercial models rarely detail their data, and even open models are often released without accompanying training data or recipes to reproduce them. As a result, it is challenging to conduct and advance scientific research on language modeling, such as understanding how training dat…

    Submitted 6 June, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

    Comments: Accepted at ACL 2024; Dataset: https://hf.co/datasets/allenai/dolma; Code: https://github.com/allenai/dolma

  23. arXiv:2401.16355  [pdf, other]

    cs.CV

    PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology

    Authors: Yuxuan Sun, Hao Wu, Chenglu Zhu, Sunyi Zheng, Qizi Chen, Kai Zhang, Yunlong Zhang, Dan Wan, Xiaoxiao Lan, Mengyue Zheng, Jingxiong Li, Xinheng Lyu, Tao Lin, Lin Yang

    Abstract: The emergence of large multimodal models has unlocked remarkable potential in AI, particularly in pathology. However, the lack of specialized, high-quality benchmark impeded their development and precise evaluation. To address this, we introduce PathMMU, the largest and highest-quality expert-validated pathology benchmark for Large Multimodal Models (LMMs). It comprises 33,428 multimodal multi-cho…

    Submitted 20 March, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: 27 pages, 12 figures

  24. arXiv:2401.13205  [pdf, other]

    cs.CV cs.AI

    Boosting the Transferability of Adversarial Examples via Local Mixup and Adaptive Step Size

    Authors: Junlin Liu, Xinchen Lyu

    Abstract: Adversarial examples are one critical security threat to various visual applications, where injected human-imperceptible perturbations can confuse the output. Generating transferable adversarial examples in the black-box setting is crucial but challenging in practice. Existing input-diversity-based methods adopt different image transformations, but may be inefficient due to insufficient input diver…

    Submitted 23 January, 2024; originally announced January 2024.

  25. arXiv:2401.08739  [pdf, other]

    cs.CV cs.AI

    EgoGen: An Egocentric Synthetic Data Generator

    Authors: Gen Li, Kaifeng Zhao, Siwei Zhang, Xiaozhong Lyu, Mihai Dusmanu, Yan Zhang, Marc Pollefeys, Siyu Tang

    Abstract: Understanding the world in first-person view is fundamental in Augmented Reality (AR). This immersive perspective brings dramatic visual changes and unique challenges compared to third-person views. Synthetic data has empowered third-person-view vision models, but its application to embodied egocentric perception tasks remains largely unexplored. A critical challenge lies in simulating natural hum…

    Submitted 11 April, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: Accepted by CVPR 2024 (Oral). 23 pages, 17 figures. Project page: https://ego-gen.github.io/

  26. arXiv:2312.17425  [pdf, other]

    cs.CV cs.AI

    ALF: Adaptive Label Finetuning for Scene Graph Generation

    Authors: Qishen Chen, Jianzhi Liu, Xinyu Lyu, Lianli Gao, Heng Tao Shen, Jingkuan Song

    Abstract: Scene Graph Generation (SGG) endeavors to predict the relationships between subjects and objects in a given image. Nevertheless, the long-tail distribution of relations often leads to biased prediction on coarse labels, presenting a substantial hurdle in SGG. To address this issue, researchers focus on unbiased SGG and introduce data transfer methods to transfer coarse-grained predicates into fine…

    Submitted 23 May, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

  27. arXiv:2312.14937  [pdf, other]

    cs.CV cs.GR

    SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes

    Authors: Yi-Hua Huang, Yang-Tian Sun, Ziyi Yang, Xiaoyang Lyu, Yan-Pei Cao, Xiaojuan Qi

    Abstract: Novel view synthesis for dynamic scenes is still a challenging problem in computer vision and graphics. Recently, Gaussian splatting has emerged as a robust technique to represent static scenes and enable high-quality and real-time novel view synthesis. Building upon this technique, we propose a new representation that explicitly decomposes the motion and appearance of dynamic scenes into sparse c…

    Submitted 31 March, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: Code link: https://github.com/yihua7/SC-GS

  28. arXiv:2312.09785  [pdf, other]

    cs.CL

    RJUA-QA: A Comprehensive QA Dataset for Urology

    Authors: Shiwei Lyu, Chenfei Chi, Hongbo Cai, Lei Shi, Xiaoyan Yang, Lei Liu, Xiang Chen, Deng Zhao, Zhiqiang Zhang, Xianguo Lyu, Ming Zhang, Fangzhou Li, Xiaowei Ma, Yue Shen, Jinjie Gu, Wei Xue, Yiran Huang

    Abstract: We introduce RJUA-QA, a novel medical dataset for question answering (QA) and reasoning with clinical evidence, contributing to bridge the gap between general large language models (LLMs) and medical-specific LLM applications. RJUA-QA is derived from realistic clinical scenarios and aims to facilitate LLMs in generating reliable diagnostic and advice. The dataset contains 2,132 curated Question-Co…

    Submitted 7 January, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: An initial version

  29. arXiv:2312.02132  [pdf, other]

    cs.LG cs.AI cs.CR cs.DS

    Hot PATE: Private Aggregation of Distributions for Diverse Task

    Authors: Edith Cohen, Benjamin Cohen-Wang, Xin Lyu, Jelani Nelson, Tamas Sarlos, Uri Stemmer

    Abstract: The Private Aggregation of Teacher Ensembles (PATE) framework is a versatile approach to privacy-preserving machine learning. In PATE, teacher models that are not privacy-preserving are trained on distinct portions of sensitive data. Privacy-preserving knowledge transfer to a student model is then facilitated by privately aggregating teachers' predictions on new examples. Employing PATE with gener…

    Submitted 17 May, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

  30. arXiv:2311.17319  [pdf, other]

    cs.CE

    Microstructure reconstruction of 2D/3D random materials via diffusion-based deep generative models

    Authors: Xianrui Lyu, Xiaodan Ren

    Abstract: Microstructure reconstruction serves as a crucial foundation for establishing Process-Structure-Property (PSP) relationship in material design. Confronting the limitations of variational autoencoder and generative adversarial network within generative modeling, this study adopted the denoising diffusion probability model (DDPM) to learn the probability distribution of high-dimensional raw data and…

    Submitted 28 November, 2023; originally announced November 2023.

  31. arXiv:2311.03154  [pdf, other]

    cs.LG

    Convergence Analysis of Sequential Federated Learning on Heterogeneous Data

    Authors: Yipeng Li, Xinchen Lyu

    Abstract: There are two categories of methods in Federated Learning (FL) for joint training across multiple clients: i) parallel FL (PFL), where clients train models in a parallel manner; and ii) sequential FL (SFL), where clients train models in a sequential manner. In contrast to that of PFL, the convergence theory of SFL on heterogeneous data is still lacking. In this paper, we establish the convergence…

    Submitted 8 May, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

    Comments: Accepted at NeurIPS 2023. arXiv admin note: text overlap with arXiv:2302.01633

  32. arXiv:2310.08070  [pdf, ps, other]

    cs.LG cs.CC

    Tight Time-Space Lower Bounds for Constant-Pass Learning

    Authors: Xin Lyu, Avishay Tal, Hongxun Wu, Junzhao Yang

    Abstract: In his breakthrough paper, Raz showed that any parity learning algorithm requires either quadratic memory or an exponential number of samples [FOCS'16, JACM'19]. A line of work that followed extended this result to a large class of learning problems. Until recently, all these results considered learning in the streaming model, where each sample is drawn independently, and the learner is allowed a…

    Submitted 12 October, 2023; originally announced October 2023.

    Comments: To appear at FOCS 2023

  33. arXiv:2309.16247  [pdf, other]

    eess.AS cs.SD

    PP-MeT: a Real-world Personalized Prompt based Meeting Transcription System

    Authors: Xiang Lyu, Yuhang Cao, Qing Wang, Jingjing Yin, Yuguang Yang, Pengpeng Zou, Yanni Hu, Heng Lu

    Abstract: Speaker-attributed automatic speech recognition (SA-ASR) improves the accuracy and applicability of multi-speaker ASR systems in real-world scenarios by assigning speaker labels to transcribed texts. However, SA-ASR poses unique challenges due to factors such as speaker overlap, speaker variability, background noise, and reverberation. In this study, we propose PP-MeT system, a real-world personal…

    Submitted 28 September, 2023; originally announced September 2023.

  34. arXiv:2309.16077  [pdf, other]

    cs.RO cs.LG eess.SY

    Task-Oriented Koopman-Based Control with Contrastive Encoder

    Authors: Xubo Lyu, Hanyang Hu, Seth Siriya, Ye Pu, Mo Chen

    Abstract: We present task-oriented Koopman-based control that utilizes end-to-end reinforcement learning and contrastive encoder to simultaneously learn the Koopman latent embedding, operator, and associated linear controller within an iterative loop. By prioritizing the task cost as the main objective for controller learning, we reduce the reliance of controller design on a well-identified model, which, fo…

    Submitted 1 November, 2023; v1 submitted 27 September, 2023; originally announced September 2023.

    Comments: Accepted by the 7th Annual Conference on Robot Learning (CoRL), 2023 (oral spotlight)

  35. arXiv:2309.13559  [pdf, other]

    cs.RO

    Swashplateless-elevon Actuation for a Dual-rotor Tail-sitter VTOL UAV

    Authors: Nan Chen, Fanze Kong, Haotian Li, Jiayuan Liu, Ziwei Ye, Wei Xu, Fangcheng Zhu, Ximin Lyu, Fu Zhang

    Abstract: In this paper, we propose a novel swashplateless-elevon actuation (SEA) for dual-rotor tail-sitter vertical takeoff and landing (VTOL) unmanned aerial vehicles (UAVs). In contrast to the conventional elevon actuation (CEA) which controls both pitch and yaw using elevons, the SEA adopts swashplateless mechanisms to generate an extra moment through motor speed modulation to control pitch and uses el…

    Submitted 24 September, 2023; originally announced September 2023.

    Comments: 8 pages, 13 figures

  36. arXiv:2309.04814  [pdf, other]

    cs.CV

    Speech2Lip: High-fidelity Speech to Lip Generation by Learning from a Short Video

    Authors: Xiuzhe Wu, Pengfei Hu, Yang Wu, Xiaoyang Lyu, Yan-Pei Cao, Ying Shan, Wenming Yang, Zhongqian Sun, Xiaojuan Qi

    Abstract: Synthesizing realistic videos according to a given speech is still an open challenge. Previous works have been plagued by issues such as inaccurate lip shape generation and poor image quality. The key reason is that only motions and appearances on limited facial areas (e.g., lip area) are mainly driven by the input speech. Therefore, directly learning a mapping function from speech to the entire h…

    Submitted 9 September, 2023; originally announced September 2023.

  37. arXiv:2308.05404  [pdf, other]

    cs.CV eess.IV

    Enhancing Low-light Light Field Images with A Deep Compensation Unfolding Network

    Authors: Xianqiang Lyu, Junhui Hou

    Abstract: This paper presents a novel and interpretable end-to-end learning framework, called the deep compensation unfolding network (DCUNet), for restoring light field (LF) images captured under low-light conditions. DCUNet is designed with a multi-stage architecture that mimics the optimization process of solving an inverse imaging problem in a data-driven fashion. The framework uses the intermediate enh…

    Submitted 26 June, 2024; v1 submitted 10 August, 2023; originally announced August 2023.

  38. arXiv:2308.05286  [pdf, other]

    cs.CV

    Informative Scene Graph Generation via Debiasing

    Authors: Lianli Gao, Xinyu Lyu, Yuyu Guo, Yuxuan Hu, Yuan-Fang Li, Lu Xu, Heng Tao Shen, Jingkuan Song

    Abstract: Scene graph generation aims to detect visual relationship triplets, (subject, predicate, object). Due to biases in data, current models tend to predict common predicates, e.g. "on" and "at", instead of informative ones, e.g. "standing on" and "looking at". This tendency results in the loss of precise information and overall performance. If a model only uses "stone on road" rather than "stone block…

    Submitted 9 August, 2023; originally announced August 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2108.13129

  39. arXiv:2308.05274   

    cs.CV

    Local-Global Information Interaction Debiasing for Dynamic Scene Graph Generation

    Authors: Xinyu Lyu, Jingwei Liu, Yuyu Guo, Lianli Gao

    Abstract: The task of dynamic scene graph generation (DynSGG) aims to generate scene graphs for given videos, which involves modeling the spatial-temporal information in the video. However, due to the long-tailed distribution of samples in the dataset, previous DynSGG models fail to predict the tail predicates. We argue that this phenomenon is due to previous methods that only pay attention to the local spa…

    Submitted 24 September, 2023; v1 submitted 9 August, 2023; originally announced August 2023.

    Comments: The author has withdrawn this paper due to a critical definitional error in multi-task learning for dynamic SGG debiasing. This error aligned with the definition of dynamic SGG tasks, resulting in an unfair comparison with state-of-the-art (SOTA) methods, which in turn, hindered the ability to evaluate the paper's contributions

  40. arXiv:2308.04802  [pdf, other]

    cs.CV

    Generalized Unbiased Scene Graph Generation

    Authors: Xinyu Lyu, Lianli Gao, Junlin Xie, Pengpeng Zeng, Yulu Tian, Jie Shao, Heng Tao Shen

    Abstract: Existing Unbiased Scene Graph Generation (USGG) methods only focus on addressing the predicate-level imbalance that high-frequency classes dominate predictions of rare ones, while overlooking the concept-level imbalance. Actually, even if predicates themselves are balanced, there is still a significant concept-imbalance within them due to the long-tailed distribution of contexts (i.e., subject-obj…

    Submitted 9 August, 2023; originally announced August 2023.

  41. arXiv:2307.03027  [pdf, other]

    cs.LG cs.CL cs.IR

    Improving Retrieval-Augmented Large Language Models via Data Importance Learning

    Authors: Xiaozhong Lyu, Stefan Grafberger, Samantha Biegel, Shaopeng Wei, Meng Cao, Sebastian Schelter, Ce Zhang

    Abstract: Retrieval augmentation enables large language models to take advantage of external knowledge, for example on tasks like question answering and data imputation. However, the performance of such retrieval-augmented models is limited by the data quality of their underlying retrieval corpus. In this paper, we propose an algorithm based on multilinear extension for evaluating the data importance of ret…

    Submitted 6 July, 2023; originally announced July 2023.

  42. Probabilistic-based Feature Embedding of 4-D Light Fields for Compressive Imaging and Denoising

    Authors: Xianqiang Lyu, Junhui Hou

    Abstract: The high-dimensional nature of the 4-D light field (LF) poses great challenges in achieving efficient and effective feature embedding, which severely impacts the performance of downstream tasks. To tackle this crucial issue, in contrast to existing methods with empirically-designed architectures, we propose a probabilistic-based feature embedding (PFE), which learns a feature embedding architecture…

    Submitted 10 January, 2024; v1 submitted 14 June, 2023; originally announced June 2023.

    Journal ref: International Journal of Computer Vision (2023)

  43. arXiv:2305.14876  [pdf, other]

    cs.LG cs.CR

    Reconstructive Neuron Pruning for Backdoor Defense

    Authors: Yige Li, Xixiang Lyu, Xingjun Ma, Nodens Koren, Lingjuan Lyu, Bo Li, Yu-Gang Jiang

    Abstract: Deep neural networks (DNNs) have been found to be vulnerable to backdoor attacks, raising security concerns about their deployment in mission-critical applications. While existing defense methods have demonstrated promising results, it is still not clear how to effectively remove backdoor-associated neurons in backdoored DNNs. In this paper, we propose a novel defense called Reconstructive N…

    Submitted 8 December, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted by ICML23

  44. arXiv:2305.14251  [pdf, other]

    cs.CL cs.AI cs.LG

    FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation

    Authors: Sewon Min, Kalpesh Krishna, Xinxi Lyu, Mike Lewis, Wen-tau Yih, Pang Wei Koh, Mohit Iyyer, Luke Zettlemoyer, Hannaneh Hajishirzi

    Abstract: Evaluating the factuality of long-form text generated by large language models (LMs) is non-trivial because (1) generations often contain a mixture of supported and unsupported pieces of information, making binary judgments of quality inadequate, and (2) human evaluation is time-consuming and costly. In this paper, we introduce FACTSCORE, a new evaluation that breaks a generation into a series of…

    Submitted 11 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: 25 pages; 7 figures. Published as a main conference paper at EMNLP 2023. Code available at https://github.com/shmsw25/FActScore

  45. arXiv:2305.03240  [pdf, ps, other]

    cs.DS

    Sum-of-Local-Effects Data Structures for Separable Graphs

    Authors: Xing Lyu, Travis Gagie, Meng He, Yakov Nekrich, Norbert Zeh

    Abstract: It is not difficult to think of applications that can be modelled as graph problems in which placing some facility or commodity at a vertex has some positive or negative effect on the values of all the vertices out to some distance, and we want to be able to calculate quickly the cumulative effect on any vertex's value at any time or the list of the most beneficial or most detrimental effects on…

    Submitted 4 May, 2023; originally announced May 2023.

  46. arXiv:2304.12652  [pdf, other]

    cs.CV

    Hybrid Neural Rendering for Large-Scale Scenes with Motion Blur

    Authors: Peng Dai, Yinda Zhang, Xin Yu, Xiaoyang Lyu, Xiaojuan Qi

    Abstract: Rendering novel view images is highly desirable for many applications. Despite recent progress, it remains challenging to render high-fidelity and view-consistent novel views of large-scale scenes from in-the-wild images with inevitable artifacts (e.g., motion blur). To this end, we develop a hybrid neural rendering model that makes image-based representation and neural 3D representation join forc…

    Submitted 9 July, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

  47. arXiv:2303.09152  [pdf, other]

    cs.CV

    Learning a Room with the Occ-SDF Hybrid: Signed Distance Function Mingled with Occupancy Aids Scene Representation

    Authors: Xiaoyang Lyu, Peng Dai, Zizhang Li, Dongyu Yan, Yi Lin, Yifan Peng, Xiaojuan Qi

    Abstract: Implicit neural rendering, which uses signed distance function (SDF) representation with geometric priors (such as depth or surface normal), has led to impressive progress in the surface reconstruction of large-scale scenes. However, applying this method to reconstruct a room-level scene from images may miss structures in low-intensity areas or small and thin objects. We conducted experiments on t…

    Submitted 16 March, 2023; originally announced March 2023.

  48. arXiv:2303.08605  [pdf, other]

    cs.CV

    RICO: Regularizing the Unobservable for Indoor Compositional Reconstruction

    Authors: Zizhang Li, Xiaoyang Lyu, Yuanyuan Ding, Mengmeng Wang, Yiyi Liao, Yong Liu

    Abstract: Recently, neural implicit surfaces have become popular for multi-view reconstruction. To facilitate practical applications like scene editing and manipulation, some works extend the framework with semantic masks input for the object-compositional reconstruction rather than the holistic perspective. Though achieving plausible disentanglement, the performance drops significantly when processing the…

    Submitted 29 September, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

  49. arXiv:2303.07096  [pdf, other]

    cs.CV

    Prototype-based Embedding Network for Scene Graph Generation

    Authors: Chaofan Zheng, Xinyu Lyu, Lianli Gao, Bo Dai, Jingkuan Song

    Abstract: Current Scene Graph Generation (SGG) methods explore contextual information to predict relationships among entity pairs. However, due to the diverse visual appearance of numerous possible subject-object combinations, there is a large intra-class variation within each predicate category, e.g., "man-eating-pizza, giraffe-eating-leaf", and the severe inter-class similarity between different classes,…

    Submitted 13 March, 2023; originally announced March 2023.

    Comments: Accepted by CVPR 2023

  50. arXiv:2302.14363  [pdf, other]

    cs.RO cs.CV

    Efficient Implicit Neural Reconstruction Using LiDAR

    Authors: Dongyu Yan, Xiaoyang Lyu, Jieqi Shi, Yi Lin

    Abstract: Modeling scene geometry using implicit neural representation has revealed its advantages in accuracy, flexibility, and low memory usage. Previous approaches have demonstrated impressive results using color or depth images but still have difficulty handling poor light conditions and large-scale scenes. Methods taking global point cloud as input require accurate registration and ground truth coordin…

    Submitted 28 February, 2023; originally announced February 2023.

    Comments: 6+2 pages, 8 figures, Accepted for publication at IEEE International Conference on Robotics and Automation (ICRA) 2023