Showing 1–50 of 78 results for author: Du, R

  1. arXiv:2407.00367  [pdf, other]

    cs.CV

    SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix

    Authors: Peng Dai, Feitong Tan, Qiangeng Xu, David Futschik, Ruofei Du, Sean Fanello, Xiaojuan Qi, Yinda Zhang

    Abstract: Video generation models have demonstrated great capabilities of producing impressive monocular videos, however, the generation of 3D stereoscopic video remains under-explored. We propose a pose-free and training-free approach for generating 3D stereoscopic videos using an off-the-shelf monocular video generation model. Our method warps a generated monocular video into camera views on stereoscopic…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 3D stereoscopic video generation, video diffusion, inpainting

  2. arXiv:2406.18583  [pdf, other]

    cs.CV cs.LG

    Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT

    Authors: Le Zhuo, Ruoyi Du, Han Xiao, Yangguang Li, Dongyang Liu, Rongjie Huang, Wenze Liu, Lirui Zhao, Fu-Yun Wang, Zhanyu Ma, Xu Luo, Zehan Wang, Kaipeng Zhang, Xiangyang Zhu, Si Liu, Xiangyu Yue, Dingning Liu, Wanli Ouyang, Ziwei Liu, Yu Qiao, Hongsheng Li, Peng Gao

    Abstract: Lumina-T2X is a nascent family of Flow-based Large Diffusion Transformers that establishes a unified framework for transforming noise into various modalities, such as images and videos, conditioned on text instructions. Despite its promising capabilities, Lumina-T2X still encounters challenges including training instability, slow inference, and extrapolation artifacts. In this paper, we present Lu…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Code at: https://github.com/Alpha-VLLM/Lumina-T2X

  3. arXiv:2406.06295  [pdf, other]

    cs.SD eess.AS

    Zero-Shot Audio Captioning Using Soft and Hard Prompts

    Authors: Yiming Zhang, Xuenan Xu, Ruoyi Du, Haohe Liu, Yuan Dong, Zheng-Hua Tan, Wenwu Wang, Zhanyu Ma

    Abstract: In traditional audio captioning methods, a model is usually trained in a fully supervised manner using a human-annotated dataset containing audio-text pairs and then evaluated on the test sets from the same dataset. Such methods have two limitations. First, these methods are often data-hungry and require time-consuming and expensive human annotations to obtain audio-text pairs. Second, these model…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing

  4. arXiv:2406.00921  [pdf, other]

    cs.SE

    Towards Effective Detection of Ponzi schemes on Ethereum with Contract Runtime Behavior Graph

    Authors: Ruichao Liang, Jing Chen, Cong Wu, Kun He, Yueming Wu, Weisong Sun, Ruiying Du, Qingchuan Zhao, Yang Liu

    Abstract: Ponzi schemes, a form of scam, have been discovered in Ethereum smart contracts in recent years, causing massive financial losses. Existing detection methods primarily focus on rule-based approaches and machine learning techniques that utilize static information as features. However, these methods have significant limitations. Rule-based approaches rely on pre-defined rules with limited capabiliti…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: Submitted to ACM Transactions on Software Engineering and Methodology

  5. arXiv:2405.17100  [pdf, other]

    cs.CR cs.SD eess.AS

    Sok: Comprehensive Security Overview, Challenges, and Future Directions of Voice-Controlled Systems

    Authors: Haozhe Xu, Cong Wu, Yangyang Gu, Xingcan Shang, Jing Chen, Kun He, Ruiying Du

    Abstract: The integration of Voice Control Systems (VCS) into smart devices and their growing presence in daily life accentuate the importance of their security. Current research has uncovered numerous vulnerabilities in VCS, presenting significant risks to user privacy and security. However, a cohesive and systematic examination of these vulnerabilities and the corresponding solutions is still absent. This…

    Submitted 27 May, 2024; originally announced May 2024.

  6. arXiv:2405.05945  [pdf, other]

    cs.CV

    Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers

    Authors: Peng Gao, Le Zhuo, Dongyang Liu, Ruoyi Du, Xu Luo, Longtian Qiu, Yuhang Zhang, Chen Lin, Rongjie Huang, Shijie Geng, Renrui Zhang, Junlin Xi, Wenqi Shao, Zhengkai Jiang, Tianshuo Yang, Weicai Ye, He Tong, Jingwen He, Yu Qiao, Hongsheng Li

    Abstract: Sora unveils the potential of scaling Diffusion Transformer for generating photorealistic images and videos at arbitrary resolutions, aspect ratios, and durations, yet it still lacks sufficient implementation details. In this technical report, we introduce the Lumina-T2X family - a series of Flow-based Large Diffusion Transformers (Flag-DiT) equipped with zero-initialized attention, as a unified f…

    Submitted 13 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

    Comments: Technical Report; Code at: https://github.com/Alpha-VLLM/Lumina-T2X

  7. arXiv:2405.02774  [pdf, other]

    cs.LG cs.AI cs.CL

    Get more for less: Principled Data Selection for Warming Up Fine-Tuning in LLMs

    Authors: Feiyang Kang, Hoang Anh Just, Yifan Sun, Himanshu Jahagirdar, Yuanzhi Zhang, Rongxing Du, Anit Kumar Sahu, Ruoxi Jia

    Abstract: This work focuses on leveraging and selecting from vast, unlabeled, open data to pre-fine-tune a pre-trained language model. The goal is to minimize the need for costly domain-specific data for subsequent fine-tuning while achieving desired performance levels. While many data selection algorithms have been designed for small-scale applications, rendering them unsuitable for our context, some emerg…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: Published as a conference paper at ICLR 2024

  8. arXiv:2404.15854  [pdf, other]

    cs.CR cs.LG cs.SD eess.AS

    CLAD: Robust Audio Deepfake Detection Against Manipulation Attacks with Contrastive Learning

    Authors: Haolin Wu, Jing Chen, Ruiying Du, Cong Wu, Kun He, Xingcan Shang, Hao Ren, Guowen Xu

    Abstract: The increasing prevalence of audio deepfakes poses significant security threats, necessitating robust detection methods. While existing detection systems exhibit promise, their robustness against malicious audio manipulations remains underexplored. To bridge the gap, we undertake the first comprehensive study of the susceptibility of the most widely adopted audio deepfake detectors to manipulation…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: Submitted to IEEE TDSC

  9. arXiv:2404.15000  [pdf, other]

    cs.CR

    EarPass: Secure and Implicit Call Receiver Authentication Using Ear Acoustic Sensing

    Authors: Xiping Sun, Jing Chen, Kun He, Zhixiang He, Ruiying Du, Yebo Feng, Qingchuan Zhao, Cong Wu

    Abstract: Private voice communication often contains sensitive information, making it critical to ensure that only authorized users have access to such calls. Unfortunately, current authentication mechanisms, such as PIN-based passwords, fingerprint recognition, and face recognition, fail to authenticate the call receiver, leaving a gap in security. To fill the gap, we present EarPass, a secure and implicit…

    Submitted 23 April, 2024; originally announced April 2024.

  10. arXiv:2404.14991  [pdf, other]

    cs.IR

    A Short Review for Ontology Learning: Stride to Large Language Models Trend

    Authors: Rick Du, Huilong An, Keyu Wang, Weidong Liu

    Abstract: Ontologies provide formal representation of knowledge shared within Semantic Web applications. Ontology learning involves the construction of ontologies from a given corpus. In the past years, ontology learning has traversed through shallow learning and deep learning methodologies, each offering distinct advantages and limitations in the quest for knowledge extraction and representation. A new tre…

    Submitted 17 June, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

  11. arXiv:2404.13807  [pdf, other]

    cs.CV cs.GR

    FaceFolds: Meshed Radiance Manifolds for Efficient Volumetric Rendering of Dynamic Faces

    Authors: Safa C. Medin, Gengyan Li, Ruofei Du, Stephan Garbin, Philip Davidson, Gregory W. Wornell, Thabo Beeler, Abhimitra Meka

    Abstract: 3D rendering of dynamic face captures is a challenging problem, and it demands improvements on several fronts: photorealism, efficiency, compatibility, and configurability. We present a novel representation that enables high-quality volumetric rendering of an actor's dynamic facial performances with minimal compute and memory footprint. It runs natively on commodity graphics soft- a…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: In Proceedings of the ACM in Computer Graphics and Interactive Techniques, 2024

  12. arXiv:2404.13274  [pdf, other]

    cs.HC cs.AI

    Augmented Object Intelligence: Making the Analog World Interactable with XR-Objects

    Authors: Mustafa Doga Dogan, Eric J. Gonzalez, Andrea Colaco, Karan Ahuja, Ruofei Du, Johnny Lee, Mar Gonzalez-Franco, David Kim

    Abstract: Seamless integration of physical objects as interactive digital entities remains a challenge for spatial computing. This paper introduces Augmented Object Intelligence (AOI), a novel XR interaction paradigm designed to blur the lines between digital and physical by equipping real-world objects with the ability to interact as if they were digital, where every object has the potential to serve as a…

    Submitted 22 April, 2024; v1 submitted 20 April, 2024; originally announced April 2024.

  13. arXiv:2404.09408  [pdf, other]

    cs.NI

    A Distributed Scalable Cross-chain State Channel Scheme Based on Recursive State Synchronization

    Authors: Xinyu Liang, Ruiying Du, Jing Chen, Yu Zhang, Meng Jia, Shuangxi Cao, Yufeng Wei, Shixiong Yao

    Abstract: As cross-chain technology continues to advance, the scale of cross-chain transactions is experiencing significant expansion. To improve scalability, researchers have turned to the study of cross-chain state channels. However, most of the existing schemes rely on trusted parties to support channel operations. To address this issue, we present Interpipe: a distributed cross-chain state channel schem…

    Submitted 14 April, 2024; originally announced April 2024.

  14. arXiv:2404.01106  [pdf, other]

    cs.CR

    MagLive: Near-Field Magnetic Sensing-Based Voice Liveness Detection on Smartphones

    Authors: Xiping Sun, Jing Chen, Cong Wu, Kun He, Haozhe Xu, Yebo Feng, Ruiying Du, Xianhao Chen

    Abstract: Voice authentication has been widely used on smartphones. However, it remains vulnerable to spoofing attacks, where the attacker replays recorded voice samples from authentic humans using loudspeakers to bypass the voice authentication system. In this paper, we present MagLive, a robust voice liveness detection scheme designed for smartphones to mitigate such spoofing attacks. MagLive leverages di…

    Submitted 1 April, 2024; originally announced April 2024.

  15. arXiv:2403.10313  [pdf, other]

    cs.CR cs.DB

    Interactive Trimming against Evasive Online Data Manipulation Attacks: A Game-Theoretic Approach

    Authors: Yue Fu, Qingqing Ye, Rong Du, Haibo Hu

    Abstract: With the exponential growth of data and its crucial impact on our lives and decision-making, the integrity of data has become a significant concern. Malicious data poisoning attacks, where false values are injected into the data, can disrupt machine learning processes and lead to severe consequences. To mitigate these attacks, distance-based defenses, such as trimming, have been proposed, but they…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: This manuscript is accepted by ICDE '24

  16. Human I/O: Towards a Unified Approach to Detecting Situational Impairments

    Authors: Xingyu Bruce Liu, Jiahao Nick Li, David Kim, Xiang 'Anthony' Chen, Ruofei Du

    Abstract: Situationally Induced Impairments and Disabilities (SIIDs) can significantly hinder user experience in contexts such as poor lighting, noise, and multi-tasking. While prior research has introduced algorithms and systems to address these impairments, they predominantly cater to specific tasks or environments and fail to accommodate the diverse and dynamic nature of SIIDs. We introduce Human I/O, a…

    Submitted 6 March, 2024; originally announced March 2024.

  17. arXiv:2402.09636  [pdf, other]

    eess.IV cs.CV

    Spatiotemporal Disentanglement of Arteriovenous Malformations in Digital Subtraction Angiography

    Authors: Kathleen Baur, Xin Xiong, Erickson Torio, Rose Du, Parikshit Juvekar, Reuben Dorent, Alexandra Golby, Sarah Frisken, Nazim Haouchine

    Abstract: Although Digital Subtraction Angiography (DSA) is the most important imaging for visualizing cerebrovascular anatomy, its interpretation by clinicians remains difficult. This is particularly true when treating arteriovenous malformations (AVMs), where entangled vasculature connecting arteries and veins needs to be carefully identified. The presented method aims to enhance DSA image series by highli…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: Paper accepted for publication at SPIE Medical Imaging 2024

  18. arXiv:2402.05887  [pdf, other]

    eess.IV cs.MM

    Sandwiched Compression: Repurposing Standard Codecs with Neural Network Wrappers

    Authors: Onur G. Guleryuz, Philip A. Chou, Berivan Isik, Hugues Hoppe, Danhang Tang, Ruofei Du, Jonathan Taylor, Philip Davidson, Sean Fanello

    Abstract: We propose sandwiching standard image and video codecs between pre- and post-processing neural networks. The networks are jointly trained through a differentiable codec proxy to minimize a given rate-distortion loss. This sandwich architecture not only improves the standard codec's performance on its intended content, it can effectively adapt the codec to other types of image/video content and to…

    Submitted 8 February, 2024; originally announced February 2024.

  19. arXiv:2401.02552  [pdf, other]

    cs.LG cs.CY

    Long-term Fairness For Real-time Decision Making: A Constrained Online Optimization Approach

    Authors: Ruijie Du, Deepan Muthirayan, Pramod P. Khargonekar, Yanning Shen

    Abstract: Machine learning (ML) has demonstrated remarkable capabilities across many real-world systems, from predictive modeling to intelligent automation. However, the widespread integration of machine learning also makes it necessary to ensure machine learning-driven decision-making systems do not violate ethical principles and values of society in which they operate. As ML-driven decisions proliferate,…

    Submitted 4 January, 2024; originally announced January 2024.

  20. arXiv:2312.09672  [pdf, other]

    cs.HC cs.AI

    InstructPipe: Building Visual Programming Pipelines with Human Instructions

    Authors: Zhongyi Zhou, Jing Jin, Vrushank Phadnis, Xiuxiu Yuan, Jun Jiang, Xun Qian, Jingtao Zhou, Yiyi Huang, Zheng Xu, Yinda Zhang, Kristen Wright, Jason Mayes, Mark Sherwood, Johnny Lee, Alex Olwal, David Kim, Ram Iyengar, Na Li, Ruofei Du

    Abstract: Visual programming provides beginner-level programmers with a coding-free experience to build their customized pipelines. Existing systems require users to build a pipeline entirely from scratch, implying that novice users need to set up and link appropriate nodes all by themselves, starting from a blank workspace. We present InstructPipe, an AI assistant that enables users to start prototyping ma…

    Submitted 15 December, 2023; originally announced December 2023.

  21. CAT: A Causally Graph Attention Network for Trimming Heterophilic Graph

    Authors: Silu He, Qinyao Luo, Xinsha Fu, Ling Zhao, Ronghua Du, Haifeng Li

    Abstract: Local Attention-guided Message Passing Mechanism (LAMP) adopted in Graph Attention Networks (GATs) is designed to adaptively learn the importance of neighboring nodes for better local aggregation on the graph, which can bring the representations of similar neighbors closer effectively, thus showing stronger discrimination ability. However, existing GATs suffer from a significant discrimination abi…

    Submitted 17 June, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: 25 pages, 18 figures, 5 tables

    Journal ref: Information Science 2024

  22. arXiv:2311.16973  [pdf, other]

    cs.CV cs.AI cs.LG

    DemoFusion: Democratising High-Resolution Image Generation With No $$$

    Authors: Ruoyi Du, Dongliang Chang, Timothy Hospedales, Yi-Zhe Song, Zhanyu Ma

    Abstract: High-resolution image generation with Generative Artificial Intelligence (GenAI) has immense potential but, due to the enormous capital investment required for training, it is increasingly centralised to a few large corporations, and hidden behind paywalls. This paper aims to democratise high-resolution GenAI by advancing the frontier of high-resolution generation while remaining accessible to a b…

    Submitted 14 December, 2023; v1 submitted 23 November, 2023; originally announced November 2023.

    Comments: Project Page: https://ruoyidu.github.io/demofusion/demofusion.html

  23. arXiv:2311.03707  [pdf, other]

    cs.AI cs.LG cs.MA

    The NeurIPS 2022 Neural MMO Challenge: A Massively Multiagent Competition with Specialization and Trade

    Authors: Enhong Liu, Joseph Suarez, Chenhui You, Bo Wu, Bingcheng Chen, Jun Hu, Jiaxin Chen, Xiaolong Zhu, Clare Zhu, Julian Togelius, Sharada Mohanty, Weijun Hong, Rui Du, Yibing Zhang, Qinwen Wang, Xinhang Li, Zheng Yuan, Xiang Li, Yuejia Huang, Kun Zhang, Hanhui Yang, Shiqi Tang, Phillip Isola

    Abstract: In this paper, we present the results of the NeurIPS-2022 Neural MMO Challenge, which attracted 500 participants and received over 1,600 submissions. Like the previous IJCAI-2022 Neural MMO Challenge, it involved agents from 16 populations surviving in procedurally generated worlds by collecting resources and defeating opponents. This year's competition runs on the latest v1.6 Neural MMO, which in…

    Submitted 6 November, 2023; originally announced November 2023.

  24. arXiv:2310.17661  [pdf, other]

    eess.SP cs.NI

    An Overview on IEEE 802.11bf: WLAN Sensing

    Authors: Rui Du, Haocheng Hua, Hailiang Xie, Xianxin Song, Zhonghao Lyu, Mengshi Hu, Narengerile, Yan Xin, Stephen McCann, Michael Montemurro, Tony Xiao Han, Jie Xu

    Abstract: With recent advancements, the wireless local area network (WLAN) or wireless fidelity (Wi-Fi) technology has been successfully utilized to realize sensing functionalities such as detection, localization, and recognition. However, the WLANs standards are developed mainly for the purpose of communication, and thus may not be able to meet the stringent requirements for emerging sensing applications.…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: 31 pages, 25 figures, this is a significant updated version of arXiv:2207.04859

  25. arXiv:2309.00790  [pdf]

    cs.RO

    PFL-LSTR: A privacy-preserving framework for driver intention inference based on in-vehicle and out-vehicle information

    Authors: Runjia Du, Pei Li, Sikai Chen, Samuel Labi

    Abstract: Intelligent vehicle anticipation of the movement intentions of other drivers can reduce collisions. Typically, when a human driver of another vehicle (referred to as the target vehicle) engages in specific behaviors such as checking the rearview mirror prior to lane change, a valuable clue is therein provided on the intentions of the target vehicle's driver. Furthermore, the target driver's intent…

    Submitted 1 September, 2023; originally announced September 2023.

    Comments: Submitted for presentation only at the 2024 Annual Meeting of the Transportation Research Board

  26. arXiv:2308.15802  [pdf, other]

    cs.AI

    Benchmarking Robustness and Generalization in Multi-Agent Systems: A Case Study on Neural MMO

    Authors: Yangkun Chen, Joseph Suarez, Junjie Zhang, Chenghui Yu, Bo Wu, Hanmo Chen, Hengman Zhu, Rui Du, Shanliang Qian, Shuai Liu, Weijun Hong, Jinke He, Yibing Zhang, Liang Zhao, Clare Zhu, Julian Togelius, Sharada Mohanty, Jiaxin Chen, Xiu Li, Xiaolong Zhu, Phillip Isola

    Abstract: We present the results of the second Neural MMO challenge, hosted at IJCAI 2022, which received 1600+ submissions. This competition targets robustness and generalization in multi-agent systems: participants train teams of agents to complete a multi-task objective against opponents not seen during training. The competition combines relatively complex environment design with large numbers of agents…

    Submitted 30 August, 2023; originally announced August 2023.

  27. arXiv:2306.15774  [pdf]

    cs.HC cs.CL cs.CV cs.LG

    Next Steps for Human-Centered Generative AI: A Technical Perspective

    Authors: Xiang 'Anthony' Chen, Jeff Burke, Ruofei Du, Matthew K. Hong, Jennifer Jacobs, Philippe Laban, Dingzeyu Li, Nanyun Peng, Karl D. D. Willis, Chien-Sheng Wu, Bolei Zhou

    Abstract: Through iterative, cross-disciplinary discussions, we define and propose next-steps for Human-centered Generative AI (HGAI). We contribute a comprehensive research agenda that lays out future directions of Generative AI spanning three levels: aligning with human values; assimilating human intents; and augmenting human abilities. By identifying these next-steps, we intend to draw interdisciplinary…

    Submitted 22 December, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

  28. arXiv:2305.06777  [pdf]

    eess.IV cs.CV cs.LG q-bio.QM

    Generating high-quality 3DMPCs by adaptive data acquisition and NeREF-based radiometric calibration with UGV plant phenotyping system

    Authors: Pengyao Xie, Zhihong Ma, Ruiming Du, Xin Yang, Haiyan Cen

    Abstract: Fusion of 3D and MS imaging data has a great potential for high-throughput plant phenotyping of structural and biochemical as well as physiological traits simultaneously, which is important for decision support in agriculture and for crop breeders in selecting the best genotypes. However, lacking of 3D data integrity of various plant canopy structures and low-quality of MS images caused by the com…

    Submitted 1 December, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

  29. arXiv:2304.01436  [pdf, other]

    cs.CV cs.GR

    Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos

    Authors: Ziqian Bai, Feitong Tan, Zeng Huang, Kripasindhu Sarkar, Danhang Tang, Di Qiu, Abhimitra Meka, Ruofei Du, Mingsong Dou, Sergio Orts-Escolano, Rohit Pandey, Ping Tan, Thabo Beeler, Sean Fanello, Yinda Zhang

    Abstract: We propose a method to learn a high-quality implicit 3D head avatar from a monocular RGB video captured in the wild. The learnt avatar is driven by a parametric face model to achieve user-controlled facial expressions and head poses. Our hybrid pipeline combines the geometry prior and dynamic tracking of a 3DMM with a neural radiance field to achieve fine-grained control and photorealism. To reduc…

    Submitted 3 April, 2023; originally announced April 2023.

    Comments: In CVPR2023. Project page: https://augmentedperception.github.io/monoavatar/

  30. arXiv:2302.09315  [pdf, other]

    cs.CR

    Differential Aggregation against General Colluding Attackers

    Authors: Rong Du, Qingqing Ye, Yue Fu, Haibo Hu, Jin Li, Chengfang Fang, Jie Shi

    Abstract: Local Differential Privacy (LDP) is now widely adopted in large-scale systems to collect and analyze sensitive data while preserving users' privacy. However, almost all LDP protocols rely on a semi-trust model where users are curious-but-honest, which rarely holds in real-world scenarios. Recent works show poor estimation accuracy of many LDP protocols under malicious threat models. Although a few…

    Submitted 20 March, 2023; v1 submitted 18 February, 2023; originally announced February 2023.

    Comments: This paper has been accepted by ICDE2023

  31. arXiv:2301.08910  [pdf, other]

    cs.IT

    Capacity-CRB Tradeoff in OFDM Integrated Sensing and Communication Systems

    Authors: Zhe Huang, An Liu, Rui Du, Tony Xiao Han

    Abstract: Integrated sensing and communication (ISAC) has emerged as a key technology for future communication systems. In this paper, we provide a general framework to reveal the fundamental tradeoff between sensing and communication in OFDM systems, where a unified ISAC waveform is exploited to perform both tasks. In particular, we define the Capacity-Bayesian Cramer Rao Bound (BCRB) region in the asympto…

    Submitted 21 January, 2023; originally announced January 2023.

  32. arXiv:2212.04365  [pdf, other]

    cs.LG cs.AI cs.NI

    Alleviating neighbor bias: augmenting graph self-supervise learning with structural equivalent positive samples

    Authors: Jiawei Zhu, Mei Hong, Ronghua Du, Haifeng Li

    Abstract: In recent years, using a self-supervised learning framework to learn the general characteristics of graphs has been considered a promising paradigm for graph representation learning. The core of self-supervised learning strategies for graph neural networks lies in constructing suitable positive sample selection strategies. However, existing GNNs typically aggregate information from neighboring nod…

    Submitted 8 December, 2022; originally announced December 2022.

    Comments: 8 pages, 5 figures, 8 tables

  33. STGC-GNNs: A GNN-based traffic prediction framework with a spatial-temporal Granger causality graph

    Authors: Silu He, Qinyao Luo, Ronghua Du, Ling Zhao, Haifeng Li

    Abstract: The key to traffic prediction is to accurately depict the temporal dynamics of traffic flow traveling in a road network, so it is important to model the spatial dependence of the road network. The essence of spatial dependence is to accurately describe how traffic information transmission is affected by other nodes in the road network, and the GNN-based traffic prediction model, as a benchmark for…

    Submitted 30 October, 2022; originally announced October 2022.

    Comments: 14 pages, 16 figures, 4 tables

  34. arXiv:2208.06143  [pdf, other]

    cs.CV cs.GR cs.LG

    PRIF: Primary Ray-based Implicit Function

    Authors: Brandon Yushan Feng, Yinda Zhang, Danhang Tang, Ruofei Du, Amitabh Varshney

    Abstract: We introduce a new implicit shape representation called Primary Ray-based Implicit Function (PRIF). In contrast to most existing approaches based on the signed distance function (SDF) which handles spatial locations, our representation operates on oriented rays. Specifically, PRIF is formulated to directly produce the surface hit point of a given input ray, without the expensive sphere-tracing ope…

    Submitted 12 August, 2022; originally announced August 2022.

    Comments: ECCV 2022. Project Page: https://augmentariumlab.github.io/PRIF/

  35. arXiv:2207.04859  [pdf, ps, other]

    cs.NI eess.SP

    An Overview on IEEE 802.11bf: WLAN Sensing

    Authors: Rui Du, Hailiang Xie, Mengshi Hu, Narengerile, Yan Xin, Stephen McCann, Michael Montemurro, Tony Xiao Han, Jie Xu

    Abstract: With recent advancements, the wireless local area network (WLAN) or wireless fidelity (Wi-Fi) technology has been successfully utilized to realize sensing functionalities such as detection, localization, and recognition. However, the WLANs standards are developed mainly for the purpose of communication, and thus may not be able to meet the stringent sensing requirements in emerging applications. T…

    Submitted 11 July, 2022; originally announced July 2022.

  36. PST: Plant segmentation transformer for 3D point clouds of rapeseed plants at the podding stage

    Authors: Ruiming Du, Zhihong Ma, Pengyao Xie, Yong He, Haiyan Cen

    Abstract: Segmentation of plant point clouds to obtain high-precise morphological traits is essential for plant phenotyping. Although the fast development of deep learning has boosted much research on segmentation of plant point clouds, previous studies mainly focus on the hard voxelization-based or down-sampling-based methods, which are limited to segmenting simple plant organs. Segmentation of complex pla…

    Submitted 19 January, 2024; v1 submitted 27 June, 2022; originally announced June 2022.

    Comments: 46 pages, 10 figures

  37. arXiv:2206.01153  [pdf, other]

    cs.CV

    Multi-View Active Fine-Grained Recognition

    Authors: Ruoyi Du, Wenqing Yu, Heqing Wang, Dongliang Chang, Ting-En Lin, Yongbin Li, Zhanyu Ma

    Abstract: As fine-grained visual classification (FGVC) being developed for decades, great works related have exposed a key direction -- finding discriminative local regions and revealing subtle differences. However, unlike identifying visual contents within static images, for recognizing objects in the real physical world, discriminative information is not only present within seen local regions but also hid…

    Submitted 2 June, 2022; originally announced June 2022.

  38. arXiv:2206.00493  [pdf, ps, other]

    eess.SP cs.IT

    Networked Sensing in 6G Cellular Networks: Opportunities and Challenges

    Authors: Liang Liu, Shuowen Zhang, Rui Du, Tong Xiao Han, Shuguang Cui

    Abstract: Radar and wireless communication are widely acknowledged as the two most successful applications of the radio technology over the past decades. Recently, there is a trend in both academia and industry to achieve integrated sensing and communication (ISAC) in one system via utilizing a common radio spectrum and the same hardware platform. This article will discuss about the possibility of exploitin…

    Submitted 1 June, 2022; originally announced June 2022.

  39. arXiv:2206.00415  [pdf, other]

    cs.CV

    Learning Invariant Visual Representations for Compositional Zero-Shot Learning

    Authors: Tian Zhang, Kongming Liang, Ruoyi Du, Xian Sun, Zhanyu Ma, Jun Guo

    Abstract: Compositional Zero-Shot Learning (CZSL) aims to recognize novel compositions using knowledge learned from seen attribute-object compositions in the training set. Previous works mainly project an image and a composition into a common embedding space to measure their compatibility score. However, both attributes and objects share the visual representations learned above, leading the model to exploit…

    Submitted 18 July, 2022; v1 submitted 1 June, 2022; originally announced June 2022.

  40. arXiv:2204.08409  [pdf, other]

    cs.SD cs.CL eess.AS

    Caption Feature Space Regularization for Audio Captioning

    Authors: Yiming Zhang, Hong Yu, Ruoyi Du, Zhanyu Ma, Yuan Dong

    Abstract: Audio captioning aims at describing the content of audio clips with human language. Due to the ambiguity of audio, different people may perceive the same audio differently, resulting in caption disparities (i.e., one audio may correlate to several captions with diverse semantics). For that, general audio captioning models achieve the one-to-many training by randomly selecting a correlated caption…

    Submitted 18 April, 2022; originally announced April 2022.

  41. arXiv:2203.02130  [pdf]

    econ.GN cs.SI physics.soc-ph

    Mapping evolving population geography in China

    Authors: Lei Dong, Rui Du, Yu Liu

    Abstract: China's demographic changes have important global economic and geopolitical implications. Yet, our understanding of such transitions at the micro-spatial scale remains limited due to spatial inconsistency of the census data caused by administrative boundary adjustments. To fill this gap, we manually collected and built a population census panel from 2010 to 2020 at both the county and prefectural-…

    Submitted 3 March, 2022; originally announced March 2022.

  42. arXiv:2202.11134  [pdf

    cs.HC cs.LG cs.SD eess.AS

    ProtoSound: A Personalized and Scalable Sound Recognition System for Deaf and Hard-of-Hearing Users

    Authors: Dhruv Jain, Khoa Huynh Anh Nguyen, Steven Goodman, Rachel Grossman-Kahn, Hung Ngo, Aditya Kusupati, Ruofei Du, Alex Olwal, Leah Findlater, Jon E. Froehlich

    Abstract: Recent advances have enabled automatic sound recognition systems for deaf and hard of hearing (DHH) users on mobile devices. However, these tools use pre-trained, generic sound recognition models, which do not meet the diverse needs of DHH users. We introduce ProtoSound, an interactive system for customizing sound recognition models by recording a few examples, thereby enabling personalized and fi…

    Submitted 22 February, 2022; originally announced February 2022.

    Comments: Published at the ACM CHI Conference on Human Factors in Computing Systems (CHI) 2022

  43. arXiv:2202.08752  [pdf, other

    cs.CV cs.GR

    OmniSyn: Synthesizing 360 Videos with Wide-baseline Panoramas

    Authors: David Li, Yinda Zhang, Christian Häne, Danhang Tang, Amitabh Varshney, Ruofei Du

    Abstract: Immersive maps such as Google Street View and Bing Streetside provide true-to-life views with a massive collection of panoramas. However, these panoramas are only available at sparse intervals along the path they are taken, resulting in visual discontinuities during navigation. Prior art in view synthesis is usually built upon a set of perspective images, a pair of stereoscopic images, or a monocu…

    Submitted 22 February, 2022; v1 submitted 17 February, 2022; originally announced February 2022.

    Comments: Updated related works

  44. arXiv:2202.02688  [pdf, ps, other

    cs.IT

    Joint Pilot Optimization, Target Detection and Channel Estimation for Integrated Sensing and Communication Systems

    Authors: Zhe Huang, Kexuan Wang, An Liu, Yunlong Cai, Rui Du, Tony Xiao Han

    Abstract: Radar sensing will be integrated into the 6G communication system to support various applications. In this integrated sensing and communication system, a radar target may also be a communication channel scatterer. In this case, the radar and communication channels exhibit certain joint burst sparsity. We propose a two-stage joint pilot optimization, target detection and channel estimation scheme t…

    Submitted 5 February, 2022; originally announced February 2022.

    Comments: 30 pages, 8 figures, submitted to IEEE Transactions on Wireless Communications

  45. Domain Generalization via Frequency-domain-based Feature Disentanglement and Interaction

    Authors: Jingye Wang, Ruoyi Du, Dongliang Chang, Kongming Liang, Zhanyu Ma

    Abstract: Adaptation to out-of-distribution data is a meta-challenge for all statistical learning algorithms that strongly rely on the i.i.d. assumption. It leads to unavoidable labor costs and confidence crises in realistic applications. For that, domain generalization aims at mining domain-irrelevant knowledge from multiple source domains that can generalize to unseen target domains. In this paper, by lev…

    Submitted 25 July, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

    Comments: The paper is accepted by ACM Multimedia 2022

  46. arXiv:2112.14239  [pdf, other

    cs.CV

    TAGPerson: A Target-Aware Generation Pipeline for Person Re-identification

    Authors: Kai Chen, Weihua Chen, Tao He, Rong Du, Fan Wang, Xiuyu Sun, Yuchen Guo, Guiguang Ding

    Abstract: Nowadays, real data for the person re-identification (ReID) task faces privacy issues, e.g., the banned dataset DukeMTMC-ReID. Thus it becomes much harder to collect real data for the ReID task. Meanwhile, the labor cost of labeling ReID data is still very high and further hinders the development of ReID research. Therefore, many methods turn to generating synthetic images for ReID algorithms as alte…

    Submitted 28 December, 2021; originally announced December 2021.

  47. arXiv:2112.02825  [pdf, other

    cs.CV

    Clue Me In: Semi-Supervised FGVC with Out-of-Distribution Data

    Authors: Ruoyi Du, Dongliang Chang, Zhanyu Ma, Yi-Zhe Song, Jun Guo

    Abstract: Despite great strides made on fine-grained visual classification (FGVC), current methods are still heavily reliant on fully-supervised paradigms where ample expert labels are called for. Semi-supervised learning (SSL) techniques, acquiring knowledge from unlabeled data, provide a considerable means forward and have shown great promise for coarse-grained problems. However, existing SSL paradigms mos…

    Submitted 6 December, 2021; originally announced December 2021.

  48. arXiv:2112.02747  [pdf, other

    cs.CV

    Making a Bird AI Expert Work for You and Me

    Authors: Dongliang Chang, Kaiyue Pang, Ruoyi Du, Zhanyu Ma, Yi-Zhe Song, Jun Guo

    Abstract: As powerful as fine-grained visual classification (FGVC) is, responding to your query with a bird name of "Whip-poor-will" or "Mallard" probably does not make much sense. This, however commonly accepted in the literature, underlines a fundamental question interfacing AI and humans -- what constitutes transferable knowledge for humans to learn from AI? This paper sets out to answer this very question usi…

    Submitted 5 December, 2021; originally announced December 2021.

  49. arXiv:2110.07380  [pdf

    cs.CV

    Reason induced visual attention for explainable autonomous driving

    Authors: Sikai Chen, Jiqian Dong, Runjia Du, Yujie Li, Samuel Labi

    Abstract: Deep learning (DL) based computer vision (CV) models are generally considered as black boxes due to poor interpretability. This limitation impedes efficient diagnoses or predictions of system failure, thereby precluding the widespread deployment of DLCV models in safety-critical tasks such as autonomous driving. This study is motivated by the need to enhance the interpretability of DL model in aut…

    Submitted 11 October, 2021; originally announced October 2021.

    Comments: Under review for presentation at TRB 2022 Annual Meeting

  50. arXiv:2110.05564  [pdf

    cs.LG cs.AI

    Scalable Traffic Signal Controls using Fog-Cloud Based Multiagent Reinforcement Learning

    Authors: Paul Ha, Sikai Chen, Runjia Du, Samuel Labi

    Abstract: Optimizing traffic signal control (TSC) at intersections continues to pose a challenging problem, particularly for large-scale traffic networks. It has been shown in past research that it is feasible to optimize the operations of individual TSC systems or a small number of such systems. However, it has been computationally difficult to scale these solution approaches to large networks partly due t…

    Submitted 11 October, 2021; originally announced October 2021.

    Comments: Under review for presentation at TRB 2022 Annual Meeting