Showing 1–50 of 71 results for author: Gu, R

  1. arXiv:2405.10246  [pdf, other]

    eess.IV cs.CV

    A Foundation Model for Brain Lesion Segmentation with Mixture of Modality Experts

    Authors: Xinru Zhang, Ni Ou, Berke Doga Basaran, Marco Visentin, Mengyun Qiao, Renyang Gu, Cheng Ouyang, Yaou Liu, Paul M. Matthew, Chuyang Ye, Wenjia Bai

    Abstract: Brain lesion segmentation plays an essential role in neurological research and diagnosis. As brain lesions can be caused by various pathological alterations, different types of brain lesions tend to manifest with different characteristics on different imaging modalities. Due to this complexity, brain lesion segmentation methods are often developed in a task-specific manner. A specific segmentation…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: The work has been early accepted by MICCAI 2024

  2. arXiv:2404.04947  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD eess.SP

    Gull: A Generative Multifunctional Audio Codec

    Authors: Yi Luo, Jianwei Yu, Hangting Chen, Rongzhi Gu, Chao Weng

    Abstract: We introduce Gull, a generative multifunctional audio codec. Gull is a general purpose neural audio compression and decompression model which can be applied to a wide range of tasks and applications such as real-time communication, audio super-resolution, and codec language models. The key components of Gull include (1) universal-sample-rate modeling via subband modeling schemes motivated by recen…

    Submitted 7 June, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

    Comments: Demo page: https://yluo42.github.io/Gull/

  3. arXiv:2404.00950  [pdf, other]

    cs.CL

    AISPACE at SemEval-2024 task 8: A Class-balanced Soft-voting System for Detecting Multi-generator Machine-generated Text

    Authors: Renhua Gu, Xiangfeng Meng

    Abstract: SemEval-2024 Task 8 provides a challenge to detect human-written and machine-generated text. There are 3 subtasks for different detection scenarios. This paper proposes a system that mainly deals with Subtask B. It aims to detect if given full text is written by human or is generated by a specific Large Language Model (LLM), which is actually a multi-class text classification task. Our team AISPAC…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: 1st place at SemEval-2024 Task 8, Subtask B, to appear in SemEval-2024 proceedings

  4. arXiv:2402.12756  [pdf, other]

    cs.LG cs.NI

    Static vs. Dynamic Databases for Indoor Localization based on Wi-Fi Fingerprinting: A Discussion from a Data Perspective

    Authors: Zhe Tang, Ruocheng Gu, Sihao Li, Kyeong Soo Kim, Jeremy S. Smith

    Abstract: Wi-Fi fingerprinting has emerged as the most popular approach to indoor localization. The use of ML algorithms has greatly improved the localization performance of Wi-Fi fingerprinting, but its success depends on the availability of fingerprint databases composed of a large number of RSSIs, the MAC addresses of access points, and the other measurement information. However, most fingerprint databas…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: 10 pages, 5 figures, Invited paper with Excellent Paper Award to be presented at ICAIIC 2024, Osaka, Japan, Feb. 19–22, 2024

  5. arXiv:2401.08636   

    cs.DC cs.AI

    MLCommons Cloud Masking Benchmark with Early Stopping

    Authors: Varshitha Chennamsetti, Gregor von Laszewski, Ruochen Gu, Laiba Mehnaz, Juri Papay, Samuel Jackson, Jeyan Thiyagalingam, Sergey V. Samsonau, Geoffrey C. Fox

    Abstract: In this paper, we report on work performed for the MLCommons Science Working Group on the cloud masking benchmark. MLCommons is a consortium that develops and maintains several scientific benchmarks that aim to benefit developments in AI. The benchmarks are conducted on the High Performance Computing (HPC) Clusters of New York University and University of Virginia, as well as a commodity desktop.…

    Submitted 30 May, 2024; v1 submitted 11 December, 2023; originally announced January 2024.

    Comments: NYU did not approve the publication of the paper

  6. arXiv:2312.10381  [pdf, other]

    cs.SD eess.AS

    SECap: Speech Emotion Captioning with Large Language Model

    Authors: Yaoxun Xu, Hangting Chen, Jianwei Yu, Qiaochu Huang, Zhiyong Wu, Shixiong Zhang, Guangzhi Li, Yi Luo, Rongzhi Gu

    Abstract: Speech emotions are crucial in human communication and are extensively used in fields like speech synthesis and natural language understanding. Most prior studies, such as speech emotion recognition, have categorized speech emotions into a fixed set of classes. Yet, emotions expressed in human speech are often complex, and categorizing them into predefined groups can be insufficient to adequately…

    Submitted 23 December, 2023; v1 submitted 16 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024

  7. arXiv:2312.04799  [pdf, other]

    cs.DC cs.AI

    An Overview of MLCommons Cloud Mask Benchmark: Related Research and Data

    Authors: Gregor von Laszewski, Ruochen Gu

    Abstract: Cloud masking is a crucial task that is well-motivated for meteorology and its applications in environmental and atmospheric sciences. Its goal is, given satellite images, to accurately generate cloud masks that identify each pixel in image to contain either cloud or clear sky. In this paper, we summarize some of the ongoing research activities in cloud masking, with a focus on the research and be…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: 13 pages, 2 tables, 7 figures, 3 appendices

  8. arXiv:2311.07033  [pdf, other]

    eess.IV cs.CV

    TTMFN: Two-stream Transformer-based Multimodal Fusion Network for Survival Prediction

    Authors: Ruiquan Ge, Xiangyang Hu, Rungen Huang, Gangyong Jia, Yaqi Wang, Renshu Gu, Changmiao Wang, Elazab Ahmed, Linyan Wang, Juan Ye, Ye Li

    Abstract: Survival prediction plays a crucial role in assisting clinicians with the development of cancer treatment protocols. Recent evidence shows that multimodal data can help in the diagnosis of cancer disease and improve survival prediction. Currently, deep learning-based approaches have experienced increasing success in survival prediction by integrating pathological images and gene expression data. H…

    Submitted 12 November, 2023; originally announced November 2023.

  9. arXiv:2310.08997  [pdf, other]

    cs.GR cs.CG

    A Parallel Feature-preserving Mesh Variable Offsetting Method with Dynamic Programming

    Authors: Hongyi Cao, Gang Xu, Renshu Gu, Jinlan Xu, Xiaoyu Zhang, Timon Rabczuk

    Abstract: Mesh offsetting plays an important role in discrete geometric processing. In this paper, we propose a parallel feature-preserving mesh offsetting framework with variable distance. Different from the traditional method based on distance and normal vector, a new calculation of offset position is proposed by using dynamic programming and quadratic programming, and the sharp feature can be preserved a…

    Submitted 13 October, 2023; originally announced October 2023.

  10. arXiv:2309.14329  [pdf, other]

    cs.HC cs.AI cs.CV cs.GR cs.MM

    Innovative Digital Storytelling with AIGC: Exploration and Discussion of Recent Advances

    Authors: Rongzhang Gu, Hui Li, Changyue Su, Wayne Wu

    Abstract: Digital storytelling, as an art form, has struggled with cost-quality balance. The emergence of AI-generated Content (AIGC) is considered as a potential solution for efficient digital storytelling production. However, the specific form, effects, and impacts of this fusion remain unclear, leaving the boundaries of AIGC combined with storytelling undefined. This work explores the current integration…

    Submitted 28 September, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

    Comments: Project page: https://lsgm-demo.github.io/Leveraging-recent-advances-of-foundation-models-for-story-telling/

  11. UPL-SFDA: Uncertainty-aware Pseudo Label Guided Source-Free Domain Adaptation for Medical Image Segmentation

    Authors: Jianghao Wu, Guotai Wang, Ran Gu, Tao Lu, Yinan Chen, Wentao Zhu, Tom Vercauteren, Sébastien Ourselin, Shaoting Zhang

    Abstract: Domain Adaptation (DA) is important for deep learning-based medical image segmentation models to deal with testing images from a new target domain. As the source-domain data are usually unavailable when a trained model is deployed at a new center, Source-Free Domain Adaptation (SFDA) is appealing for data and annotation-efficient adaptation to the target domain. However, existing SFDA methods have…

    Submitted 18 September, 2023; originally announced September 2023.

    Comments: 12 pages, 6 figures, to be published on IEEE TMI

  12. arXiv:2308.16892  [pdf, other]

    eess.AS cs.AI cs.SD

    ReZero: Region-customizable Sound Extraction

    Authors: Rongzhi Gu, Yi Luo

    Abstract: We introduce region-customizable sound extraction (ReZero), a general and flexible framework for the multi-channel region-wise sound extraction (R-SE) task. R-SE task aims at extracting all active target sounds (e.g., human speech) within a specific, user-defined spatial region, which is different from conventional and existing tasks where a blind separation or a fixed, predefined spatial region a…

    Submitted 31 August, 2023; originally announced August 2023.

    Comments: 13 pages, 11 figures

  13. Ultra Dual-Path Compression For Joint Echo Cancellation And Noise Suppression

    Authors: Hangting Chen, Jianwei Yu, Yi Luo, Rongzhi Gu, Weihua Li, Zhuocheng Lu, Chao Weng

    Abstract: Echo cancellation and noise reduction are essential for full-duplex communication, yet most existing neural networks have high computational costs and are inflexible in tuning model complexity. In this paper, we introduce time-frequency dual-path compression to achieve a wide range of compression ratios on computational cost. Specifically, for frequency compression, trainable filters are used to r…

    Submitted 10 October, 2023; v1 submitted 21 August, 2023; originally announced August 2023.

    Comments: Proceedings of INTERSPEECH

  14. arXiv:2308.06981  [pdf, other]

    eess.AS cs.SD

    The Sound Demixing Challenge 2023 – Cinematic Demixing Track

    Authors: Stefan Uhlich, Giorgio Fabbro, Masato Hirano, Shusuke Takahashi, Gordon Wichern, Jonathan Le Roux, Dipam Chakraborty, Sharada Mohanty, Kai Li, Yi Luo, Jianwei Yu, Rongzhi Gu, Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva, Mikhail Sukhovei, Yuki Mitsufuji

    Abstract: This paper summarizes the cinematic demixing (CDX) track of the Sound Demixing Challenge 2023 (SDX'23). We provide a comprehensive summary of the challenge setup, detailing the structure of the competition and the datasets used. Especially, we detail CDXDB23, a new hidden dataset constructed from real movies that was used to rank the submissions. The paper also offers insights into the most succes…

    Submitted 18 April, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

    Comments: Accepted for Transactions of the International Society for Music Information Retrieval

  15. arXiv:2304.08052  [pdf, other]

    cs.SD eess.AS

    Fast Random Approximation of Multi-channel Room Impulse Response

    Authors: Yi Luo, Rongzhi Gu

    Abstract: Modern neural-network-based speech processing systems are typically required to be robust against reverberation, and the training of such systems thus needs a large amount of reverberant data. During the training of the systems, on-the-fly simulation pipeline is nowadays preferred as it allows the model to train on infinite number of data samples without pre-generating and saving them on harddisk.…

    Submitted 17 April, 2023; originally announced April 2023.

  16. arXiv:2302.13462  [pdf, other]

    cs.SD eess.AS

    3D Neural Beamforming for Multi-channel Speech Separation Against Location Uncertainty

    Authors: Rongzhi Gu, Shi-Xiong Zhang, Dong Yu

    Abstract: Multi-channel speech separation using speaker's directional information has demonstrated significant gains over blind speech separation. However, it has two limitations. First, substantial performance degradation is observed when the coming directions of two sounds are close. Second, the result highly relies on the precise estimation of the speaker's direction. To overcome these issues, this paper…

    Submitted 26 February, 2023; originally announced February 2023.

  17. arXiv:2212.08348  [pdf, other]

    cs.SD eess.AS

    Towards Unified All-Neural Beamforming for Time and Frequency Domain Speech Separation

    Authors: Rongzhi Gu, Shi-Xiong Zhang, Yuexian Zou, Dong Yu

    Abstract: Recently, frequency domain all-neural beamforming methods have achieved remarkable progress for multichannel speech separation. In parallel, the integration of time domain network structure and beamforming also gains significant attention. This study proposes a novel all-neural beamforming method in time domain and makes an attempt to unify the all-neural beamforming pipelines for time domain and…

    Submitted 23 December, 2022; v1 submitted 16 December, 2022; originally announced December 2022.

  18. arXiv:2211.12081  [pdf, other]

    cs.CV

    CDDSA: Contrastive Domain Disentanglement and Style Augmentation for Generalizable Medical Image Segmentation

    Authors: Ran Gu, Guotai Wang, Jiangshan Lu, Jingyang Zhang, Wenhui Lei, Yinan Chen, Wenjun Liao, Shichuan Zhang, Kang Li, Dimitris N. Metaxas, Shaoting Zhang

    Abstract: Generalization to previously unseen images with potential domain shifts and different styles is essential for clinically applicable medical image segmentation, and the ability to disentangle domain-specific and domain-invariant features is key for achieving Domain Generalization (DG). However, existing DG methods can hardly achieve effective disentanglement to get high generalizability. To deal wi…

    Submitted 22 November, 2022; originally announced November 2022.

    Comments: 14 pages, 8 figures

  19. arXiv:2210.16032  [pdf, other]

    eess.AS cs.SD eess.SP

    Parameter-efficient transfer learning of pre-trained Transformer models for speaker verification using adapters

    Authors: Junyi Peng, Themos Stafylakis, Rongzhi Gu, Oldřich Plchot, Ladislav Mošner, Lukáš Burget, Jan Černocký

    Abstract: Recently, the pre-trained Transformer models have received a rising interest in the field of speech processing thanks to their great success in various downstream tasks. However, most fine-tuning approaches update all the parameters of the pre-trained model, which becomes prohibitive as the model size grows and sometimes results in overfitting on small datasets. In this paper, we conduct a compreh…

    Submitted 28 October, 2022; originally announced October 2022.

    Comments: submitted to ICASSP2023

  20. PyMIC: A deep learning toolkit for annotation-efficient medical image segmentation

    Authors: Guotai Wang, Xiangde Luo, Ran Gu, Shuojue Yang, Yijie Qu, Shuwei Zhai, Qianfei Zhao, Kang Li, Shaoting Zhang

    Abstract: Background and Objective: Open-source deep learning toolkits are one of the driving forces for developing medical image segmentation models. Existing toolkits mainly focus on fully supervised segmentation and require full and accurate pixel-level annotations that are time-consuming and difficult to acquire for segmentation tasks, which makes learning from imperfect labels highly desired for reduci…

    Submitted 4 February, 2023; v1 submitted 19 August, 2022; originally announced August 2022.

    Comments: 12 pages, 6 figures

    Journal ref: Computer Methods and Programs in Biomedicine, Volume 231, April 2023, 107398

  21. arXiv:2208.08605  [pdf, other]

    cs.CV

    Contrastive Semi-supervised Learning for Domain Adaptive Segmentation Across Similar Anatomical Structures

    Authors: Ran Gu, Jingyang Zhang, Guotai Wang, Wenhui Lei, Tao Song, Xiaofan Zhang, Kang Li, Shaoting Zhang

    Abstract: Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance for medical image segmentation, yet need plenty of manual annotations for training. Semi-Supervised Learning (SSL) methods are promising to reduce the requirement of annotations, but their performance is still limited when the dataset size and the number of annotated images are small. Leveraging existing annotated data…

    Submitted 17 August, 2022; originally announced August 2022.

    Comments: 12 pages, 6 figures

  22. arXiv:2206.06813  [pdf, other]

    eess.IV cs.CV cs.LG

    Learning towards Synchronous Network Memorizability and Generalizability for Continual Segmentation across Multiple Sites

    Authors: Jingyang Zhang, Peng Xue, Ran Gu, Yuning Gu, Mianxin Liu, Yongsheng Pan, Zhiming Cui, Jiawei Huang, Lei Ma, Dinggang Shen

    Abstract: In clinical practice, a segmentation network is often required to continually learn on a sequential data stream from multiple sites rather than a consolidated set, due to the storage cost and privacy restriction. However, during the continual learning process, existing methods are usually restricted in either network memorizability on previous sites or generalizability on unseen sites. This paper…

    Submitted 27 June, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

    Comments: Early accepted in MICCAI2022

  23. arXiv:2205.14833  [pdf, other]

    cs.LG cs.DC eess.SY

    Walle: An End-to-End, General-Purpose, and Large-Scale Production System for Device-Cloud Collaborative Machine Learning

    Authors: Chengfei Lv, Chaoyue Niu, Renjie Gu, Xiaotang Jiang, Zhaode Wang, Bin Liu, Ziqi Wu, Qiulin Yao, Congyu Huang, Panos Huang, Tao Huang, Hui Shu, Jinde Song, Bin Zou, Peng Lan, Guohuan Xu, Fei Wu, Shaojie Tang, Fan Wu, Guihai Chen

    Abstract: To break the bottlenecks of mainstream cloud-based machine learning (ML) paradigm, we adopt device-cloud collaborative ML and build the first end-to-end and general-purpose system, called Walle, as the foundation. Walle consists of a deployment platform, distributing ML tasks to billion-scale devices in time; a data pipeline, efficiently preparing task input; and a compute container, providing a c…

    Submitted 29 May, 2022; originally announced May 2022.

    Comments: Accepted by OSDI 2022

  24. arXiv:2205.12602  [pdf, other]

    cs.CV

    VTP: Volumetric Transformer for Multi-view Multi-person 3D Pose Estimation

    Authors: Yuxing Chen, Renshu Gu, Ouhan Huang, Gangyong Jia

    Abstract: This paper presents Volumetric Transformer Pose estimator (VTP), the first 3D volumetric transformer framework for multi-view multi-person 3D human pose estimation. VTP aggregates features from 2D keypoints in all camera views and directly learns the spatial relationships in the 3D voxel space in an end-to-end fashion. The aggregated 3D features are passed through 3D convolutions before being flat…

    Submitted 25 May, 2022; originally announced May 2022.

  25. arXiv:2205.11252  [pdf]

    cs.LG stat.AP

    Exploring the stimulative effect on following drivers in a consecutive lane-change using microscopic vehicle trajectory data

    Authors: Ruifeng Gu

    Abstract: Improper lane-changing behaviors may result in breakdown of traffic flow and the occurrence of various types of collisions. This study investigates lane-changing behaviors of multiple vehicles and the stimulative effect on following drivers in a consecutive lane-changing scenario. The microscopic trajectory data from the dataset are used for driving behavior analysis. Two discretionary lane-changin…

    Submitted 4 June, 2022; v1 submitted 18 May, 2022; originally announced May 2022.

    Comments: 22 pages

  26. arXiv:2205.06551  [pdf, other]

    cs.CV

    Contrastive Domain Disentanglement for Generalizable Medical Image Segmentation

    Authors: Ran Gu, Jiangshan Lu, Jingyang Zhang, Wenhui Lei, Xiaofan Zhang, Guotai Wang, Shaoting Zhang

    Abstract: Efficiently utilizing discriminative features is crucial for convolutional neural networks to achieve remarkable performance in medical image segmentation and is also important for model generalization across multiple domains, where letting model recognize domain-specific and domain-invariant information among multi-site datasets is a reasonable strategy for domain generalization. Unfortunately, m…

    Submitted 13 May, 2022; originally announced May 2022.

    Comments: 10 pages, 2 figures

  27. arXiv:2205.01727  [pdf]

    cs.RO stat.AP

    How to choose features to improve prediction performance in lane-changing intention: A meta-analysis

    Authors: Ruifeng Gu

    Abstract: Lane-change is a fundamental driving behavior and highly associated with various types of collisions, such as rear-end collisions, sideswipe collisions, and angle collisions and the increased risk of a traffic crash. This study investigates effectiveness of different features categories combination in lane-changing intention prediction. Studies related to lane-changing intention prediction have be…

    Submitted 3 May, 2022; originally announced May 2022.

    Comments: 15 pages

  28. Improving Dual-Microphone Speech Enhancement by Learning Cross-Channel Features with Multi-Head Attention

    Authors: Xinmeng Xu, Rongzhi Gu, Yuexian Zou

    Abstract: Hand-crafted spatial features, such as inter-channel intensity difference (IID) and inter-channel phase difference (IPD), play a fundamental role in recent deep learning based dual-microphone speech enhancement (DMSE) systems. However, learning the mutual relationship between artificially designed spatial and spectral features is hard in the end-to-end DMSE. In this work, a novel architecture for…

    Submitted 2 May, 2022; originally announced May 2022.

    Comments: Accepted by ICASSP 2022

  29. arXiv:2205.00661  [pdf, other]

    cs.PL quant-ph

    Giallar: Push-Button Verification for the Qiskit Quantum Compiler

    Authors: Runzhou Tao, Yunong Shi, Jianan Yao, Xupeng Li, Ali Javadi-Abhari, Andrew W. Cross, Frederic T. Chong, Ronghui Gu

    Abstract: This paper presents Giallar, a fully-automated verification toolkit for quantum compilers. Giallar requires no manual specifications, invariants, or proofs, and can automatically verify that a compiler pass preserves the semantics of quantum circuits. To deal with unbounded loops in quantum compilers, Giallar abstracts three loop templates, whose loop invariants can be automatically inferred. To e…

    Submitted 2 May, 2022; originally announced May 2022.

    Comments: PLDI 2022; Improves arXiv:1908.08963

  30. arXiv:2204.07375  [pdf, other]

    eess.AS cs.SD

    Speaker-Aware Mixture of Mixtures Training for Weakly Supervised Speaker Extraction

    Authors: Zifeng Zhao, Rongzhi Gu, Dongchao Yang, Jinchuan Tian, Yuexian Zou

    Abstract: Dominant researches adopt supervised training for speaker extraction, while the scarcity of ideally clean corpus and channel mismatch problem are rarely considered. To this end, we propose speaker-aware mixture of mixtures training (SAMoM), utilizing the consistency of speaker identity among target source, enrollment utterance and target estimate to weakly supervise the training of a deep speaker…

    Submitted 15 April, 2022; originally announced April 2022.

    Comments: 5 pages, 4 tables, 4 figures. Submitted to INTERSPEECH 2022

  31. arXiv:2204.01355  [pdf, other]

    eess.AS cs.SD

    Target Confusion in End-to-end Speaker Extraction: Analysis and Approaches

    Authors: Zifeng Zhao, Dongchao Yang, Rongzhi Gu, Haoran Zhang, Yuexian Zou

    Abstract: Recently, end-to-end speaker extraction has attracted increasing attention and shown promising results. However, its performance is often inferior to that of a blind source separation (BSS) counterpart with a similar network architecture, due to the auxiliary speaker encoder may sometimes generate ambiguous speaker embeddings. Such ambiguous guidance information may confuse the separation network…

    Submitted 4 April, 2022; originally announced April 2022.

    Comments: 5 pages, 1 table, 5 figures. Submitted to INTERSPEECH 2022

  32. arXiv:2203.16772  [pdf, other]

    cs.SD cs.AI eess.AS

    Learning Decoupling Features Through Orthogonality Regularization

    Authors: Li Wang, Rongzhi Gu, Weiji Zhuang, Peng Gao, Yujun Wang, Yuexian Zou

    Abstract: Keyword spotting (KWS) and speaker verification (SV) are two important tasks in speech applications. Research shows that the state-of-art KWS and SV models are trained independently using different datasets since they expect to learn distinctive acoustic features. However, humans can distinguish language content and the speaker identity simultaneously. Motivated by this, we believe it is important…

    Submitted 30 March, 2022; originally announced March 2022.

    Comments: Accepted at ICASSP 2022

  33. arXiv:2201.10382  [pdf, other]

    cs.LG cs.AI

    On-Device Learning with Cloud-Coordinated Data Augmentation for Extreme Model Personalization in Recommender Systems

    Authors: Renjie Gu, Chaoyue Niu, Yikai Yan, Fan Wu, Shaojie Tang, Rongfeng Jia, Chengfei Lyu, Guihai Chen

    Abstract: Data heterogeneity is an intrinsic property of recommender systems, making models trained over the global data on the cloud, which is the mainstream in industry, non-optimal to each individual user's local data distribution. To deal with data heterogeneity, model personalization with on-device learning is a potential solution. However, on-device training using a user's small size of local samples…

    Submitted 23 January, 2022; originally announced January 2022.

  34. arXiv:2111.10773  [pdf, other]

    eess.IV cs.CV

    One-shot Weakly-Supervised Segmentation in Medical Images

    Authors: Wenhui Lei, Qi Su, Ran Gu, Na Wang, Xinglong Liu, Guotai Wang, Xiaofan Zhang, Shaoting Zhang

    Abstract: Deep neural networks usually require accurate and a large number of annotations to achieve outstanding performance in medical image segmentation. One-shot segmentation and weakly-supervised learning are promising research directions that lower labeling effort by learning a new class from only one annotated image and utilizing coarse labels instead, respectively. Previous works usually fail to leve…

    Submitted 21 November, 2021; originally announced November 2021.

  35. arXiv:2111.10372  [pdf, other]

    eess.IV cs.CV

    Resistance-Time Co-Modulated PointNet for Temporal Super-Resolution Simulation of Blood Vessel Flows

    Authors: Zhizheng Jiang, Fei Gao, Renshu Gu, Jinlan Xu, Gang Xu, Timon Rabczuk

    Abstract: In this paper, a novel deep learning framework is proposed for temporal super-resolution simulation of blood vessel flows, in which a high-temporal-resolution time-varying blood vessel flow simulation is generated from a low-temporal-resolution flow simulation result. In our framework, point-cloud is used to represent the complex blood vessel model, resistance-time aided PointNet model is proposed…

    Submitted 19 November, 2021; originally announced November 2021.

  36. arXiv:2109.08852  [pdf, other]

    eess.IV cs.CV

    Domain Composition and Attention for Unseen-Domain Generalizable Medical Image Segmentation

    Authors: Ran Gu, Jingyang Zhang, Rui Huang, Wenhui Lei, Guotai Wang, Shaoting Zhang

    Abstract: Domain generalizable model is attracting increasing attention in medical image analysis since data is commonly acquired from different institutes with various imaging protocols and scanners. To tackle this challenging domain generalization problem, we propose a Domain Composition and Attention-based network (DCA-Net) to improve the ability of domain representation and generalization. First, we pre…

    Submitted 18 September, 2021; originally announced September 2021.

    Comments: Accepted by MICCAI 2021

  37. arXiv:2108.07014  [pdf, ps, other]

    cs.IT eess.SY

    Robust Beamforming Design for Rate Splitting Multiple Access-Aided MISO Visible Light Communications

    Authors: Shuai Ma, Guanjie Zhang, Zhi Zhang, Rongyan Gu

    Abstract: In this paper, we focus on the optimal beamformer design for rate splitting multiple access (RSMA)-aided multiple-input single-output (MISO) visible light communication (VLC) networks. First, we derive the closed-form lower bounds of the achievable rate of each user, which are the first theoretical bound of achievable rate for RSMA-aided VLC networks. Second, we investigate the optimal beamformer d…

    Submitted 1 November, 2022; v1 submitted 16 August, 2021; originally announced August 2021.

  38. arXiv:2108.06029  [pdf, other]

    cs.CV

    Track without Appearance: Learn Box and Tracklet Embedding with Local and Global Motion Patterns for Vehicle Tracking

    Authors: Gaoang Wang, Renshu Gu, Zuozhu Liu, Weijie Hu, Mingli Song, Jenq-Neng Hwang

    Abstract: Vehicle tracking is an essential task in the multi-object tracking (MOT) field. A distinct characteristic in vehicle tracking is that the trajectories of vehicles are fairly smooth in both the world coordinate and the image coordinate. Hence, models that capture motion consistencies are of high necessity. However, tracking with the standalone motion-based trackers is quite challenging because targ…

    Submitted 12 August, 2021; originally announced August 2021.

  39. arXiv:2108.05516  [pdf, other]

    cs.SD cs.AI

    Text Anchor Based Metric Learning for Small-footprint Keyword Spotting

    Authors: Li Wang, Rongzhi Gu, Nuo Chen, Yuexian Zou

    Abstract: Keyword Spotting (KWS) remains challenging to achieve the trade-off between small footprint and high accuracy. Recently proposed metric learning approaches improved the generalizability of models for the KWS task, and 1D-CNN based KWS models have achieved the state-of-the-arts (SOTA) in terms of model size. However, for metric learning, due to data limitations, the speech anchor is highly suscepti…

    Submitted 11 August, 2021; originally announced August 2021.

    Comments: Accepted for Interspeech2021

  40. LASOR: Learning Accurate 3D Human Pose and Shape Via Synthetic Occlusion-Aware Data and Neural Mesh Rendering

    Authors: Kaibing Yang, Renshu Gu, Maoyu Wang, Masahiro Toyoura, Gang Xu

    Abstract: A key challenge in the task of human pose and shape estimation is occlusion, including self-occlusions, object-human occlusions, and inter-person occlusions. The lack of diverse and accurate pose and shape training data becomes a major bottleneck, especially for scenes with occlusions in the wild. In this paper, we focus on the estimation of human pose and shape in the case of inter-person occlusi…

    Submitted 29 January, 2022; v1 submitted 31 July, 2021; originally announced August 2021.

  41. arXiv:2107.11992  [pdf, other]

    cs.CV

    HRegNet: A Hierarchical Network for Large-scale Outdoor LiDAR Point Cloud Registration

    Authors: Fan Lu, Guang Chen, Yinlong Liu, Lijun Zhang, Sanqing Qu, Shu Liu, Rongqi Gu

    Abstract: Point cloud registration is a fundamental problem in 3D computer vision. Outdoor LiDAR point clouds are typically large-scale and complexly distributed, which makes the registration challenging. In this paper, we propose an efficient hierarchical network named HRegNet for large-scale outdoor LiDAR point cloud registration. Instead of using all points in the point clouds, HRegNet performs registrat…

    Submitted 26 July, 2021; originally announced July 2021.

    Comments: Accepted to ICCV 2021

  42. arXiv:2106.13834  [pdf, other]

    cs.LG cs.NE

    Ladder Polynomial Neural Networks

    Authors: Li-Ping Liu, Ruiyuan Gu, Xiaozhe Hu

    Abstract: Polynomial functions have plenty of useful analytical properties, but they are rarely used as learning models because their function class is considered to be restricted. This work shows that, when trained properly, polynomial functions can be strong learning models. In particular, this work constructs polynomial feedforward neural networks using the product activation, a new activation function const…

    Submitted 29 June, 2021; v1 submitted 25 June, 2021; originally announced June 2021.

    Comments: The work has been first submitted to ICLR 2019 (submission link). Unfortunately the contribution was not sufficiently appreciated by reviewers

  43. arXiv:2106.07037  [pdf, other]

    cs.DB

    Hash Adaptive Bloom Filter

    Authors: Rongbiao Xie, Meng Li, Zheyu Miao, Rong Gu, He Huang, Haipeng Dai, Guihai Chen

    Abstract: The Bloom filter is a compact, memory-efficient probabilistic data structure supporting membership testing, i.e., checking whether an element is in a given set. However, as the Bloom filter maps each element with uniformly random hash functions, little flexibility is provided even if information about negative keys (elements not in the set) is available. The problem gets worse when the misidentificat…

    Submitted 13 June, 2021; originally announced June 2021.

    Comments: 11 pages, accepted by ICDE 2021

  44. arXiv:2105.02674  [pdf, other]

    eess.IV cs.CV

    SS-CADA: A Semi-Supervised Cross-Anatomy Domain Adaptation for Coronary Artery Segmentation

    Authors: Jingyang Zhang, Ran Gu, Guotai Wang, Hongzhi Xie, Lixu Gu

    Abstract: The segmentation of coronary arteries by convolutional neural networks is promising yet requires a large amount of labor-intensive manual annotation. Transferring knowledge from retinal vessels in widely available public labeled fundus images (FIs) has the potential to reduce the annotation requirement for coronary artery segmentation in X-ray angiograms (XAs) due to their common tubular structures.…

    Submitted 6 May, 2021; originally announced May 2021.

  45. arXiv:2105.02426  [pdf, other]

    cs.CV

    Split and Connect: A Universal Tracklet Booster for Multi-Object Tracking

    Authors: Gaoang Wang, Yizhou Wang, Renshu Gu, Weijie Hu, Jenq-Neng Hwang

    Abstract: Multi-object tracking (MOT) is an essential task in the computer vision field. With the fast development of deep learning technology in recent years, MOT has achieved great improvement. However, some challenges still remain, such as sensitiveness to occlusion, instability under different lighting conditions, non-robustness to deformable objects, etc. To address such common challenges in most of th…

    Submitted 5 May, 2021; originally announced May 2021.

  46. arXiv:2105.00812  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Layer Reduction: Accelerating Conformer-Based Self-Supervised Model via Layer Consistency

    Authors: Jinchuan Tian, Rongzhi Gu, Helin Wang, Yuexian Zou

    Abstract: Transformer-based self-supervised models are trained as feature extractors and have empowered many downstream speech tasks to achieve state-of-the-art performance. However, both the training and inference process of these models may encounter prohibitively high computational cost and large parameter budget. Although Parameter Sharing Strategy (PSS) proposed in ALBERT paves the way for parameter re…

    Submitted 8 April, 2021; originally announced May 2021.

    Comments: 5 pages, 3 figures, submitted to Interspeech2021

  47. Complex Neural Spatial Filter: Enhancing Multi-channel Target Speech Separation in Complex Domain

    Authors: Rongzhi Gu, Shi-Xiong Zhang, Yuexian Zou, Dong Yu

    Abstract: To date, mainstream target speech separation (TSS) approaches are formulated to estimate the complex ratio mask (cRM) of the target speech in time-frequency domain under supervised deep learning framework. However, the existing deep models for estimating cRM are designed in the way that the real and imaginary parts of the cRM are separately modeled using real-valued training data pairs. The resear…

    Submitted 26 April, 2021; originally announced April 2021.

    Comments: 5 pages, 3 figures

  48. arXiv:2104.06349  [pdf, other]

    cs.PL quant-ph

    Gleipnir: Toward Practical Error Analysis for Quantum Programs (Extended Version)

    Authors: Runzhou Tao, Yunong Shi, Jianan Yao, John Hui, Frederic T. Chong, Ronghui Gu

    Abstract: Practical error analysis is essential for the design, optimization, and evaluation of Noisy Intermediate-Scale Quantum (NISQ) computing. However, bounding errors in quantum programs is a grand challenge, because the effects of quantum errors depend on exponentially large quantum states. In this work, we present Gleipnir, a novel methodology toward practically computing verified error bounds in quan…

    Submitted 19 April, 2021; v1 submitted 13 April, 2021; originally announced April 2021.

    Comments: typos corrected

  49. arXiv:2103.02209  [pdf, other]

    cs.PL cs.CR

    SciviK: A Versatile Framework for Specifying and Verifying Smart Contracts

    Authors: Shaokai Lin, Xinyuan Sun, Jianan Yao, Ronghui Gu

    Abstract: The growing adoption of smart contracts on blockchains poses new security risks that can lead to significant monetary loss, while existing approaches either provide no (or partial) security guarantees for smart contracts or require huge proof effort. To address this challenge, we present SciviK, a versatile framework for specifying and verifying industrial-grade smart contracts. SciviK's versatile…

    Submitted 3 March, 2021; originally announced March 2021.

    Comments: 22 pages, 16 figures

  50. arXiv:2102.01897  [pdf, other]

    eess.IV cs.CV

    Automatic Segmentation of Organs-at-Risk from Head-and-Neck CT using Separable Convolutional Neural Network with Hard-Region-Weighted Loss

    Authors: Wenhui Lei, Haochen Mei, Zhengwentai Sun, Shan Ye, Ran Gu, Huan Wang, Rui Huang, Shichuan Zhang, Shaoting Zhang, Guotai Wang

    Abstract: Nasopharyngeal Carcinoma (NPC) is a leading form of Head-and-Neck (HAN) cancer in the Arctic, China, Southeast Asia, and the Middle East/North Africa. Accurate segmentation of Organs-at-Risk (OAR) from Computed Tomography (CT) images with uncertainty information is critical for effective planning of radiation therapy for NPC treatment. Despite the state-of-the-art performance achieved by Convolutio…

    Submitted 3 February, 2021; originally announced February 2021.

    Comments: Accepted by Neurocomputing