Showing 1–23 of 23 results for author: Wan, L

Searching in archive eess.
  1. arXiv:2405.01584  [pdf, other]

    cs.CL cs.LG eess.SP

    Lightweight Conceptual Dictionary Learning for Text Classification Using Information Compression

    Authors: Li Wan, Tansu Alpcan, Margreta Kuijper, Emanuele Viterbo

    Abstract: We propose a novel, lightweight supervised dictionary learning framework for text classification based on data compression and representation. This two-phase algorithm initially employs the Lempel-Ziv-Welch (LZW) algorithm to construct a dictionary from text datasets, focusing on the conceptual significance of dictionary elements. Subsequently, dictionaries are refined considering label data, opti…

    Submitted 28 April, 2024; originally announced May 2024.

    Comments: 12 pages, TKDE format

  2. arXiv:2401.04283  [pdf, ps, other]

    eess.AS cs.SD

    FADI-AEC: Fast Score Based Diffusion Model Guided by Far-end Signal for Acoustic Echo Cancellation

    Authors: Yang Liu, Li Wan, Yun Li, Yiteng Huang, Ming Sun, James Luan, Yangyang Shi, Xin Lei

    Abstract: Despite the potential of diffusion models in speech enhancement, their deployment in Acoustic Echo Cancellation (AEC) has been restricted. In this paper, we propose DI-AEC, pioneering a diffusion-based stochastic regeneration approach dedicated to AEC. Further, we propose FADI-AEC, fast score-based diffusion AEC framework to save computational demands, making it favorable for edge devices. It stan…

    Submitted 8 January, 2024; originally announced January 2024.

  3. arXiv:2309.10993  [pdf, other]

    cs.SD cs.HC eess.AS

    Directional Source Separation for Robust Speech Recognition on Smart Glasses

    Authors: Tiantian Feng, Ju Lin, Yiteng Huang, Weipeng He, Kaustubh Kalgaonkar, Niko Moritz, Li Wan, Xin Lei, Ming Sun, Frank Seide

    Abstract: Modern smart glasses leverage advanced audio sensing and machine learning technologies to offer real-time transcribing and captioning services, considerably enriching human experiences in daily communications. However, such systems frequently encounter challenges related to environmental noises, resulting in degradation to speech recognition and speaker change detection. To improve voice quality,…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP 2024

  4. arXiv:2308.10601  [pdf, other]

    cs.CV cs.CR cs.LG eess.IV

    Improving the Transferability of Adversarial Examples with Arbitrary Style Transfer

    Authors: Zhijin Ge, Fanhua Shang, Hongying Liu, Yuanyuan Liu, Liang Wan, Wei Feng, Xiaosen Wang

    Abstract: Deep neural networks are vulnerable to adversarial examples crafted by applying human-imperceptible perturbations on clean inputs. Although many attack methods can achieve high success rates in the white-box setting, they also exhibit weak transferability in the black-box setting. Recently, various methods have been proposed to improve adversarial transferability, in which the input transformation…

    Submitted 21 August, 2023; originally announced August 2023.

    Comments: 10 pages, 2 figures, accepted by the 31st ACM International Conference on Multimedia (MM '23)

  5. arXiv:2308.05784  [pdf, other]

    eess.IV cs.CV

    High-performance Data Management for Whole Slide Image Analysis in Digital Pathology

    Authors: Haoju Leng, Ruining Deng, Shunxing Bao, Dazheng Fang, Bryan A. Millis, Yucheng Tang, Haichun Yang, Xiao Wang, Yifan Peng, Lipeng Wan, Yuankai Huo

    Abstract: When dealing with giga-pixel digital pathology in whole-slide imaging, a notable proportion of data records holds relevance during each analysis operation. For instance, when deploying an image analysis algorithm on whole-slide images (WSI), the computational bottleneck often lies in the input-output (I/O) system. This is particularly notable as patch-level processing introduces a considerable I/O…

    Submitted 20 August, 2023; v1 submitted 10 August, 2023; originally announced August 2023.

  6. arXiv:2307.00638  [pdf, other]

    eess.SY

    Semi-automated Thermal Envelope Model Setup for Adaptive Model Predictive Control with Event-triggered System Identification

    Authors: Lu Wan, Xiaobing Dai, Torsten Welfonder, Ekaterina Petrova, Pieter Pauwels

    Abstract: To reach carbon neutrality in the middle of this century, smart controls for building energy systems are urgently required. Model predictive control (MPC) demonstrates great potential in improving the performance of heating ventilation and air-conditioning (HVAC) systems, whereas its wide application in the building sector is impeded by the considerable manual efforts involved in setting up the co…

    Submitted 11 July, 2023; v1 submitted 2 July, 2023; originally announced July 2023.

  7. arXiv:2306.08956  [pdf, other]

    cs.SD eess.AS stat.ML

    Multi-Loss Convolutional Network with Time-Frequency Attention for Speech Enhancement

    Authors: Liang Wan, Hongqing Liu, Yi Zhou, Jie Ji

    Abstract: The Dual-Path Convolution Recurrent Network (DPCRN) was proposed to effectively exploit time-frequency domain information. By combining the DPRNN module with Convolution Recurrent Network (CRN), the DPCRN obtained a promising performance in speech separation with a limited model size. In this paper, we explore self-attention in the DPCRN module and design a model called Multi-Loss Convolutional Ne…

    Submitted 15 June, 2023; originally announced June 2023.

  8. arXiv:2305.14566  [pdf, other]

    eess.IV cs.CV

    An Accelerated Pipeline for Multi-label Renal Pathology Image Segmentation at the Whole Slide Image Level

    Authors: Haoju Leng, Ruining Deng, Zuhayr Asad, R. Michael Womick, Haichun Yang, Lipeng Wan, Yuankai Huo

    Abstract: Deep-learning techniques have been used widely to alleviate the labour-intensive and time-consuming manual annotation required for pixel-level tissue characterization. Our previous study introduced an efficient single dynamic network - Omni-Seg - that achieved multi-class multi-scale pathological segmentation with less computational complexity. However, the patch-wise segmentation paradigm still a…

    Submitted 23 May, 2023; originally announced May 2023.

  9. arXiv:2303.10326  [pdf, other]

    eess.IV cs.CV

    Diff-UNet: A Diffusion Embedded Network for Volumetric Segmentation

    Authors: Zhaohu Xing, Liang Wan, Huazhu Fu, Guang Yang, Lei Zhu

    Abstract: In recent years, Denoising Diffusion Models have demonstrated remarkable success in generating semantically valuable pixel-wise representations for image generative modeling. In this study, we propose a novel end-to-end framework, called Diff-UNet, for medical volumetric segmentation. Our approach integrates the diffusion model into a standard U-shaped architecture to extract semantic information…

    Submitted 18 March, 2023; originally announced March 2023.

    Comments: 8 pages

  10. arXiv:2302.08950  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Handling the Alignment for Wake Word Detection: A Comparison Between Alignment-Based, Alignment-Free and Hybrid Approaches

    Authors: Vinicius Ribeiro, Yiteng Huang, Yuan Shangguan, Zhaojun Yang, Li Wan, Ming Sun

    Abstract: Wake word detection exists in most intelligent homes and portable devices. It offers these devices the ability to "wake up" when summoned at a low cost of power and computing. This paper focuses on understanding alignment's role in developing a wake-word system that answers a generic phrase. We discuss three approaches. The first is alignment-based, where the model is trained with frame-wise cross…

    Submitted 7 June, 2023; v1 submitted 17 February, 2023; originally announced February 2023.

    Comments: Accepted to Interspeech 2023

  11. arXiv:2211.04635  [pdf, other]

    cs.LG cs.AI eess.AS

    LiCo-Net: Linearized Convolution Network for Hardware-efficient Keyword Spotting

    Authors: Haichuan Yang, Zhaojun Yang, Li Wan, Biqiao Zhang, Yangyang Shi, Yiteng Huang, Ivaylo Enchev, Limin Tang, Raziel Alvarez, Ming Sun, Xin Lei, Raghuraman Krishnamoorthi, Vikas Chandra

    Abstract: This paper proposes a hardware-efficient architecture, Linearized Convolution Network (LiCo-Net) for keyword spotting. It is optimized specifically for low-power processor units like microcontrollers. ML operators exhibit heterogeneous efficiency profiles on power-efficient hardware. Given the exact theoretical computation cost, int8 operators are more computation-effective than float operators, a…

    Submitted 8 November, 2022; originally announced November 2022.

  12. arXiv:2208.14876  [pdf, other]

    eess.IV cs.CV

    NestedFormer: Nested Modality-Aware Transformer for Brain Tumor Segmentation

    Authors: Zhaohu Xing, Lequan Yu, Liang Wan, Tong Han, Lei Zhu

    Abstract: Multi-modal MR imaging is routinely used in clinical practice to diagnose and investigate brain tumors by providing rich complementary information. Previous multi-modal MRI segmentation methods usually perform modal fusion by concatenating multi-modal MRIs at an early/middle stage of the network, which hardly explores non-linear dependencies between modalities. In this work, we propose a novel Nes…

    Submitted 31 August, 2022; originally announced August 2022.

    Comments: MICCAI2022

  13. arXiv:2112.04459  [pdf, other]

    eess.AS cs.LG cs.SD

    Self-Supervised Speaker Verification with Simple Siamese Network and Self-Supervised Regularization

    Authors: Mufan Sang, Haoqi Li, Fang Liu, Andrew O. Arnold, Li Wan

    Abstract: Training speaker-discriminative and robust speaker verification systems without speaker labels is still challenging and worthwhile to explore. In this study, we propose an effective self-supervised learning framework and a novel regularization strategy to facilitate self-supervised speaker representation learning. Different from contrastive learning-based self-supervised learning methods, the prop…

    Submitted 1 February, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

    Comments: Accepted to ICASSP 2022

  14. arXiv:2107.07101  [pdf, other]

    cs.IT eess.SP

    Joint CFO, Gridless Channel Estimation and Data Detection for Underwater Acoustic OFDM Systems

    Authors: Lei Wan, Jiang Zhu, En Cheng, Zhiwei Xu

    Abstract: In this paper, we propose an iterative receiver based on gridless variational Bayesian line spectra estimation (VALSE) named JCCD-VALSE that \emph{j}ointly estimates the \emph{c}arrier frequency offset (CFO), the \emph{c}hannel with high resolution and carries out \emph{d}ata decoding. Based on a modularized point of view and motivated by the high resolution and low complexity gridless VALSE algor…

    Submitted 14 July, 2021; originally announced July 2021.

  15. arXiv:2103.05532  [pdf, other]

    eess.SP

    Potential Advantages of Peak Picking Multi-Voltage Threshold Digitizer in Energy Determination in Radiation Measurement

    Authors: Kezhang Zhu, Junhua Mei, Yuming Su, **** Dai, Nicola D'Ascenzo, Hao Wang, Peng Xiao, Lin Wan, Qingguo Xie

    Abstract: The Multi-voltage Threshold (MVT) method, which samples the signal by certain reference voltages, has been well developed as being adopted in pre-clinical and clinical digital positron emission tomography (PET) system. To improve its energy measurement performance, we propose a Peak Picking MVT (PP-MVT) Digitizer in this paper. Firstly, a sampled Peak Point (the highest point in pulse signal), which…

    Submitted 8 March, 2021; originally announced March 2021.

    Comments: 14 pages, 8 figures, 1 table

  16. arXiv:1910.09687  [pdf, other]

    cs.LG eess.AS stat.ML

    Signal Combination for Language Identification

    Authors: Shengye Wang, Li Wan, Yang Yu, Ignacio Lopez Moreno

    Abstract: Google's multilingual speech recognition system combines low-level acoustic signals with language-specific recognizer signals to better predict the language of an utterance. This paper presents our experience with different signal combination methods to improve overall language identification accuracy. We compare the performance of a lattice-based ensemble model and a deep neural network model to…

    Submitted 4 November, 2019; v1 submitted 21 October, 2019; originally announced October 2019.

  17. arXiv:1908.04284  [pdf, other]

    eess.AS cs.LG stat.ML

    Personal VAD: Speaker-Conditioned Voice Activity Detection

    Authors: Shaojin Ding, Quan Wang, Shuo-yiin Chang, Li Wan, Ignacio Lopez Moreno

    Abstract: In this paper, we propose "personal VAD", a system to detect the voice activity of a target speaker at the frame level. This system is useful for gating the inputs to a streaming on-device speech recognition system, such that it only triggers for the target user, which helps reduce the computational cost and battery consumption, especially in scenarios where a keyword detector is unpreferable. We…

    Submitted 8 April, 2020; v1 submitted 12 August, 2019; originally announced August 2019.

    Comments: Speaker Odyssey 2020

  18. arXiv:1907.11458  [pdf, other]

    cs.CV eess.IV

    Multiple Human Association between Top and Horizontal Views by Matching Subjects' Spatial Distributions

    Authors: Ruize Han, Yujun Zhang, Wei Feng, Chenxing Gong, Xiaoyu Zhang, Jiewen Zhao, Liang Wan, Song Wang

    Abstract: Video surveillance can be significantly enhanced by using both top-view data, e.g., those from drone-mounted cameras in the air, and horizontal-view data, e.g., those from wearable cameras on the ground. Collaborative analysis of different-view data can facilitate various kinds of applications, such as human tracking, person identification, and human activity recognition. However, for such collabo…

    Submitted 26 July, 2019; originally announced July 2019.

  19. arXiv:1811.12290  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Tuplemax Loss for Language Identification

    Authors: Li Wan, Prashant Sridhar, Yang Yu, Quan Wang, Ignacio Lopez Moreno

    Abstract: In many scenarios of a language identification task, the user will specify a small set of languages which he/she can speak instead of a large set of all possible languages. We want to model such prior knowledge into the way we train our neural networks, by replacing the commonly used softmax loss function with a novel loss function named tuplemax loss. As a matter of fact, a typical language ident…

    Submitted 17 February, 2019; v1 submitted 29 November, 2018; originally announced November 2018.

    Comments: Submitted to ICASSP 2019

  20. arXiv:1712.05587  [pdf, ps, other]

    eess.SP

    DOA and Polarization Estimation for Non-Circular Signals in 3-D Millimeter Wave Polarized Massive MIMO Systems

    Authors: Liangtian Wan, Kaihui Liu, Ying-Chang Liang, Tong Zhu

    Abstract: In this paper, an algorithm of multiple signal classification (MUSIC) is proposed for two-dimensional (2-D) direction-of-arrival (DOA) and polarization estimation of non-circular signal in three-dimensional (3-D) millimeter wave polarized large-scale/massive multiple-input-multiple-output (MIMO) systems. The traditional MUSIC-based algorithms can estimate either the DOA and polarization for circu…

    Submitted 15 December, 2017; originally announced December 2017.

  21. arXiv:1710.10470  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Attention-Based Models for Text-Dependent Speaker Verification

    Authors: F A Rezaur Rahman Chowdhury, Quan Wang, Ignacio Lopez Moreno, Li Wan

    Abstract: Attention-based models have recently shown great performance on a range of tasks, such as speech recognition, machine translation, and image captioning due to their ability to summarize relevant information that expands through the entire length of an input sequence. In this paper, we analyze the usage of attention mechanisms to the problem of sequence summarization in our end-to-end text-dependen…

    Submitted 31 January, 2018; v1 submitted 28 October, 2017; originally announced October 2017.

    Comments: Submitted to ICASSP 2018

  22. arXiv:1710.10468  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Speaker Diarization with LSTM

    Authors: Quan Wang, Carlton Downey, Li Wan, Philip Andrew Mansfield, Ignacio Lopez Moreno

    Abstract: For many years, i-vector based audio embedding techniques were the dominant approach for speaker verification and speaker diarization applications. However, mirroring the rise of deep learning in various domains, neural network based audio embeddings, also known as d-vectors, have consistently demonstrated superior speaker verification performance. In this paper, we build on the success of d-vecto…

    Submitted 23 January, 2022; v1 submitted 28 October, 2017; originally announced October 2017.

    Comments: Published at ICASSP 2018

  23. arXiv:1710.10467  [pdf, other]

    eess.AS cs.CL cs.LG stat.ML

    Generalized End-to-End Loss for Speaker Verification

    Authors: Li Wan, Quan Wang, Alan Papir, Ignacio Lopez Moreno

    Abstract: In this paper, we propose a new loss function called generalized end-to-end (GE2E) loss, which makes the training of speaker verification models more efficient than our previous tuple-based end-to-end (TE2E) loss function. Unlike TE2E, the GE2E loss function updates the network in a way that emphasizes examples that are difficult to verify at each step of the training process. Additionally, the GE…

    Submitted 9 November, 2020; v1 submitted 28 October, 2017; originally announced October 2017.

    Comments: Published at ICASSP 2018