
Showing 1–38 of 38 results for author: Zheng, F

Searching in archive eess.
  1. arXiv:2406.19706  [pdf, other]

    cs.SD eess.AS

    SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR

    Authors: Qiuming Zhao, Guangzhi Sun, Chao Zhang, Mingxing Xu, Thomas Fang Zheng

    Abstract: Mixture-of-experts (MoE) models have achieved excellent results in many tasks. However, conventional MoE models are often very large, making them challenging to deploy on resource-constrained edge devices. In this paper, we propose a novel speaker adaptive mixture of LoRA experts (SAML) approach, which uses low-rank adaptation (LoRA) modules as experts to reduce the number of trainable parameters…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: 5 pages, accepted by Interspeech 2024. arXiv admin note: substantial text overlap with arXiv:2309.09136

  2. arXiv:2406.18361  [pdf, other]

    cs.CV cs.AI eess.IV

    Stable Diffusion Segmentation for Biomedical Images with Single-step Reverse Process

    Authors: Tianyu Lin, Zhiguang Chen, Zhonghao Yan, Weijiang Yu, Fudan Zheng

    Abstract: Diffusion models have demonstrated their effectiveness across various generative tasks. However, when applied to medical image segmentation, these models encounter several challenges, including significant resource and time requirements. They also necessitate a multi-step reverse process and multiple samples to produce reliable predictions. To address these challenges, we introduce the first laten…

    Submitted 27 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted at MICCAI 2024. For code and citation info, see https://github.com/lin-tianyu/Stable-Diffusion-Seg

  3. arXiv:2404.03179  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    UniAV: Unified Audio-Visual Perception for Multi-Task Video Localization

    Authors: Tiantian Geng, Teng Wang, Yanfu Zhang, Jinming Duan, Weili Guan, Feng Zheng

    Abstract: Video localization tasks aim to temporally locate specific instances in videos, including temporal action localization (TAL), sound event detection (SED) and audio-visual event localization (AVEL). Existing methods over-specialize on each task, overlooking the fact that these instances often occur in the same video to form the complete video content. In this work, we present UniAV, a Unified Audio…

    Submitted 3 April, 2024; originally announced April 2024.

  4. arXiv:2312.09464  [pdf, other]

    eess.SP

    Enhanced Eye Diagram Estimation Method for Nonlinear Systems With Input Jitter

    Authors: Hanqing Zhang, Feijun Zheng

    Abstract: An enhanced multiple-edge response (MER) based eye diagram estimation method is proposed to evaluate the performance of nonlinear systems with input jitter. Compared with existing MER-based methods which only took into account the bit effect, the proposed method first determines both orders of bit effect and jitter effect. These decided orders can affirm the necessary MERs. Subsequently, the propo…

    Submitted 10 November, 2023; originally announced December 2023.

    Comments: The article was accepted by EMC+SIPI 2023 but not published, because the author could not attend the conference due to a personal emergency. The related information is attached at the end of the article.

  5. arXiv:2311.03887  [pdf, other]

    physics.optics eess.IV physics.med-ph

    Toward ground-truth optical coherence tomography via three-dimensional unsupervised deep learning processing and data

    Authors: Renxiong Wu, Fei Zheng, Meixuan Li, Shaoyan Huang, Xin Ge, Linbo Liu, Yong Liu, Guangming Ni

    Abstract: Optical coherence tomography (OCT) can perform non-invasive high-resolution three-dimensional (3D) imaging and has been widely used in biomedical fields, while it is inevitably affected by coherence speckle noise which degrades OCT imaging performance and restricts its applications. Here we present a novel speckle-free OCT imaging strategy, named toward-ground-truth OCT (tGT-OCT), that utilizes un…

    Submitted 7 November, 2023; originally announced November 2023.

  6. arXiv:2309.09136  [pdf, other]

    cs.SD cs.AI eess.AS

    Enhancing Quantised End-to-End ASR Models via Personalisation

    Authors: Qiuming Zhao, Guangzhi Sun, Chao Zhang, Mingxing Xu, Thomas Fang Zheng

    Abstract: Recent end-to-end automatic speech recognition (ASR) models have become increasingly larger, making them particularly challenging to be deployed on resource-constrained devices. Model quantisation is an effective solution that sometimes causes the word error rate (WER) to increase. In this paper, a novel strategy of personalisation for a quantised model (PQM) is proposed, which combines speaker ad…

    Submitted 16 September, 2023; originally announced September 2023.

    Comments: 5 pages, submitted to ICASSP 2024

  7. arXiv:2307.04827  [pdf, other]

    cs.SD cs.CL cs.MM eess.AS

    LaunchpadGPT: Language Model as Music Visualization Designer on Launchpad

    Authors: Siting Xu, Yunlong Tang, Feng Zheng

    Abstract: Launchpad is a musical instrument that allows users to create and perform music by pressing illuminated buttons. To assist and inspire the design of the Launchpad light effect, and provide a more accessible approach for beginners to create music visualization with this instrument, we proposed the LaunchpadGPT model to generate music visualization designs on Launchpad automatically. Based on the la…

    Submitted 23 July, 2023; v1 submitted 7 July, 2023; originally announced July 2023.

    Comments: Accepted by International Computer Music Conference (ICMC) 2023

  8. arXiv:2307.04296  [pdf, other]

    eess.IV cs.CV

    K-Space-Aware Cross-Modality Score for Synthesized Neuroimage Quality Assessment

    Authors: Guoyang Xie, Jinbao Wang, Yawen Huang, Jiayi Lyu, Feng Zheng, Yefeng Zheng, Yaochu Jin

    Abstract: The problem of how to assess cross-modality medical image synthesis has been largely unexplored. The most used measures like PSNR and SSIM focus on analyzing the structural features but neglect the crucial lesion location and fundamental k-space speciality of medical images. To overcome this problem, we propose a new metric K-CROSS to spur progress on this challenging problem. Specifically, K-CROS…

    Submitted 9 February, 2024; v1 submitted 9 July, 2023; originally announced July 2023.

  9. arXiv:2306.01458  [pdf, ps, other]

    cs.IT eess.SP eess.SY

    Extremely Large-scale Array Systems: Near-Field Codebook Design and Performance Analysis

    Authors: Feng Zheng, Hongkang Yu, Chenchen Wang, Luyang Sun, Qingqing Wu, Yijian Chen

    Abstract: Extremely Large-scale Array (ELAA) promises to deliver ultra-high data rates with increased antenna elements. However, increasing antenna elements leads to a wider realm of near-field, which challenges the traditional design of codebooks. In this paper, we propose novel near-field codebook schemes based on the fitting formula of codewords' quantization performance. First, we analyze the quantizati…

    Submitted 24 August, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

  10. arXiv:2305.16105  [pdf, ps, other]

    eess.SP

    Joint Uplink and Downlink Resource Allocation Towards Energy-efficient Transmission for URLLC

    Authors: Kang Li, Pengcheng Zhu, Yan Wang, Fu-Chun Zheng, Xiaohu You

    Abstract: Ultra-reliable and low-latency communications (URLLC) is firstly proposed in 5G networks, and expected to support applications with the most stringent quality-of-service (QoS). However, since the wireless channels vary dynamically, the transmit power for ensuring the QoS requirements of URLLC may be very high, which conflicts with the power limitation of a real system. To fulfill the successful UR…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: 16 pages, 11 figures

  11. arXiv:2303.12930  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline

    Authors: Tiantian Geng, Teng Wang, Jinming Duan, Runmin Cong, Feng Zheng

    Abstract: Existing audio-visual event localization (AVE) handles manually trimmed videos with only a single instance in each of them. However, this setting is unrealistic as natural videos often contain numerous audio-visual events with different categories. To better adapt to real-life applications, in this paper we focus on the task of dense-localizing audio-visual events, which aims to jointly localize a…

    Submitted 24 March, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

    Comments: Accepted by CVPR 2023

  12. arXiv:2211.07143  [pdf]

    eess.IV cs.CV

    WSC-Trans: A 3D network model for automatic multi-structural segmentation of temporal bone CT

    Authors: Xin Hua, Zhijiang Du, Hongjian Yu, Jixin Ma, Fanjun Zheng, Cheng Zhang, Qiaohui Lu, Hui Zhao

    Abstract: Cochlear implantation is currently the most effective treatment for patients with severe deafness, but mastering cochlear implantation is extremely challenging because the temporal bone has extremely complex and small three-dimensional anatomical structures, and it is important to avoid damaging the corresponding structures when performing surgery. The spatial location of the relevant anatomical t…

    Submitted 14 November, 2022; originally announced November 2022.

    Comments: 10 pages, 7 figures

  13. arXiv:2207.09647  [pdf, other]

    eess.SP

    Deep Learning Based Automatic Modulation Recognition: Models, Datasets, and Challenges

    Authors: Fuxin Zhang, Chunbo Luo, Jialang Xu, Yang Luo, FuChun Zheng

    Abstract: Automatic modulation recognition (AMR) detects the modulation scheme of the received signals for further signal processing without needing prior information, and provides the essential function when such information is missing. Recent breakthroughs in deep learning (DL) have laid the foundation for developing high-performance DL-AMR approaches for communications systems. Comparing with traditional…

    Submitted 20 July, 2022; originally announced July 2022.

  14. arXiv:2207.06918  [pdf, ps, other]

    eess.SP cs.LG

    Interference-Limited Ultra-Reliable and Low-Latency Communications: Graph Neural Networks or Stochastic Geometry?

    Authors: Yuhong Liu, Changyang She, Yi Zhong, Wibowo Hardjawana, Fu-Chun Zheng, Branka Vucetic

    Abstract: In this paper, we aim to improve the Quality-of-Service (QoS) of Ultra-Reliability and Low-Latency Communications (URLLC) in interference-limited wireless networks. To obtain time diversity within the channel coherence time, we first put forward a random repetition scheme that randomizes the interference power. Then, we optimize the number of reserved slots and the number of repetitions for each p…

    Submitted 18 July, 2022; v1 submitted 11 July, 2022; originally announced July 2022.

    Comments: Submitted to IEEE journal for possible publication

  15. arXiv:2207.06057  [pdf, other]

    cs.SD cs.MM eess.AS

    Subband-based Generative Adversarial Network for Non-parallel Many-to-many Voice Conversion

    Authors: Jian Ma, Zhedong Zheng, Hao Fei, Feng Zheng, Tat-seng Chua, Yi Yang

    Abstract: Voice conversion is to generate a new speech with the source content and a target voice style. In this paper, we focus on one general setting, i.e., non-parallel many-to-many voice conversion, which is close to the real-world scenario. As the name implies, non-parallel many-to-many voice conversion does not require the paired source and reference speeches and can be applied to arbitrary voice tran…

    Submitted 27 July, 2022; v1 submitted 13 July, 2022; originally announced July 2022.

  16. arXiv:2206.13741  [pdf, other]

    cs.NI eess.SP

    Social-aware Cooperative Caching in Fog Radio Access Networks

    Authors: Baotian Fan, Yanxiang Jiang, Fu-Chun Zheng, Mehdi Bennis, Xiaohu You

    Abstract: In this paper, the cooperative caching problem in fog radio access networks (F-RANs) is investigated to jointly optimize the transmission delay and energy consumption. Exploiting the potential social relationships among fog access points (F-APs), we firstly propose a clustering scheme based on hedonic coalition game (HCG) to improve the potential cooperation gain. Then, considering that the optimi…

    Submitted 27 June, 2022; originally announced June 2022.

    Comments: 6 pages, 5 figures. This paper has been accepted by IEEE ICC 2022

  17. arXiv:2206.11556  [pdf, other]

    eess.SP

    A Federated Reinforcement Learning Method with Quantization for Cooperative Edge Caching in Fog Radio Access Networks

    Authors: Yanxiang Jiang, Min Zhang, Fu-Chun Zheng, Yan Chen, Mehdi Bennis, Xiaohu You

    Abstract: In this paper, cooperative edge caching problem is studied in fog radio access networks (F-RANs). Given the non-deterministic polynomial hard (NP-hard) property of the problem, a dueling deep Q network (Dueling DQN) based caching update algorithm is proposed to make an optimal caching decision by learning the dynamic network environment. In order to protect user data privacy and solve the problem…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: 14 pages, 12 figures

  18. arXiv:2203.10897  [pdf, other]

    cs.CV eess.IV

    Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression

    Authors: Xiaosu Zhu, Jingkuan Song, Lianli Gao, Feng Zheng, Heng Tao Shen

    Abstract: Modeling latent variables with priors and hyperpriors is an essential problem in variational image compression. Formally, trade-off between rate and distortion is handled well if priors and hyperpriors precisely describe latent variables. Current practices only adopt univariate priors and process each variable individually. However, we find inter-correlations and intra-correlations exist when obse…

    Submitted 21 March, 2022; originally announced March 2022.

    Comments: Accepted to CVPR 2022

    Journal ref: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022

  19. arXiv:2202.06997  [pdf, other]

    eess.IV cs.CV

    Cross-Modality Neuroimage Synthesis: A Survey

    Authors: Guoyang Xie, Yawen Huang, Jinbao Wang, Jiayi Lyu, Feng Zheng, Yefeng Zheng, Yaochu Jin

    Abstract: Multi-modality imaging improves disease diagnosis and reveals distinct deviations in tissues with anatomical properties. The existence of completely aligned and paired multi-modality neuroimaging data has proved its effectiveness in brain research. However, collecting fully aligned and paired data is expensive or even impractical, since it faces many difficulties, including high cost, long acquisi…

    Submitted 21 September, 2023; v1 submitted 14 February, 2022; originally announced February 2022.

  20. arXiv:2201.12589  [pdf, other]

    eess.IV cs.CV

    FedMed-ATL: Misaligned Unpaired Brain Image Synthesis via Affine Transform Loss

    Authors: Jinbao Wang, Guoyang Xie, Yawen Huang, Yefeng Zheng, Yaochu Jin, Feng Zheng

    Abstract: The existence of completely aligned and paired multi-modal neuroimaging data has proved its effectiveness in the diagnosis of brain diseases. However, collecting the full set of well-aligned and paired data is impractical, since the practical difficulties may include high cost, long time acquisition, image corruption, and privacy issues. Previously, the misaligned unpaired neuroimaging data (terme…

    Submitted 16 July, 2022; v1 submitted 29 January, 2022; originally announced January 2022.

    Comments: arXiv admin note: text overlap with arXiv:2201.08953

  21. arXiv:2111.12324  [pdf, other]

    cs.SD eess.AS

    How Speech is Recognized to Be Emotional - A Study Based on Information Decomposition

    Authors: Haoran Sun, Lantian Li, Thomas Fang Zheng, Dong Wang

    Abstract: The way that humans encode their emotion into speech signals is complex. For instance, an angry man may increase his pitch and speaking rate, and use impolite words. In this paper, we present a preliminary study on various emotional factors and investigate how each of them impacts modern emotion recognition systems. The key tool of our study is the SpeechFlow model presented recently, by which we…

    Submitted 24 November, 2021; originally announced November 2021.

  22. arXiv:2110.05087  [pdf]

    cs.SD eess.AS

    A Multi-Resolution Front-End for End-to-End Speech Anti-Spoofing

    Authors: Wei Liu, Meng Sun, Xiongwei Zhang, Hugo Van hamme, Thomas Fang Zheng

    Abstract: The choice of an optimal time-frequency resolution is usually a difficult but important step in tasks involving speech signal classification, e.g., speech anti-spoofing. The variations of the performance with different choices of time-frequency resolutions can be as large as those with different model architectures, which makes it difficult to judge what the improvement actually comes from when a n…

    Submitted 11 October, 2021; originally announced October 2021.

    Comments: submitted to ICASSP 2022

  23. Attack on practical speaker verification system using universal adversarial perturbations

    Authors: Weiyi Zhang, Shuning Zhao, Le Liu, Jianmin Li, Xingliang Cheng, Thomas Fang Zheng, Xiaolin Hu

    Abstract: In authentication scenarios, applications of practical speaker verification systems usually require a person to read a dynamic authentication text. Previous studies played an audio adversarial example as a digital signal to perform physical attacks, which would be easily rejected by audio replay detection modules. This work shows that by playing our crafted adversarial perturbation as a separate s…

    Submitted 19 May, 2021; originally announced May 2021.

    Comments: 6 pages, 2 figures

  24. arXiv:2103.11587  [pdf, other]

    cs.CV eess.IV

    Brain Image Synthesis with Unsupervised Multivariate Canonical CSC$\ell_4$Net

    Authors: Yawen Huang, Feng Zheng, Danyang Wang, Weilin Huang, Matthew R. Scott, Ling Shao

    Abstract: Recent advances in neuroscience have highlighted the effectiveness of multi-modal medical data for investigating certain pathologies and understanding human cognition. However, obtaining full sets of different modalities is limited by various factors, such as long acquisition times, high examination costs and artifact suppression. In addition, the complexity, high dimensionality and heterogeneity…

    Submitted 22 March, 2021; originally announced March 2021.

    Comments: 10 pages, 5 figures. CVPR 2021 oral

  25. arXiv:2012.12468  [pdf, other]

    cs.SD eess.AS

    CN-Celeb: multi-genre speaker recognition

    Authors: Lantian Li, Ruiqi Liu, Jiawen Kang, Yue Fan, Hao Cui, Yunqi Cai, Ravichander Vipperla, Thomas Fang Zheng, Dong Wang

    Abstract: Research on speaker recognition is extending to address the vulnerability in the wild conditions, among which genre mismatch is perhaps the most challenging, for instance, enrollment with reading speech while testing with conversational or singing audio. This mismatch leads to complex and composite inter-session variations, both intrinsic (i.e., speaking style, physiological status) and extrinsic…

    Submitted 24 November, 2021; v1 submitted 22 December, 2020; originally announced December 2020.

    Comments: submitted to Speech Communication

  26. arXiv:2010.14243  [pdf, ps, other]

    cs.SD cs.LG eess.AS

    Squeezing value of cross-domain labels: a decoupled scoring approach for speaker verification

    Authors: Lantian Li, Yang Zhang, Jiawen Kang, Thomas Fang Zheng, Dong Wang

    Abstract: Domain mismatch often occurs in real applications and causes serious performance reduction on speaker verification systems. The common wisdom is to collect cross-domain data and train a multi-domain PLDA model, with the hope to learn a domain-independent speaker subspace. In this paper, we firstly present an empirical study to show that simply adding cross-domain data does not help performance in…

    Submitted 27 October, 2020; originally announced October 2020.

    Comments: Submitted to ICASSP 2021

  27. arXiv:2010.14242  [pdf, other]

    cs.SD cs.LG eess.AS

    Deep generative factorization for speech signal

    Authors: Haoran Sun, Lantian Li, Yunqi Cai, Yang Zhang, Thomas Fang Zheng, Dong Wang

    Abstract: Various information factors are blended in speech signals, which forms the primary difficulty for most speech information processing tasks. An intuitive idea is to factorize speech signal into individual information factors (e.g., phonetic content and speaker trait), though it turns out to be highly challenging. This paper presents a speech factorization approach based on a novel factorial discrim…

    Submitted 27 October, 2020; originally announced October 2020.

    Comments: Submitted to ICASSP 2021

  28. arXiv:2009.06863  [pdf]

    eess.AS cs.CR cs.SD

    When Automatic Voice Disguise Meets Automatic Speaker Verification

    Authors: Linlin Zheng, Jiakang Li, Meng Sun, Xiongwei Zhang, Thomas Fang Zheng

    Abstract: The technique of transforming voices in order to hide the real identity of a speaker is called voice disguise, among which automatic voice disguise (AVD) by modifying the spectral and temporal characteristics of voices with miscellaneous algorithms are easily conducted with softwares accessible to the public. AVD has posed great threat to both human listening and automatic speaker verification (AS…

    Submitted 15 September, 2020; originally announced September 2020.

    Comments: accepted for publication

    Journal ref: IEEE Transactions on Information Forensics and Security, 2020

  29. arXiv:2005.11905  [pdf, other]

    eess.AS

    Neural Discriminant Analysis for Deep Speaker Embedding

    Authors: Lantian Li, Dong Wang, Thomas Fang Zheng

    Abstract: Probabilistic Linear Discriminant Analysis (PLDA) is a popular tool in open-set classification/verification tasks. However, the Gaussian assumption underlying PLDA prevents it from being applied to situations where the data is clearly non-Gaussian. In this paper, we present a novel nonlinear version of PLDA named as Neural Discriminant Analysis (NDA). This model employs an invertible deep neural n…

    Submitted 24 May, 2020; originally announced May 2020.

    Comments: submitted to INTERSPEECH 2020

  30. arXiv:2005.11902  [pdf, other]

    eess.AS

    ASR-Free Pronunciation Assessment

    Authors: Sitong Cheng, Zhixin Liu, Lantian Li, Zhiyuan Tang, Dong Wang, Thomas Fang Zheng

    Abstract: Most of the pronunciation assessment methods are based on local features derived from automatic speech recognition (ASR), e.g., the Goodness of Pronunciation (GOP) score. In this paper, we investigate an ASR-free scoring approach that is derived from the marginal distribution of raw speech signals. The hypothesis is that even if we have no knowledge of the language (so cannot recognize the phones/…

    Submitted 24 May, 2020; originally announced May 2020.

    Comments: submitted to INTERSPEECH 2020

  31. arXiv:2005.11900  [pdf, other]

    eess.AS

    Domain-Invariant Speaker Vector Projection by Model-Agnostic Meta-Learning

    Authors: Jiawen Kang, Ruiqi Liu, Lantian Li, Yunqi Cai, Dong Wang, Thomas Fang Zheng

    Abstract: Domain generalization remains a critical problem for speaker recognition, even with the state-of-the-art architectures based on deep neural nets. For example, a model trained on reading speech may largely fail when applied to scenarios of singing or movie. In this paper, we propose a domain-invariant projection to improve the generalizability of speaker vectors. This projection is a simple neural…

    Submitted 24 May, 2020; originally announced May 2020.

    Comments: submitted to INTERSPEECH 2020

  32. arXiv:2005.02627  [pdf, other]

    cs.IT eess.SP

    Joint Optimal Software Caching, Computation Offloading and Communications Resource Allocation for Mobile Edge Computing

    Authors: Wanli Wen, Ying Cui, Tony Q. S. Quek, Fu-Chun Zheng, Shi Jin

    Abstract: As software may be used by multiple users, caching popular software at the wireless edge has been considered to save computation and communications resources for mobile edge computing (MEC). However, fetching uncached software from the core network and multicasting popular software to users have so far been ignored. Thus, existing design is incomplete and less practical. In this paper, we propose…

    Submitted 6 May, 2020; originally announced May 2020.

    Comments: To appear in IEEE Trans. Veh. Technol., 2020

  33. arXiv:2002.00136  [pdf, ps, other]

    eess.SP

    A Novel Massive MIMO Beam Domain Channel Model

    Authors: Fan Lai, Cheng-Xiang Wang, Jie Huang, Xiqi Gao, Fu-Chun Zheng

    Abstract: A novel beam domain channel model (BDCM) for massive multiple-input multiple-output (MIMO) communication systems has been proposed in this paper. The near-field effect and spherical wavefront are firstly assumed in the proposed model, which is different from the conventional BDCM for MIMO based on the far-field effect and plane wavefront assumption. The proposed novel BDCM is the transformation of…

    Submitted 31 January, 2020; originally announced February 2020.

  34. arXiv:1912.01300  [pdf, other]

    cs.CV cs.LG eess.IV

    Viewpoint-Aware Loss with Angular Regularization for Person Re-Identification

    Authors: Zhihui Zhu, Xinyang Jiang, Feng Zheng, Xiaowei Guo, Feiyue Huang, Weishi Zheng, Xing Sun

    Abstract: Although great progress in supervised person re-identification (Re-ID) has been made recently, due to the viewpoint variation of a person, Re-ID remains a massive visual challenge. Most existing viewpoint-based person Re-ID methods project images from each viewpoint into separated and unrelated sub-feature spaces. They only model the identity-level distribution inside an individual viewpoint but i…

    Submitted 3 December, 2019; originally announced December 2019.

  35. arXiv:1911.12512  [pdf, other]

    cs.CV cs.LG eess.IV

    Rethinking Temporal Fusion for Video-based Person Re-identification on Semantic and Time Aspect

    Authors: Xinyang Jiang, Yifei Gong, Xiaowei Guo, Qize Yang, Feiyue Huang, Weishi Zheng, Feng Zheng, Xing Sun

    Abstract: Recently, the research interest of person re-identification (ReID) has gradually turned to video-based methods, which acquire a person representation by aggregating frame features of an entire video. However, existing video-based ReID methods do not consider the semantic difference brought by the outputs of different network stages, which potentially compromises the information richness of the per…

    Submitted 27 November, 2019; originally announced November 2019.

  36. Traffic state estimation using stochastic Lagrangian dynamics

    Authors: Fangfang Zheng, Saif Eddin Jabari, Henry X. Liu, DianChao Lin

    Abstract: This paper proposes a new stochastic model of traffic dynamics in Lagrangian coordinates. The source of uncertainty is heterogeneity in driving behavior, captured using driver-specific speed-spacing relations, i.e., parametric uncertainty. It also results in smooth vehicle trajectories in a stochastic context, which is in agreement with real-world traffic dynamics and, thereby, overcoming issues w…

    Submitted 31 May, 2018; originally announced June 2018.

    Journal ref: Transportation Research Part B: Methodological Volume 115, September 2018, Pages 143-165

  37. arXiv:1803.00886  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Deep factorization for speech signal

    Authors: Lantian Li, Dong Wang, Yixiang Chen, Ying Shi, Zhiyuan Tang, Thomas Fang Zheng

    Abstract: Various informative factors mixed in speech signals, leading to great difficulty when decoding any of the factors. An intuitive idea is to factorize each speech frame into individual informative factors, though it turns out to be highly difficult. Recently, we found that speaker traits, which were assumed to be long-term distributional properties, are actually short-time patterns, and can be learn…

    Submitted 27 February, 2018; originally announced March 2018.

    Comments: Accepted by ICASSP 2018. arXiv admin note: substantial text overlap with arXiv:1706.01777

  38. arXiv:1711.00366  [pdf, other]

    cs.SD cs.LG eess.AS

    Full-info Training for Deep Speaker Feature Learning

    Authors: Lantian Li, Zhiyuan Tang, Dong Wang, Thomas Fang Zheng

    Abstract: In recent studies, it has shown that speaker patterns can be learned from very short speech segments (e.g., 0.3 seconds) by a carefully designed convolutional & time-delay deep neural network (CT-DNN) model. By enforcing the model to discriminate the speakers in the training data, frame-level speaker features can be derived from the last hidden layer. In spite of its good performance, a potential…

    Submitted 27 February, 2018; v1 submitted 31 October, 2017; originally announced November 2017.

    Comments: Accepted by ICASSP 2018