
Showing 1–15 of 15 results for author: Gong, M

Searching in archive eess.
  1. arXiv:2406.02430  [pdf, other]

    eess.AS cs.SD

    Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

    Authors: Philip Anastassiou, Jiawei Chen, Jitong Chen, Yuanzhe Chen, Zhuo Chen, Ziyi Chen, Jian Cong, Lelai Deng, Chuang Ding, Lu Gao, Mingqing Gong, Peisong Huang, Qingqing Huang, Zhiying Huang, Yuanyuan Huo, Dongya Jia, Chumin Li, Feiya Li, Hui Li, Jiaxin Li, Xiaoyang Li, Xingxing Li, Lin Liu, Shouda Liu, Sichao Liu, et al. (21 additional authors not shown)

    Abstract: We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech in-context learning, achieving performance in speaker similarity and naturalness that matches ground truth human speech in both objective and sub…

    Submitted 4 June, 2024; originally announced June 2024.

  2. arXiv:2405.03711  [pdf, other]

    cs.LG cs.AI cs.NE eess.SY

    Guidance Design for Escape Flight Vehicle Using Evolution Strategy Enhanced Deep Reinforcement Learning

    Authors: Xiao Hu, Tianshu Wang, Min Gong, Shaoshi Yang

    Abstract: Guidance commands of flight vehicles are a series of data sets with fixed time intervals, thus guidance design constitutes a sequential decision problem and satisfies the basic conditions for using deep reinforcement learning (DRL). In this paper, we consider the scenario where the escape flight vehicle (EFV) generates guidance commands based on DRL and the pursuit flight vehicle (PFV) generates g…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 13 pages, 13 figures, accepted to appear on IEEE Access, Mar. 2024

    Journal ref: IEEE Access, vol. 12, pp. 48210-48222, Mar. 2024

  3. arXiv:2404.00362  [pdf, other]

    cs.CV eess.IV

    STBA: Towards Evaluating the Robustness of DNNs for Query-Limited Black-box Scenario

    Authors: Renyang Liu, Kwok-Yan Lam, Wei Zhou, Sixing Wu, Jun Zhao, Dongting Hu, Mingming Gong

    Abstract: Many attack techniques have been proposed to explore the vulnerability of DNNs and further help to improve their robustness. Despite the significant progress made recently, existing black-box attack methods still suffer from unsatisfactory performance due to the vast number of queries needed to optimize desired perturbations. Besides, the other critical challenge is that adversarial examples built…

    Submitted 30 March, 2024; originally announced April 2024.

  4. arXiv:2402.16413  [pdf, other]

    eess.SP

    AI-enabled STAR-RIS aided MISO ISAC Secure Communications

    Authors: Zhengyu Zhu, Mengfei Gong, Gangcan Sun, Peijia Liu, De Mi

    Abstract: A simultaneous transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) aided integrated sensing and communication (ISAC) dual-secure communication system is studied in this paper. The sensed target and legitimate users (LUs) are situated on the opposite sides of the STAR-RIS, and the energy splitting and time switching protocols are applied in the STAR-RIS, respectively. The long…

    Submitted 27 February, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

  5. arXiv:2402.14401  [pdf, other]

    cs.CV cs.LG eess.IV

    Diffusion Model Based Visual Compensation Guidance and Visual Difference Analysis for No-Reference Image Quality Assessment

    Authors: Zhaoyang Wang, Bo Hu, Mingyang Zhang, Jie Li, Leida Li, Maoguo Gong, Xinbo Gao

    Abstract: Existing free-energy guided No-Reference Image Quality Assessment (NR-IQA) methods still suffer from finding a balance between learning feature information at the pixel level of the image and capturing high-level feature information and the efficient utilization of the obtained high-level feature information remains a challenge. As a novel class of state-of-the-art (SOTA) generative model, the dif…

    Submitted 22 February, 2024; originally announced February 2024.

  6. arXiv:2401.17669  [pdf, other]

    eess.SP

    Compression before Fusion: Broadcast Semantic Communication System for Heterogeneous Tasks

    Authors: Mingze Gong, Shuoyao Wang, Fangwei Ye, Suzhi Bi

    Abstract: Semantic communication has emerged as new paradigm shifts in 6G from the conventional syntax-oriented communications. Recently, the wireless broadcast technology has been introduced to support semantic communication system toward higher communication efficiency. Nevertheless, existing broadcast semantic communication systems target on general representation within one stage and fail to balance the…

    Submitted 31 January, 2024; originally announced January 2024.

  7. arXiv:2401.03476  [pdf, other]

    cs.MM cs.AI cs.HC cs.SD eess.AS

    Freetalker: Controllable Speech and Text-Driven Gesture Generation Based on Diffusion Models for Enhanced Speaker Naturalness

    Authors: Sicheng Yang, Zunnan Xu, Haiwei Xue, Yongkang Cheng, Shaoli Huang, Mingming Gong, Zhiyong Wu

    Abstract: Current talking avatars mostly generate co-speech gestures based on audio and text of the utterance, without considering the non-speaking motion of the speaker. Furthermore, previous works on co-speech gesture generation have designed network structures based on individual gesture datasets, which results in limited data volume, compromised generalizability, and restricted speaker movements. To tac…

    Submitted 7 January, 2024; originally announced January 2024.

    Comments: 6 pages, 3 figures, ICASSP 2024

  8. arXiv:2401.02566  [pdf]

    cs.SD cs.LG cs.MM eess.AS

    Siamese Residual Neural Network for Musical Shape Evaluation in Piano Performance Assessment

    Authors: Xiaoquan Li, Stephan Weiss, Yijun Yan, Yinhe Li, Jinchang Ren, John Soraghan, Ming Gong

    Abstract: Understanding and identifying musical shape plays an important role in music education and performance assessment. To simplify the otherwise time- and cost-intensive musical shape evaluation, in this paper we explore how artificial intelligence (AI) driven models can be applied. Considering musical shape evaluation as a classification problem, a light-weight Siamese residual neural network (S-ResN…

    Submitted 4 January, 2024; originally announced January 2024.

    Comments: X.Li, S.Weiss, Y.Yan, Y.Li, J.Ren, J.Soraghan, M.Gong,"Siamese residual neural network for musical shape evaluation in piano performance assessment" in Proc. of the 31st European Signal Processing Conference, Helsinki, Finland

  9. arXiv:2302.04609  [pdf, ps, other]

    eess.SP math.OC stat.AP

    Stochastic Maximum Likelihood Direction Finding in the Presence of Nonuniform Noise Fields

    Authors: Ming-yan Gong, Bin Lyu

    Abstract: In this letter, we employ and design the expectation--conditional maximization either (ECME) algorithm, a generalisation of the EM algorithm, for solving the maximum likelihood direction finding problem of stochastic sources, which may be correlated, in unknown nonuniform noise. Unlike alternating maximization, the ECME algorithm updates both the source and noise covariance matrix estimates by exp…

    Submitted 9 February, 2023; originally announced February 2023.

  10. arXiv:2211.02458  [pdf, ps, other]

    eess.SP

    EM-Type Algorithms for DOA Estimation in Unknown Nonuniform Noise

    Authors: Ming-yan Gong, Bin Lyu

    Abstract: The expectation--maximization (EM) algorithm updates all of the parameter estimates simultaneously, which is not applicable to direction of arrival (DOA) estimation in unknown nonuniform noise. In this work, we present several efficient EM-type algorithms, which update the parameter estimates sequentially, for solving both the deterministic and stochastic maximum--likelihood (ML) direction finding…

    Submitted 4 November, 2022; originally announced November 2022.

    Comments: arXiv admin note: text overlap with arXiv:2208.07510

  11. arXiv:2208.07510  [pdf, ps, other]

    eess.SP

    EM and SAGE algorithms for DOA Estimation in the Presence of Unknown Uniform Noise

    Authors: Ming-yan Gong, Bin Lyu

    Abstract: The expectation-maximization (EM) and space-alternating generalized EM (SAGE) algorithms have been applied to direction of arrival (DOA) estimation in known noise. In this work, the two algorithms are proposed for DOA estimation in unknown uniform noise. Both the deterministic and stochastic signal models are considered. Moreover, a modified EM (MEM) algorithm applicable to the noise assumption is…

    Submitted 15 August, 2022; originally announced August 2022.

  12. arXiv:2203.12707  [pdf, other]

    cs.CV eess.IV

    Maximum Spatial Perturbation Consistency for Unpaired Image-to-Image Translation

    Authors: Yanwu Xu, Shaoan Xie, Wenhao Wu, Kun Zhang, Mingming Gong, Kayhan Batmanghelich

    Abstract: Unpaired image-to-image translation (I2I) is an ill-posed problem, as an infinite number of translation functions can map the source domain distribution to the target distribution. Therefore, much effort has been put into designing suitable constraints, e.g., cycle consistency (CycleGAN), geometry consistency (GCGAN), and contrastive learning-based constraints (CUTGAN), that help better pose the p…

    Submitted 29 March, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

    Comments: CVPR 2022 accepted paper

  13. arXiv:2103.13491  [pdf, other]

    eess.IV cs.LG

    Feature Weighted Non-negative Matrix Factorization

    Authors: Mulin Chen, Maoguo Gong, Xuelong Li

    Abstract: Non-negative Matrix Factorization (NMF) is one of the most popular techniques for data representation and clustering, and has been widely used in machine learning and data analysis. NMF concentrates the features of each sample into a vector, and approximates it by the linear combination of basis vectors, such that the low-dimensional representations are achieved. However, in real-world application…

    Submitted 24 March, 2021; originally announced March 2021.

  14. arXiv:2102.11114  [pdf, other]

    cs.CL cs.SD eess.AS

    Generating Human Readable Transcript for Automatic Speech Recognition with Pre-trained Language Model

    Authors: Junwei Liao, Yu Shi, Ming Gong, Linjun Shou, Sefik Eskimez, Liyang Lu, Hong Qu, Michael Zeng

    Abstract: Modern Automatic Speech Recognition (ASR) systems can achieve high performance in terms of recognition accuracy. However, a perfectly accurate transcript still can be challenging to read due to disfluency, filter words, and other errata common in spoken communication. Many downstream tasks and human readers rely on the output of the ASR system; therefore, errors introduced by the speaker and ASR s…

    Submitted 22 February, 2021; originally announced February 2021.

    Comments: Accepted in 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2021)

  15. Hierarchical Amortized Training for Memory-efficient High Resolution 3D GAN

    Authors: Li Sun, Junxiang Chen, Yanwu Xu, Mingming Gong, Ke Yu, Kayhan Batmanghelich

    Abstract: Generative Adversarial Networks (GAN) have many potential medical imaging applications, including data augmentation, domain adaptation, and model explanation. Due to the limited memory of Graphical Processing Units (GPUs), most current 3D GAN models are trained on low-resolution medical images, these models either cannot scale to high-resolution or are prone to patchy artifacts. In this work, we p…

    Submitted 12 September, 2022; v1 submitted 4 August, 2020; originally announced August 2020.

    Comments: Paper accepted to IEEE Journal of Biomedical and Health Informatics, code available at https://github.com/batmanlab/HA-GAN

    Journal ref: in IEEE Journal of Biomedical and Health Informatics, vol. 26, no. 8, pp. 3966-3975, Aug. 2022