
Showing 1–39 of 39 results for author: Zeng, M

Searching in archive eess.
  1. arXiv:2405.17809  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation

    Authors: Chenyang Le, Yao Qian, Dongmei Wang, Long Zhou, Shujie Liu, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Sheng Zhao, Michael Zeng

    Abstract: There is a rising interest and trend in research towards directly translating speech from one language to another, known as end-to-end speech-to-speech translation. However, most end-to-end models struggle to outperform cascade models, i.e., a pipeline framework by concatenating speech recognition, machine translation and text-to-speech models. The primary challenges stem from the inherent complex…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Work in progress

  2. arXiv:2405.08423  [pdf, other]

    eess.IV cs.CV

    NAFRSSR: a Lightweight Recursive Network for Efficient Stereo Image Super-Resolution

    Authors: Yihong Chen, Zhen Fan, Shuai Dong, Zhiwei Chen, Wenjie Li, Minghui Qin, Min Zeng, Xubing Lu, Guofu Zhou, Xingsen Gao, Jun-Ming Liu

    Abstract: Stereo image super-resolution (SR) refers to the reconstruction of a high-resolution (HR) image from a pair of low-resolution (LR) images as typically captured by a dual-camera device. To enhance the quality of SR images, most previous studies focused on increasing the number and size of feature maps and introducing complex and computationally intensive structures, resulting in models with high co…

    Submitted 14 May, 2024; originally announced May 2024.

  3. arXiv:2404.06690  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations

    Authors: Leying Zhang, Yao Qian, Long Zhou, Shujie Liu, Dongmei Wang, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Lei He, Sheng Zhao, Michael Zeng

    Abstract: Recent advancements in zero-shot text-to-speech (TTS) modeling have led to significant strides in generating high-fidelity and diverse speech. However, dialogue generation, along with achieving human-like naturalness in speech, continues to be a challenge. In this paper, we introduce CoVoMix: Conversational Voice Mixture Generation, a novel model for zero-shot, human-like, multi-speaker, multi-rou…

    Submitted 29 May, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  4. arXiv:2402.07383  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like

    Authors: Naoyuki Kanda, Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Hemin Yang, Zirun Zhu, Min Tang, Canrun Li, Chung-Hsien Tsai, Zhen Xiao, Yufei Xia, Jinzhu Li, Yanqing Liu, Sheng Zhao, Michael Zeng

    Abstract: Laughter is one of the most expressive and natural aspects of human speech, conveying emotions, social cues, and humor. However, most text-to-speech (TTS) systems lack the ability to produce realistic and appropriate laughter sounds, limiting their applications and user experience. While there have been prior works to generate natural laughter, they fell short in terms of controlling the timing an…

    Submitted 4 March, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

    Comments: See https://aka.ms/elate/ for demo samples, v2: subjective evaluation has been added

  5. arXiv:2402.04395  [pdf]

    eess.SP

    Auto-Encoder Optimized PAM IM/DD Transceivers for Amplified Fiber Links

    Authors: Amir Omidi, Mai Banawan, Erwan Weckenmann, Benoit Paquin, Alireza Geravand, Zibo Zheng, Wei Shi, Ming Zeng, Leslie A. Rusch

    Abstract: We examine pulse amplitude modulation (PAM) for intensity modulation and direct detection systems. Using a straightforward, mixed noise model, we optimize the constellations with an autoencoder-based neural network (NN), and improve the required signal-to-noise ratio by 4 dB for amplified spontaneous emission (ASE)-limited PAM4 and PAM8, without increasing system complexity. Performance can also be im…

    Submitted 29 May, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: 9 pages and 13 figures

  6. arXiv:2311.03282  [pdf, ps, other]

    cs.IT eess.SP

    Resource Allocation for RIS-Empowered Wireless Communications: Low-Complexity and Robust Designs

    Authors: Ming Zeng, Wanming Hao, Zhangjie Peng, Zheng Chu, Xingwang Li, Changsheng You, Cunhua Pan

    Abstract: This article delves into advancements in resource allocation techniques tailored for systems utilizing reconfigurable intelligent surfaces (RIS), with a primary focus on achieving low-complexity and resilient solutions. The investigation of low-complexity approaches for RIS holds significant relevance, primarily owing to the intricate characteristics inherent in RIS-based systems and the need of d…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: submitted to IEEE WCM

  7. arXiv:2309.13874  [pdf, other]

    eess.AS cs.LG cs.SD

    Diffusion Conditional Expectation Model for Efficient and Robust Target Speech Extraction

    Authors: Leying Zhang, Yao Qian, Linfeng Yu, Heming Wang, Xinkai Wang, Hemin Yang, Long Zhou, Shujie Liu, Yanmin Qian, Michael Zeng

    Abstract: Target Speech Extraction (TSE) is a crucial task in speech processing that focuses on isolating the clean speech of a specific speaker from complex mixtures. While discriminative methods are commonly used for TSE, they can introduce distortion in terms of speech perception quality. On the other hand, generative approaches, particularly diffusion-based methods, can enhance speech quality perceptual…

    Submitted 25 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP 2024

  8. Physical Layer Security for NOMA Systems: Requirements, Issues, and Recommendations

    Authors: Saeid Pakravan, Jean-Yves Chouinard, Xingwang Li, Ming Zeng, Wanming Hao, Quoc-Viet Pham, Octavia A. Dobre

    Abstract: Non-orthogonal multiple access (NOMA) has been viewed as a potential candidate for the upcoming generation of wireless communication systems. Compared to traditional orthogonal multiple access (OMA), multiplexing users in the same time-frequency resource block can increase the number of served users and improve the efficiency of the systems in terms of spectral efficiency. Nevertheless, from a se…

    Submitted 10 August, 2023; originally announced August 2023.

    Comments: 17 pages, 4 figures

    Journal ref: IEEE Internet of Things Journal

  9. arXiv:2307.08234  [pdf, other]

    eess.AS

    Adapting Large Language Model with Speech for Fully Formatted End-to-End Speech Recognition

    Authors: Shaoshi Ling, Yuxuan Hu, Shuangbei Qian, Guoli Ye, Yao Qian, Yifan Gong, Ed Lin, Michael Zeng

    Abstract: Most end-to-end (E2E) speech recognition models are composed of encoder and decoder blocks that perform acoustic and language modeling functions. Pretrained large language models (LLMs) have the potential to improve the performance of E2E ASR. However, integrating a pretrained language model into an E2E speech recognition model has shown limited benefits due to the mismatches between text-based LL…

    Submitted 2 August, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

  10. arXiv:2305.18747  [pdf, other]

    eess.AS cs.CL

    Adapting Multi-Lingual ASR Models for Handling Multiple Talkers

    Authors: Chenda Li, Yao Qian, Zhuo Chen, Naoyuki Kanda, Dongmei Wang, Takuya Yoshioka, Yanmin Qian, Michael Zeng

    Abstract: State-of-the-art large-scale universal speech models (USMs) show a decent automatic speech recognition (ASR) performance across multiple domains and languages. However, it remains a challenge for these models to recognize overlapped speech, which is often seen in meeting conversations. We propose an approach to adapt USMs for multi-talker ASR. We first develop an enhanced version of serialized out…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: Accepted by Interspeech 2023

  11. arXiv:2305.14838  [pdf, other]

    cs.CL cs.SD eess.AS

    ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation

    Authors: Chenyang Le, Yao Qian, Long Zhou, Shujie Liu, Yanmin Qian, Michael Zeng, Xuedong Huang

    Abstract: Joint speech-language training is challenging due to the large demand for training data and GPU consumption, as well as the modality gap between speech and language. We present ComSL, a speech-language model built atop a composite architecture of public pretrained speech-only and language-only models and optimized data-efficiently for spoken language tasks. Particularly, we propose to incorporate…

    Submitted 14 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023, Poster

  12. arXiv:2305.12311  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG eess.AS

    i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data

    Authors: Ziyi Yang, Mahmoud Khademi, Yichong Xu, Reid Pryzant, Yuwei Fang, Chenguang Zhu, Dongdong Chen, Yao Qian, Mei Gao, Yi-Ling Chen, Robert Gmyr, Naoyuki Kanda, Noel Codella, Bin Xiao, Yu Shi, Lu Yuan, Takuya Yoshioka, Michael Zeng, Xuedong Huang

    Abstract: The convergence of text, visual, and audio data is a key step towards human-like artificial intelligence; however, the current Vision-Language-Speech landscape is dominated by encoder-only models which lack generative abilities. We propose closing this gap with i-Code V2, the first model capable of generating natural language from any combination of Vision, Language, and Speech data. i-Code V2 is a…

    Submitted 20 May, 2023; originally announced May 2023.

  13. arXiv:2305.11846  [pdf, other]

    cs.CV cs.CL cs.LG cs.SD eess.AS

    Any-to-Any Generation via Composable Diffusion

    Authors: Zineng Tang, Ziyi Yang, Chenguang Zhu, Michael Zeng, Mohit Bansal

    Abstract: We present Composable Diffusion (CoDi), a novel generative model capable of generating any combination of output modalities, such as language, image, video, or audio, from any combination of input modalities. Unlike existing generative AI systems, CoDi can generate multiple modalities in parallel and its input is not limited to a subset of modalities like text or image. Despite the absence of trai…

    Submitted 19 May, 2023; originally announced May 2023.

    Comments: Project Page: https://codi-gen.github.io

  14. arXiv:2303.14044  [pdf, other]

    cs.GR cs.CV cs.SD eess.AS

    MusicFace: Music-driven Expressive Singing Face Synthesis

    Authors: Pengfei Liu, Wenjin Deng, Hengda Li, Jintai Wang, Yinglin Zheng, Yiwei Ding, Xiaohu Guo, Ming Zeng

    Abstract: It is still an interesting and challenging problem to synthesize a vivid and realistic singing face driven by a music signal. In this paper, we present a method for this task with natural motions of the lip, facial expression, head pose, and eye states. Due to the coupling of the mixed information of human voice and background music in common signals of music audio, we design a decouple-and-fuse str…

    Submitted 24 March, 2023; originally announced March 2023.

    Comments: Accepted to CVMJ

  15. arXiv:2303.10949  [pdf, other]

    eess.AS cs.CL cs.SD

    Code-Switching Text Generation and Injection in Mandarin-English ASR

    Authors: Haibin Yu, Yuxuan Hu, Yao Qian, Ma Jin, Linquan Liu, Shujie Liu, Yu Shi, Yanmin Qian, Edward Lin, Michael Zeng

    Abstract: Code-switching speech refers to a means of expression by mixing two or more languages within a single utterance. Automatic Speech Recognition (ASR) with End-to-End (E2E) modeling for such speech can be a challenging task due to the lack of data. In this study, we investigate text generation and injection for improving the performance of an industry commonly-used streaming model, Transformer-Transd…

    Submitted 20 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023

  16. arXiv:2303.08372  [pdf, other]

    eess.AS cs.SD

    Target Sound Extraction with Variable Cross-modality Clues

    Authors: Chenda Li, Yao Qian, Zhuo Chen, Dongmei Wang, Takuya Yoshioka, Shujie Liu, Yanmin Qian, Michael Zeng

    Abstract: Automatic target sound extraction (TSE) is a machine learning approach to mimic the human auditory perception capability of attending to a sound source of interest from a mixture of sources. It often uses a model conditioned on a fixed form of target sound clues, such as a sound class label, which limits the ways in which users can interact with the model to specify the target sounds. To leverage…

    Submitted 15 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023

  17. arXiv:2210.15936  [pdf, other]

    cs.SD eess.AS

    A comprehensive study on self-supervised distillation for speaker representation learning

    Authors: Zhengyang Chen, Yao Qian, Bing Han, Yanmin Qian, Michael Zeng

    Abstract: In real application scenarios, it is often challenging to obtain a large amount of labeled data for speaker representation learning due to speaker privacy concerns. Self-supervised learning with no labels has become an increasingly promising way to solve this problem. Compared with contrastive learning, self-distilled approaches use only positive samples in the loss function and thus are more attractive. In…

    Submitted 25 November, 2022; v1 submitted 28 October, 2022; originally announced October 2022.

    Comments: Accepted by SLT2022

  18. arXiv:2205.01818  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV eess.AS

    i-Code: An Integrative and Composable Multimodal Learning Framework

    Authors: Ziyi Yang, Yuwei Fang, Chenguang Zhu, Reid Pryzant, Dongdong Chen, Yu Shi, Yichong Xu, Yao Qian, Mei Gao, Yi-Ling Chen, Liyang Lu, Yujia Xie, Robert Gmyr, Noel Codella, Naoyuki Kanda, Bin Xiao, Lu Yuan, Takuya Yoshioka, Michael Zeng, Xuedong Huang

    Abstract: Human intelligence is multimodal; we integrate visual, linguistic, and acoustic signals to maintain a holistic worldview. Most current pretraining methods, however, are limited to one or two modalities. We present i-Code, a self-supervised pretraining framework where users may flexibly combine the modalities of vision, speech, and language into unified and general-purpose vector representations. I…

    Submitted 5 May, 2022; v1 submitted 3 May, 2022; originally announced May 2022.

  19. arXiv:2202.07140  [pdf, ps, other]

    cs.IT eess.SP

    Securing Reconfigurable Intelligent Surface-Aided Cell-Free Networks

    Authors: Wanming Hao, Junjie Li, Gangcan Sun, Ming Zeng, Octavia A. Dobre

    Abstract: In this paper, we investigate the physical layer security in the reconfigurable intelligent surface (RIS)-aided cell-free networks. A maximum weighted sum secrecy rate problem is formulated by jointly optimizing the active beamforming (BF) at the base stations and passive BF at the RISs. To handle this non-trivial problem, we adopt the alternating optimization to decouple the original problem into…

    Submitted 14 February, 2022; originally announced February 2022.

  20. arXiv:2202.07137  [pdf, other]

    cs.IT eess.SP

    Ultra Wide Band THz IRS Communications: Applications, Challenges, Key Techniques, and Research Opportunities

    Authors: Wanming Hao, Fuhui Zhou, Ming Zeng, Octavia A. Dobre, Naofal Al-Dhahir

    Abstract: Terahertz (THz) communication is a promising technology for future wireless networks due to its ultra-wide bandwidth. However, THz signals suffer from severe attenuation and poor diffraction capability, making them vulnerable to blocking obstacles. To compensate for these two shortcomings and improve the system performance, an intelligent reflecting surface (IRS) can be exploited to change the propa…

    Submitted 14 February, 2022; originally announced February 2022.

    Journal ref: IEEE Network,2022

  21. arXiv:2112.05826  [pdf, other]

    cs.CL cs.AI cs.LG eess.AS

    Sequence-level self-learning with multiple hypotheses

    Authors: Kenichi Kumatani, Dimitrios Dimitriadis, Yashesh Gaur, Robert Gmyr, Sefik Emre Eskimez, Jinyu Li, Michael Zeng

    Abstract: In this work, we develop new self-learning techniques with an attention-based sequence-to-sequence (seq2seq) model for automatic speech recognition (ASR). For untranscribed speech data, the hypothesis from an ASR system must be used as a label. However, the imperfect ASR result makes unsupervised learning difficult to consistently improve recognition performance especially in the case that multipl…

    Submitted 10 December, 2021; originally announced December 2021.

    Comments: Published in Interspeech 2020: https://www.isca-speech.org/archive_v0/Interspeech_2020/pdfs/2020.pdf

    Report number: https://www.isca-speech.org/archive_v0/Interspeech_2020/pdfs/2020.pdf

    Journal ref: Proc. Interspeech 2020, page 3775-3779

  22. arXiv:2111.13608  [pdf, other]

    cs.IT eess.SP

    Joint Wireless and Computing Resources Allocation in Multi-Cell MEC

    Authors: M. Zeng

    Abstract: This paper addresses joint wireless and computing resource allocation in mobile edge computing (MEC) systems with several access points and with the possibility that users connect to many access points and utilize the computation capability of many servers at the same time. The problem of sum transmission energy minimization under response time constraints is considered. It is proved that the opt…

    Submitted 26 November, 2021; originally announced November 2021.

    Comments: 6 pages, 3 figures. arXiv admin note: text overlap with arXiv:1910.04841

  23. arXiv:2110.13900  [pdf, other]

    cs.CL cs.SD eess.AS

    WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

    Authors: Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Xiangzhan Yu, Furu Wei

    Abstract: Self-supervised learning (SSL) achieves great success in speech recognition, while limited exploration has been attempted for other speech processing tasks. As the speech signal contains multi-faceted information including speaker identity, paralinguistics, spoken content, etc., learning universal representations for all speech tasks is challenging. To tackle the problem, we propose a new pre-trained…

    Submitted 17 June, 2022; v1 submitted 26 October, 2021; originally announced October 2021.

    Comments: Submitted to the Journal of Selected Topics in Signal Processing (JSTSP)

  24. arXiv:2110.12138  [pdf, other]

    cs.SD eess.AS

    Optimizing Alignment of Speech and Language Latent Spaces for End-to-End Speech Recognition and Understanding

    Authors: Wei Wang, Shuo Ren, Yao Qian, Shujie Liu, Yu Shi, Yanmin Qian, Michael Zeng

    Abstract: The advances in attention-based encoder-decoder (AED) networks have brought great progress to end-to-end (E2E) automatic speech recognition (ASR). One way to further improve the performance of AED-based E2E ASR is to introduce an extra text encoder for leveraging extensive text data and thus capture more context-aware linguistic information. However, this approach brings a mismatch problem between…

    Submitted 23 October, 2021; originally announced October 2021.

    Comments: submitted to ICASSP 2022

  25. arXiv:2110.09707  [pdf]

    cs.RO eess.SY

    PI(t)D(t) Control and Motion Profiling for Omnidirectional Mobile Robots

    Authors: Michael Zeng

    Abstract: Recently, a trend is emerging toward human-servicing autonomous mobile robots, with diverse applications including delivery of supplies in hospitals, hotels, or labs where personnel are scarce, or reacting to indoor emergencies. However, existing autonomous mobile robot (AMR) motion is slow and inefficient, a foundational barrier to proliferation of human-servicing applications. This research has…

    Submitted 18 October, 2021; originally announced October 2021.

    Comments: 12 pages, 13 figures

  26. arXiv:2110.05777  [pdf, other]

    cs.SD eess.AS

    Large-scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification

    Authors: Zhengyang Chen, Sanyuan Chen, Yu Wu, Yao Qian, Chengyi Wang, Shujie Liu, Yanmin Qian, Michael Zeng

    Abstract: The speech representations learned from large-scale unlabeled data have shown better generalizability than those from supervised learning and thus attract a lot of interest to be applied for various downstream tasks. In this paper, we explore the limits of speech representations learned by different self-supervised objectives and datasets for automatic speaker verification (ASV), especially with a…

    Submitted 24 January, 2022; v1 submitted 12 October, 2021; originally announced October 2021.

    Comments: Accepted by ICASSP 2022

  27. arXiv:2106.05630  [pdf, other]

    cs.SD cs.CL cs.IR cs.MM eess.AS

    MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training

    Authors: Mingliang Zeng, Xu Tan, Rui Wang, Zeqian Ju, Tao Qin, Tie-Yan Liu

    Abstract: Symbolic music understanding, which refers to the understanding of music from the symbolic data (e.g., MIDI format, but not audio), covers many music applications such as genre classification, emotion classification, and music pieces matching. While good music representations are beneficial for these applications, the lack of training data hinders representation learning. Inspired by the success o…

    Submitted 10 June, 2021; originally announced June 2021.

    Comments: Accepted by ACL 2021 Findings

  28. arXiv:2102.11114  [pdf, other]

    cs.CL cs.SD eess.AS

    Generating Human Readable Transcript for Automatic Speech Recognition with Pre-trained Language Model

    Authors: Junwei Liao, Yu Shi, Ming Gong, Linjun Shou, Sefik Eskimez, Liyang Lu, Hong Qu, Michael Zeng

    Abstract: Modern Automatic Speech Recognition (ASR) systems can achieve high performance in terms of recognition accuracy. However, a perfectly accurate transcript can still be challenging to read due to disfluency, filler words, and other errata common in spoken communication. Many downstream tasks and human readers rely on the output of the ASR system; therefore, errors introduced by the speaker and ASR s…

    Submitted 22 February, 2021; originally announced February 2021.

    Comments: Accepted in 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2021)

  29. arXiv:2102.06283  [pdf, other]

    cs.CL cs.SD eess.AS

    Speech-language Pre-training for End-to-end Spoken Language Understanding

    Authors: Yao Qian, Ximo Bian, Yu Shi, Naoyuki Kanda, Leo Shen, Zhen Xiao, Michael Zeng

    Abstract: End-to-end (E2E) spoken language understanding (SLU) can infer semantics directly from speech signal without cascading an automatic speech recognizer (ASR) with a natural language understanding (NLU) module. However, paired utterance recordings and corresponding semantics may not always be available or sufficient to train an E2E SLU model in a real production environment. In this paper, we propose…

    Submitted 11 February, 2021; originally announced February 2021.

  30. arXiv:2101.07597  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data

    Authors: Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang

    Abstract: In this paper, we propose a unified pre-training approach called UniSpeech to learn speech representations with both unlabeled and labeled data, in which supervised phonetic CTC learning and phonetically-aware contrastive self-supervised learning are conducted in a multi-task learning manner. The resultant representations can capture information more correlated with phonetic structures and improve…

    Submitted 10 June, 2021; v1 submitted 19 January, 2021; originally announced January 2021.

    Comments: accepted by ICML2021

  31. arXiv:2008.05798  [pdf, ps, other]

    cs.IT eess.SP

    Hardware Impaired Ambient Backscatter NOMA Systems: Reliability and Security

    Authors: Xingwang Li, Mengle Zhao, Ming Zeng, Shahid Mumtaz, Varun G Menon, Zhiguo Ding, Octavia A. Dobre

    Abstract: Non-orthogonal multiple access (NOMA) and ambient backscatter communication have been envisioned as two promising technologies for the Internet-of-things due to their high spectral efficiency and energy efficiency. Motivated by this fact, we consider an ambient backscatter NOMA system in the presence of a malicious eavesdropper. Under some realistic assumptions of residual hardware impairments (RH…

    Submitted 13 August, 2020; originally announced August 2020.

  32. arXiv:2007.10001  [pdf, other]

    cs.IT eess.SP

    Power Minimization for Multi-cell Uplink NOMA with Imperfect SIC

    Authors: M. Zeng, W. Hao, O. A. Dobre, Z. Ding, H. V. Poor

    Abstract: In this paper, we investigate a multi-cell uplink non-orthogonal multiple access (NOMA) system with imperfect successive interference cancellation (SIC). The objective of the formulated optimization problem is to minimize the total power consumption under users' quality-of-service constraints. The considered problem is first transformed into a linear programming problem, upon which centralized and…

    Submitted 20 July, 2020; originally announced July 2020.

    Comments: accepted by IEEE WCL; NOMA, uplink, multi-cell, power minimization, imperfect SIC

  33. arXiv:2006.05407  [pdf, other]

    cs.CV eess.IV

    D-VPnet: A Network for Real-time Dominant Vanishing Point Detection in Natural Scenes

    Authors: Yin-Bo Liu, Ming Zeng, Qing-Hao Meng

    Abstract: As an important part of linear perspective, vanishing points (VPs) provide useful clues for mapping objects from 2D photos to 3D space. Existing methods are mainly focused on extracting structural features such as lines or contours and then clustering these features to detect VPs. However, these techniques suffer from ambiguous information due to the large number of line segments and contours dete…

    Submitted 9 June, 2020; originally announced June 2020.

    Comments: 18 pages, 6 figures, under review

    ACM Class: I.4.7

  34. Sum Rate Maximization for IRS-assisted Uplink NOMA

    Authors: M. Zeng, X. Li, G. Li, W. Hao, O. A. Dobre

    Abstract: An intelligent reflecting surface (IRS) consists of a large number of low-cost reflecting elements, which can steer the incident signal collaboratively by passive beamforming. This way, IRS reconfigures the wireless environment to boost the system performance. In this paper, we consider an IRS-assisted uplink non-orthogonal multiple access (NOMA) system. The objective is to maximize the sum rate o…

    Submitted 22 April, 2020; originally announced April 2020.

    Comments: IEEE COMML, IRS, RIS, NOMA, sum rate, uplink

    Journal ref: IEEE COMML 2020

  35. arXiv:2002.04169  [pdf, other]

    cs.IT eess.SP

    Edge Cache-assisted Secure Low-Latency Millimeter Wave Transmission

    Authors: Wanming Hao, Ming Zeng, Gangcan Sun, Pei Xiao

    Abstract: In this paper, we consider an edge cache-assisted millimeter wave cloud radio access network (C-RAN). Each remote radio head (RRH) in the C-RAN has a local cache, which can pre-fetch and store the files requested by the actuators. Multiple RRHs form a cluster to cooperatively serve the actuators, which acquire their required files either from the local caches or from the central processor via mult…

    Submitted 10 February, 2020; originally announced February 2020.

    Comments: IEEE_IoT, Accept

  36. arXiv:1910.04841  [pdf, ps, other]

    cs.IT eess.SP

    Dynamic Spectrum Sharing for Load Balancing in Multi-Cell Mobile Edge Computing

    Authors: Ming Zeng, Viktoria Fodor

    Abstract: Large-scale mobile edge computing (MEC) systems require scalable solutions to allocate communication and computing resources to the users. In this letter we address this challenge by applying dynamic spectrum sharing among the base stations (BSs), together with local resource allocation in the cells. We show that the network-wide resource allocation can be transformed into a convex optimization pr…

    Submitted 10 October, 2019; originally announced October 2019.

    Comments: IEEE WCL

  37. arXiv:1909.06218  [pdf, other]

    cs.IT eess.SP

    Codebook-Based Max-Min Energy-Efficient Resource Allocation for Uplink mmWave MIMO-NOMA Systems

    Authors: Wanming Hao, Ming Zeng, Gangcan Sun, Osamu Muta, Octavia A. Dobre, Shouyi Yang, Haris Gacanin

    Abstract: In this paper, we investigate the energy-efficient resource allocation problem in an uplink non-orthogonal multiple access (NOMA) millimeter wave system, where the fully-connected-based sparse radio frequency chain antenna structure is applied at the base station (BS). To relieve the pilot overhead for channel estimation, we propose a codebook-based analog beam design scheme, which only requires t…

    Submitted 13 September, 2019; originally announced September 2019.

    Comments: IEEE_T_COM, accepted

  38. arXiv:1907.10001  [pdf]

    cs.IT eess.SP

    Non-Orthogonal Multiple Access (NOMA): How It Meets 5G and Beyond

    Authors: S. M. Riazul Islam, Ming Zeng, Octavia A. Dobre, Kyung-Sup Kwak

    Abstract: Due to massive connectivity and increasing demands of various services and data-hungry applications, a full-scale implementation of the fifth generation (5G) wireless systems requires more effective radio access techniques. In this regard, non-orthogonal multiple access (NOMA) has recently gained ever-growing attention from both academia and industry. Compared to orthogonal multiple access (OMA) t…

    Submitted 23 July, 2019; originally announced July 2019.

    Comments: 38 pages, 9 figures, Wiley 5G Ref

  39. arXiv:1905.02545  [pdf, other]

    eess.AS cs.CL cs.SD

    Meeting Transcription Using Virtual Microphone Arrays

    Authors: Takuya Yoshioka, Zhuo Chen, Dimitrios Dimitriadis, William Hinthorn, Xuedong Huang, Andreas Stolcke, Michael Zeng

    Abstract: We describe a system that generates speaker-annotated transcripts of meetings by using a virtual microphone array, a set of spatially distributed asynchronous recording devices such as laptops and mobile phones. The system is composed of continuous audio stream alignment, blind beamforming, speech recognition, speaker diarization using prior speaker information, and system combination. When utiliz…

    Submitted 7 July, 2019; v1 submitted 3 May, 2019; originally announced May 2019.

    Report number: MSR-TR-2019-11