Showing 1–27 of 27 results for author: Lian, J

Searching in archive eess.
  1. arXiv:2405.12569  [pdf, other]

    eess.SP

    TypeII-CsiNet: CSI Feedback with TypeII Codebook

    Authors: Yiliang Sang, Ke Ma, Yang Ming, Jin Lian, Zhaocheng Wang

    Abstract: The latest TypeII codebook selects partial strongest angular-delay ports for the feedback of downlink channel state information (CSI), whereas its performance is limited due to the deficiency of utilizing the correlations among the port coefficients. To tackle this issue, we propose a tailored autoencoder named TypeII-CsiNet to effectively integrate the TypeII codebook with deep learning, wherein…

    Submitted 21 May, 2024; originally announced May 2024.

  2. arXiv:2403.00529  [pdf, other]

    cs.SD cs.LG eess.AS

    VoxGenesis: Unsupervised Discovery of Latent Speaker Manifold for Speech Synthesis

    Authors: Weiwei Lin, Chenhang He, Man-Wai Mak, Jiachen Lian, Kong Aik Lee

    Abstract: Achieving nuanced and accurate emulation of human voice has been a longstanding goal in artificial intelligence. Although significant progress has been made in recent years, the mainstream of speech synthesis models still relies on supervised speaker modeling and explicit reference utterances. However, there are many aspects of human voice, such as emotion, intonation, and speaking style, for whic…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: preprint

  3. arXiv:2402.02411  [pdf, other]

    eess.IV cs.CV

    Physics-Inspired Degradation Models for Hyperspectral Image Fusion

    Authors: Jie Lian, Lizhi Wang, Lin Zhu, Renwei Dian, Zhiwei Xiong, Hua Huang

    Abstract: The fusion of a low-spatial-resolution hyperspectral image (LR-HSI) with a high-spatial-resolution multispectral image (HR-MSI) has garnered increasing research interest. However, most fusion methods solely focus on the fusion algorithm itself and overlook the degradation models, which results in unsatisfactory performance in practical scenarios. To fill this gap, we propose physics-inspired degra…

    Submitted 4 February, 2024; originally announced February 2024.

  4. arXiv:2401.10015  [pdf, other]

    cs.CL eess.AS

    Towards Hierarchical Spoken Language Dysfluency Modeling

    Authors: Jiachen Lian, Gopala Anumanchipalli

    Abstract: Speech disfluency modeling is the bottleneck for both speech therapy and language learning. However, there is no effective AI solution to systematically tackle this problem. We solidify the concept of disfluent speech and disfluent speech modeling. We then present Hierarchical Unconstrained Disfluency Modeling (H-UDM) approach, the hierarchical extension of UDM that addresses both disfluency trans…

    Submitted 21 January, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

    Comments: 2024 EACL. Hierarchical extension of our previous workshop paper arXiv:2312.12810

  5. arXiv:2312.12810  [pdf, other]

    eess.AS cs.SD

    Unconstrained Dysfluency Modeling for Dysfluent Speech Transcription and Detection

    Authors: Jiachen Lian, Carly Feng, Naasir Farooqi, Steve Li, Anshul Kashyap, Cheol Jun Cho, Peter Wu, Robbie Netzorg, Tingle Li, Gopala Krishna Anumanchipalli

    Abstract: Dysfluent speech modeling requires time-accurate and silence-aware transcription at both the word-level and phonetic-level. However, current research in dysfluency modeling primarily focuses on either transcription or detection, and the performance of each aspect remains limited. In this work, we present an unconstrained dysfluency modeling (UDM) approach that addresses both transcription and dete…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: 2023 ASRU

  6. arXiv:2310.05962  [pdf, other]

    cs.IT cs.LG eess.SP

    Improving the Performance of R17 Type-II Codebook with Deep Learning

    Authors: Ke Ma, Yiliang Sang, Yang Ming, Jin Lian, Chang Tian, Zhaocheng Wang

    Abstract: The Type-II codebook in Release 17 (R17) exploits the angular-delay-domain partial reciprocity between uplink and downlink channels to select part of angular-delay-domain ports for measuring and feeding back the downlink channel state information (CSI), where the performance of existing deep learning enhanced CSI feedback methods is limited due to the deficiency of sparse structures. To address th…

    Submitted 13 September, 2023; originally announced October 2023.

    Comments: Accepted by IEEE GLOBECOM 2023, conference version of arXiv:2305.08081

  7. arXiv:2309.15203  [pdf, other]

    cs.CR cs.HC eess.SP

    Eve Said Yes: AirBone Authentication for Head-Wearable Smart Voice Assistant

    Authors: Chenpei Huang, Hui Zhong, Jie Lian, Pavana Prakash, Dian Shi, Yuan Xu, Miao Pan

    Abstract: Recent advances in machine learning and natural language processing have fostered the enormous prosperity of smart voice assistants and their services, e.g., Alexa, Google Home, Siri, etc. However, voice spoofing attacks are deemed to be one of the major challenges of voice control security, and never stop evolving such as deep-learning-based voice conversion and speech synthesis techniques. To so…

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: 13 pages, 12 figures

  8. arXiv:2309.09088  [pdf, other]

    cs.SD eess.AS

    Enhancing GAN-Based Vocoders with Contrastive Learning Under Data-limited Condition

    Authors: Haoming Guo, Seth Z. Zhao, Jiachen Lian, Gopala Anumanchipalli, Gerald Friedland

    Abstract: Vocoder models have recently achieved substantial progress in generating authentic audio comparable to human quality while significantly reducing memory requirement and inference time. However, these data-hungry generative models require large-scale audio data for learning good representations. In this paper, we apply contrastive learning methods in training the vocoder to improve the perceptual q…

    Submitted 18 December, 2023; v1 submitted 16 September, 2023; originally announced September 2023.

  9. arXiv:2307.02471  [pdf, other]

    eess.AS

    Deep Speech Synthesis from MRI-Based Articulatory Representations

    Authors: Peter Wu, Tingle Li, Yijing Lu, Yubin Zhang, Jiachen Lian, Alan W Black, Louis Goldstein, Shinji Watanabe, Gopala K. Anumanchipalli

    Abstract: In this paper, we study articulatory synthesis, a speech synthesis method using human vocal tract information that offers a way to develop efficient, generalizable and interpretable synthesizers. While recent advances have enabled intelligible articulatory synthesis using electromagnetic articulography (EMA), these methods lack critical articulatory information like excitation and nasality, limiti…

    Submitted 5 July, 2023; originally announced July 2023.

  10. arXiv:2302.06419  [pdf, other]

    eess.AS cs.AI cs.CL

    AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations

    Authors: Jiachen Lian, Alexei Baevski, Wei-Ning Hsu, Michael Auli

    Abstract: Self-supervision has shown great potential for audio-visual speech recognition by vastly reducing the amount of labeled data required to build good systems. However, existing methods are either not entirely end-to-end or do not train joint representations of both modalities. In this paper, we introduce AV-data2vec which addresses these challenges and builds audio-visual representations based on pr…

    Submitted 21 January, 2024; v1 submitted 9 February, 2023; originally announced February 2023.

    Comments: 2023 ASRU

  11. arXiv:2210.16498  [pdf, other]

    eess.AS cs.SD

    Articulatory Representation Learning Via Joint Factor Analysis and Neural Matrix Factorization

    Authors: Jiachen Lian, Alan W Black, Yijing Lu, Louis Goldstein, Shinji Watanabe, Gopala K. Anumanchipalli

    Abstract: Articulatory representation learning is the fundamental research in modeling neural speech production system. Our previous work has established a deep paradigm to decompose the articulatory kinematics data into gestures, which explicitly model the phonological and linguistic structure encoded with human speech production mechanism, and corresponding gestural scores. We continue with this line of w…

    Submitted 20 February, 2023; v1 submitted 29 October, 2022; originally announced October 2022.

    Comments: Accepted to 2023 ICASSP. Camera Ready

  12. arXiv:2206.02512  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    UTTS: Unsupervised TTS with Conditional Disentangled Sequential Variational Auto-encoder

    Authors: Jiachen Lian, Chunlei Zhang, Gopala Krishna Anumanchipalli, Dong Yu

    Abstract: In this paper, we propose a novel unsupervised text-to-speech (UTTS) framework which does not require text-audio pairs for the TTS acoustic modeling (AM). UTTS is a multi-speaker speech synthesizer that supports zero-shot voice cloning, it is developed from a perspective of disentangled speech representation learning. The framework offers a flexible choice of a speaker's duration model, timbre fea…

    Submitted 11 October, 2022; v1 submitted 6 June, 2022; originally announced June 2022.

    Comments: Under Review

  13. arXiv:2205.05227  [pdf, ps, other]

    eess.AS cs.AI cs.CL cs.SD

    Towards Improved Zero-shot Voice Conversion with Conditional DSVAE

    Authors: Jiachen Lian, Chunlei Zhang, Gopala Krishna Anumanchipalli, Dong Yu

    Abstract: Disentangling content and speaking style information is essential for zero-shot non-parallel voice conversion (VC). Our previous study investigated a novel framework with disentangled sequential variational autoencoder (DSVAE) as the backbone for information decomposition. We have demonstrated that simultaneous disentangling content embedding and speaker embedding from one utterance is feasible fo…

    Submitted 20 June, 2022; v1 submitted 10 May, 2022; originally announced May 2022.

    Comments: Accepted to 2022 Interspeech. Demo link is here https://jlian2.github.io/Improved-Voice-Conversion-with-Conditional-DSVAE/

  14. arXiv:2204.00465  [pdf, other]

    eess.AS cs.AI eess.SP

    Deep Neural Convolutive Matrix Factorization for Articulatory Representation Decomposition

    Authors: Jiachen Lian, Alan W Black, Louis Goldstein, Gopala Krishna Anumanchipalli

    Abstract: Most of the research on data-driven speech representation learning has focused on raw audios in an end-to-end manner, paying little attention to their internal phonological or gestural structure. This work, investigating the speech representations derived from articulatory kinematics signals, uses a neural implementation of convolutive sparse matrix factorization to decompose the articulatory data…

    Submitted 20 June, 2022; v1 submitted 1 April, 2022; originally announced April 2022.

    Comments: Accepted to 2022 Interspeech. Code is publicly available at https://github.com/Berkeley-Speech-Group/ema_gesture

  15. arXiv:2203.16705  [pdf, other]

    eess.AS cs.AI cs.CL eess.SP

    Robust Disentangled Variational Speech Representation Learning for Zero-shot Voice Conversion

    Authors: Jiachen Lian, Chunlei Zhang, Dong Yu

    Abstract: Traditional studies on voice conversion (VC) have made progress with parallel training data and known speakers. Good voice conversion quality is obtained by exploring better alignment modules or expressive mapping functions. In this study, we investigate zero-shot VC from a novel perspective of self-supervised disentangled speech representation learning. Specifically, we achieve the disentanglemen…

    Submitted 30 March, 2022; originally announced March 2022.

    Comments: Accepted to 2022 ICASSP

  16. arXiv:2110.15018  [pdf, other]

    eess.AS cs.SD

    TorchAudio: Building Blocks for Audio and Speech Processing

    Authors: Yao-Yuan Yang, Moto Hira, Zhaoheng Ni, Anjali Chourdia, Artyom Astafurov, Caroline Chen, Ching-Feng Yeh, Christian Puhrsch, David Pollack, Dmitriy Genzel, Donny Greenberg, Edward Z. Yang, Jason Lian, Jay Mahadeokar, Jeff Hwang, Ji Chen, Peter Goldsborough, Prabhat Roy, Sean Narenthiran, Shinji Watanabe, Soumith Chintala, Vincent Quenneville-Bélair, Yangyang Shi

    Abstract: This document describes version 0.10 of TorchAudio: building blocks for machine learning applications in the audio and speech processing domain. The objective of TorchAudio is to accelerate the development and deployment of machine learning applications for researchers and engineers by providing off-the-shelf building blocks. The building blocks are designed to be GPU-compatible, automatically dif…

    Submitted 16 February, 2022; v1 submitted 28 October, 2021; originally announced October 2021.

    Comments: Accepted by ICASSP 2022

  17. arXiv:2110.12192  [pdf, other]

    eess.IV cs.CV cs.LG

    Dual Shape Guided Segmentation Network for Organs-at-Risk in Head and Neck CT Images

    Authors: Shuai Wang, Theodore Yanagihara, Bhishamjit Chera, Colette Shen, Pew-Thian Yap, Jun Lian

    Abstract: The accurate segmentation of organs-at-risk (OARs) in head and neck CT images is a critical step for radiation therapy of head and neck cancer patients. However, manual delineation for numerous OARs is time-consuming and laborious, even for expert oncologists. Moreover, manual delineation results are susceptible to high intra- and inter-variability. To this end, we propose a novel dual shape guide…

    Submitted 23 October, 2021; originally announced October 2021.

  18. arXiv:2106.14143  [pdf, ps, other]

    eess.SY math.DS math.OC

    Sparse Control Synthesis for Uncertain Responsive Loads with Stochastic Stability Guarantees

    Authors: Sai Pushpak Nandanoori, Soumya Kundu, Jianming Lian, Umesh Vaidya, Draguna Vrabie, Karanjit Kalsi

    Abstract: Recent studies have demonstrated the potential of flexible loads in providing frequency response services. However, uncertainty and variability in various weather-related and end-use behavioral factors often affect the demand-side control performance. This work addresses this problem with the design of a demand-side control to achieve frequency response under load uncertainties. Our approach invol…

    Submitted 27 June, 2021; originally announced June 2021.

    Comments: Accepted for publication in the IEEE Transactions on Power Systems

    Report number: PNNL-SA-156076

  19. arXiv:2104.10326  [pdf, other]

    eess.IV cs.CV

    A Structure-Aware Relation Network for Thoracic Diseases Detection and Segmentation

    Authors: Jie Lian, Jingyu Liu, Shu Zhang, Kai Gao, Xiaoqing Liu, Dingwen Zhang, Yizhou Yu

    Abstract: Instance level detection and segmentation of thoracic diseases or abnormalities are crucial for automatic diagnosis in chest X-ray images. Leveraging on constant structure and disease relations extracted from domain knowledge, we propose a structure-aware relation network (SAR-Net) extending Mask R-CNN. The SAR-Net consists of three relation modules: 1. the anatomical structure relation module enc…

    Submitted 20 April, 2021; originally announced April 2021.

    Comments: This paper has been accepted by IEEE Transactions on Medical Imaging

  20. Masked Proxy Loss For Text-Independent Speaker Verification

    Authors: Jiachen Lian, Aiswarya Vinod Kumar, Hira Dhamyal, Bhiksha Raj, Rita Singh

    Abstract: Open-set speaker recognition can be regarded as a metric learning problem, which is to maximize inter-class variance and minimize intra-class variance. Supervised metric learning can be categorized into entity-based learning and proxy-based learning. Most of the existing metric learning objectives like Contrastive, Triplet, Prototypical, GE2E, etc. all belong to the former division, the performance…

    Submitted 24 June, 2021; v1 submitted 9 November, 2020; originally announced November 2020.

    Comments: Accepted at Interspeech 2021

  21. arXiv:2011.03689  [pdf, other]

    cs.SD eess.AS

    Detection and Evaluation of human and machine generated speech in spoofing attacks on automatic speaker verification systems

    Authors: Yang Gao, Jiachen Lian, Bhiksha Raj, Rita Singh

    Abstract: Automatic speaker verification (ASV) systems utilize the biometric information in human speech to verify the speaker's identity. The techniques used for performing speaker verification are often vulnerable to malicious attacks that attempt to induce the ASV system to return wrong results, allowing an impostor to bypass the system and gain access. Attackers use a multitude of spoofing techniques fo…

    Submitted 24 November, 2020; v1 submitted 6 November, 2020; originally announced November 2020.

    Comments: 6 pages excluding references. Paper accepted by IEEE Spoken Language Technology (SLT) 2021

  22. arXiv:2010.10298  [pdf]

    eess.IV cs.CV

    The Detection of Thoracic Abnormalities ChestX-Det10 Challenge Results

    Authors: Jie Lian, Jingyu Liu, Yizhou Yu, Mengyuan Ding, Yaoci Lu, Yi Lu, Jie Cai, Deshou Lin, Miao Zhang, Zhe Wang, Kai He, Yijie Yu

    Abstract: The detection of thoracic abnormalities challenge is organized by the Deepwise AI Lab. The challenge is divided into two rounds. In this paper, we present the results of 6 teams which reach the second round. The challenge adopts the ChestX-Det10 dataset proposed by the Deepwise AI Lab. ChestX-Det10 is the first chest X-Ray dataset with instance-level annotations, including 10 categories of disease…

    Submitted 21 October, 2020; v1 submitted 19 October, 2020; originally announced October 2020.

  23. arXiv:2008.00152  [pdf, other]

    cs.CR eess.SY

    Transactive Energy System Deployment over Insecure Communication Links

    Authors: Yang Lu, Jianming Lian, Minghui Zhu, Ke Ma

    Abstract: In this paper, the privacy and security issues associated with the transactive energy system (TES) deployment over insecure communication links are addressed. In particular, it is ensured that (1) individual agents' bidding information is kept private throughout hierarchical market-based interactions; and (2) any extraneous data injection attack can be quickly and easily detected. An implementatio…

    Submitted 16 October, 2021; v1 submitted 31 July, 2020; originally announced August 2020.

    Comments: 10 pages, 6 figures, journal submission

  24. arXiv:2007.09770  [pdf, other]

    eess.SY

    Multi-stage Power Scheduling Framework for Data Center with Chilled Water Storage in Energy and Regulation Markets

    Authors: Yangyang Fu, Xu Han, Jessica Stershic, Wangda Zuo, Kyri Baker, Jianming Lian

    Abstract: Leveraging electrochemical and thermal energy storage systems has been proposed as a strategy to reduce peak power in data centers. Thermal energy storage systems, such as chilled water tanks, have gained increasing attention in data centers for load shifting due to their relatively small capital and operational costs compared to electrochemical energy storage. However, there are few studies inves…

    Submitted 19 July, 2020; originally announced July 2020.

  25. arXiv:2006.10550  [pdf]

    eess.IV cs.CV

    ChestX-Det10: Chest X-ray Dataset on Detection of Thoracic Abnormalities

    Authors: Jingyu Liu, Jie Lian, Yizhou Yu

    Abstract: Instance level detection of thoracic diseases or abnormalities is crucial for automatic diagnosis in chest X-ray images. Most existing works on chest X-rays focus on disease classification and weakly supervised localization. In order to push forward the research on disease classification and localization on chest X-rays, we provide a new benchmark called ChestX-Det10, including box-level annotati…

    Submitted 19 October, 2020; v1 submitted 17 June, 2020; originally announced June 2020.

  26. arXiv:1701.02036  [pdf, other]

    eess.SY

    Decentralized Robust Control for Damping Inter-area Oscillations in Power Systems

    Authors: Jianming Lian, Shaobu Wang, Ruisheng Diao, Zhenyu Huang

    Abstract: As power systems become more and more interconnected, inter-area oscillations have become a serious factor limiting large power transfer among different areas. Underdamped (Undamped) inter-area oscillations may cause system breakup and even lead to large-scale blackout. Traditional damping controllers include Power System Stabilizer (PSS) and Flexible AC Transmission System (FACTS) controller,…

    Submitted 8 January, 2017; originally announced January 2017.

  27. Distributed Robust Adaptive Frequency Control of Power Systems with Dynamic Loads

    Authors: Hunmin Kim, Minghui Zhu, Jianming Lian

    Abstract: This paper investigates the frequency control of multi-machine power systems subject to uncertain and dynamic net loads. We propose distributed internal model controllers that coordinate synchronous generators and demand response to tackle the unpredictable nature of net loads. Frequency stability is formally guaranteed via Lyapunov analysis. Numerical simulations on the IEEE 68-bus test system de…

    Submitted 8 January, 2020; v1 submitted 17 October, 2015; originally announced October 2015.

    Comments: Published in the IEEE Transactions on Automatic Control