Showing 1–24 of 24 results for author: Huai, B

  1. arXiv:2312.10741  [pdf, other]

    eess.AS cs.CL cs.SD

    StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis

    Authors: Yu Zhang, Rongjie Huang, Ruiqi Li, JinZheng He, Yan Xia, Feiyang Chen, Xinyu Duan, Baoxing Huai, Zhou Zhao

    Abstract: Style transfer for out-of-domain (OOD) singing voice synthesis (SVS) focuses on generating high-quality singing voices with unseen styles (such as timbre, emotion, pronunciation, and articulation skills) derived from reference singing voice samples. However, the endeavor to model the intricate nuances of singing voice styles is an arduous task, as singing voices possess a remarkable degree of expr…

    Submitted 2 January, 2024; v1 submitted 17 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024

  2. arXiv:2311.05419  [pdf, other]

    cs.CL cs.AI

    Mirror: A Universal Framework for Various Information Extraction Tasks

    Authors: Tong Zhu, Junfei Ren, Zijian Yu, Mengsong Wu, Guoliang Zhang, Xiaoye Qu, Wenliang Chen, Zhefeng Wang, Baoxing Huai, Min Zhang

    Abstract: Sharing knowledge between information extraction tasks has always been a challenge due to the diverse data formats and task variations. Meanwhile, this divergence leads to information waste and increases difficulties in building complex applications in real scenarios. Recent studies often formulate IE tasks as a triplet extraction problem. However, such a paradigm does not support multi-span and n…

    Submitted 26 November, 2023; v1 submitted 9 November, 2023; originally announced November 2023.

    Comments: Accepted to EMNLP23 main conference

  3. TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models

    Authors: Shengpeng Ji, Jialong Zuo, Minghui Fang, Ziyue Jiang, Feiyang Chen, Xinyu Duan, Baoxing Huai, Zhou Zhao

    Abstract: Recently, there has been a growing interest in the field of controllable Text-to-Speech (TTS). While previous studies have relied on users providing specific style factor values based on acoustic knowledge or selecting reference speeches that meet certain requirements, generating speech solely from natural text prompts has emerged as a new challenge for researchers. This challenge arises due to th…

    Submitted 28 August, 2023; originally announced August 2023.

    Journal ref: 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  4. Recognizing Unseen Objects via Multimodal Intensive Knowledge Graph Propagation

    Authors: Likang Wu, Zhi Li, Hongke Zhao, Zhefeng Wang, Qi Liu, Baoxing Huai, Nicholas Jing Yuan, Enhong Chen

    Abstract: Zero-Shot Learning (ZSL), which aims at automatically recognizing unseen objects, is a promising learning paradigm to understand new real-world knowledge for machines continuously. Recently, the Knowledge Graph (KG) has been proven as an effective scheme for handling the zero-shot task with large-scale and non-attribute data. Prior studies always embed relationships of seen and unseen objects into…

    Submitted 20 June, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

    Comments: arXiv admin note: text overlap with arXiv:1805.11724 by other authors

  5. arXiv:2306.06800  [pdf, other]

    cs.CL

    AraMUS: Pushing the Limits of Data and Model Scale for Arabic Natural Language Processing

    Authors: Asaad Alghamdi, Xinyu Duan, Wei Jiang, Zhenhai Wang, Yimeng Wu, Qingrong Xia, Zhefeng Wang, Yi Zheng, Mehdi Rezagholizadeh, Baoxing Huai, Peilun Cheng, Abbas Ghaddar

    Abstract: Developing monolingual large Pre-trained Language Models (PLMs) is shown to be very successful in handling different tasks in Natural Language Processing (NLP). In this work, we present AraMUS, the largest Arabic PLM with 11B parameters trained on 529GB of high-quality Arabic textual data. AraMUS achieves state-of-the-art performances on a diverse set of Arabic classification and generative tasks.…

    Submitted 11 June, 2023; originally announced June 2023.

  6. arXiv:2306.05119  [pdf, other]

    cs.CL

    Reference Matters: Benchmarking Factual Error Correction for Dialogue Summarization with Fine-grained Evaluation Framework

    Authors: Mingqi Gao, Xiaojun Wan, Jia Su, Zhefeng Wang, Baoxing Huai

    Abstract: Factuality is important to dialogue summarization. Factual error correction (FEC) of model-generated summaries is one way to improve factuality. Current FEC evaluation that relies on factuality metrics is not reliable and detailed enough. To address this problem, we are the first to manually annotate a FEC dataset for dialogue summarization containing 4000 items and propose FERRANTI, a fine-graine…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: Accepted to ACL 2023 Main Conference

  7. arXiv:2305.12839  [pdf, other]

    cs.CL

    CopyNE: Better Contextual ASR by Copying Named Entities

    Authors: Shilin Zhou, Zhenghua Li, Yu Hong, Min Zhang, Zhefeng Wang, Baoxing Huai

    Abstract: End-to-end automatic speech recognition (ASR) systems have made significant progress in general scenarios. However, it remains challenging to transcribe contextual named entities (NEs) in the contextual ASR scenario. Previous approaches have attempted to address this by utilizing the NE dictionary. These approaches treat entities as individual tokens and generate them token-by-token, which may res…

    Submitted 27 May, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: ACL 2024

  8. arXiv:2304.14662  [pdf, other]

    cs.CL

    CED: Catalog Extraction from Documents

    Authors: Tong Zhu, Guoliang Zhang, Zechang Li, Zijian Yu, Junfei Ren, Mengsong Wu, Zhefeng Wang, Baoxing Huai, Pingfu Chao, Wenliang Chen

    Abstract: Sentence-by-sentence information extraction from long documents is an exhausting and error-prone task. As the indicator of document skeleton, catalogs naturally chunk documents into segments and provide informative cascade semantics, which can help to reduce the search space. Despite their usefulness, catalogs are hard to be extracted without the assist from external knowledge. For documents that…

    Submitted 28 April, 2023; originally announced April 2023.

  9. arXiv:2302.03512  [pdf, other]

    cs.CL

    A Survey on Arabic Named Entity Recognition: Past, Recent Advances, and Future Trends

    Authors: Xiaoye Qu, Yingjie Gu, Qingrong Xia, Zechang Li, Zhefeng Wang, Baoxing Huai

    Abstract: As more and more Arabic texts emerged on the Internet, extracting important information from these Arabic texts is especially useful. As a fundamental technology, Named entity recognition (NER) serves as the core component in information extraction technology, while also playing a critical role in many other Natural Language Processing (NLP) systems, such as question answering and knowledge graph…

    Submitted 8 August, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

    Comments: Accepted by IEEE TKDE

  10. arXiv:2212.08322  [pdf, other]

    cs.AI cs.CL

    ReCo: Reliable Causal Chain Reasoning via Structural Causal Recurrent Neural Networks

    Authors: Kai Xiong, Xiao Ding, Zhongyang Li, Li Du, Bing Qin, Yi Zheng, Baoxing Huai

    Abstract: Causal chain reasoning (CCR) is an essential ability for many decision-making AI systems, which requires the model to build reliable causal chains by connecting causal pairs. However, CCR suffers from two main transitive problems: threshold effect and scene drift. In other words, the causal pairs to be spliced may have a conflicting threshold boundary or scenario. To address these issues, we propo…

    Submitted 16 December, 2022; originally announced December 2022.

    Comments: Accepted by EMNLP 2022

  11. arXiv:2212.06522  [pdf, other]

    cs.CL

    Distantly-Supervised Named Entity Recognition with Adaptive Teacher Learning and Fine-grained Student Ensemble

    Authors: Xiaoye Qu, Jun Zeng, Daizong Liu, Zhefeng Wang, Baoxing Huai, Pan Zhou

    Abstract: Distantly-Supervised Named Entity Recognition (DS-NER) effectively alleviates the data scarcity problem in NER by automatically generating training samples. Unfortunately, the distant supervision may induce noisy labels, thus undermining the robustness of the learned models and restricting the practical application. To relieve this problem, recent works adopt self-training teacher-student framewor…

    Submitted 13 December, 2022; originally announced December 2022.

    Comments: Accepted at AAAI 2023

  12. arXiv:2210.17122  [pdf, other]

    cs.CL

    Mining Word Boundaries in Speech as Naturally Annotated Word Segmentation Data

    Authors: Lei Zhang, Zhenghua Li, Shilin Zhou, Chen Gong, Zhefeng Wang, Baoxing Huai, Min Zhang

    Abstract: Inspired by early research on exploring naturally annotated data for Chinese word segmentation (CWS), and also by recent research on integration of speech and text processing, this work for the first time proposes to mine word boundaries from parallel speech/text data. First we collect parallel speech/text data from two Internet sources that are related with CWS data used in our experiments. Then,…

    Submitted 30 October, 2023; v1 submitted 31 October, 2022; originally announced October 2022.

    Comments: latest version

  13. arXiv:2205.10687  [pdf, other]

    cs.CL

    Revisiting Pre-trained Language Models and their Evaluation for Arabic Natural Language Understanding

    Authors: Abbas Ghaddar, Yimeng Wu, Sunyam Bagga, Ahmad Rashid, Khalil Bibi, Mehdi Rezagholizadeh, Chao Xing, Yasheng Wang, Duan Xinyu, Zhefeng Wang, Baoxing Huai, Xin Jiang, Qun Liu, Philippe Langlais

    Abstract: There is a growing body of work in recent years to develop pre-trained language models (PLMs) for the Arabic language. This work concerns addressing two major problems in existing Arabic PLMs which constrain progress of the Arabic NLU and NLG fields. First, existing Arabic PLMs are not well-explored and their pre-training can be improved significantly using a more methodical approach. Second, there…

    Submitted 21 May, 2022; originally announced May 2022.

  14. arXiv:2204.05544  [pdf, other]

    cs.CL

    Delving Deep into Regularity: A Simple but Effective Method for Chinese Named Entity Recognition

    Authors: Yingjie Gu, Xiaoye Qu, Zhefeng Wang, Yi Zheng, Baoxing Huai, Nicholas Jing Yuan

    Abstract: Recent years have witnessed the improving performance of Chinese Named Entity Recognition (NER) from proposing new frameworks or incorporating word lexicons. However, the inner composition of entity mentions in character-level Chinese NER has been rarely studied. Actually, most mentions of regular types have strong name regularity. For example, entities end with indicator words such as "company" o…

    Submitted 18 April, 2022; v1 submitted 12 April, 2022; originally announced April 2022.

    Comments: Accepted at NAACL 2022 Findings

  15. arXiv:2202.10301  [pdf, other]

    cs.CV cs.AI

    VLAD-VSA: Cross-Domain Face Presentation Attack Detection with Vocabulary Separation and Adaptation

    Authors: Jiong Wang, Zhou Zhao, Weike Jin, Xinyu Duan, Zhen Lei, Baoxing Huai, Yiling Wu, Xiaofei He

    Abstract: For face presentation attack detection (PAD), most of the spoofing cues are subtle, local image patterns (e.g., local image distortion, 3D mask edge and cut photo edges). The representations of existing PAD works with simple global pooling method, however, lose the local feature discriminability. In this paper, the VLAD aggregation method is adopted to quantize local features with visual vocabular…

    Submitted 21 February, 2022; originally announced February 2022.

    Comments: ACM MM 2021

  16. arXiv:2112.06013  [pdf, other]

    cs.CL

    Efficient Document-level Event Extraction via Pseudo-Trigger-aware Pruned Complete Graph

    Authors: Tong Zhu, Xiaoye Qu, Wenliang Chen, Zhefeng Wang, Baoxing Huai, Nicholas Jing Yuan, Min Zhang

    Abstract: Most previous studies of document-level event extraction mainly focus on building argument chains in an autoregressive way, which achieves a certain success but is inefficient in both training and inference. In contrast to the previous studies, we propose a fast and lightweight model named as PTPCG. In our model, we design a novel strategy for event argument combination together with a non-autoreg…

    Submitted 4 October, 2022; v1 submitted 11 December, 2021; originally announced December 2021.

    Comments: Accepted to IJCAI'2022

  17. arXiv:2112.04329  [pdf, other]

    cs.CL

    JABER and SABER: Junior and Senior Arabic BERt

    Authors: Abbas Ghaddar, Yimeng Wu, Ahmad Rashid, Khalil Bibi, Mehdi Rezagholizadeh, Chao Xing, Yasheng Wang, Duan Xinyu, Zhefeng Wang, Baoxing Huai, Xin Jiang, Qun Liu, Philippe Langlais

    Abstract: Language-specific pre-trained models have proven to be more accurate than multilingual ones in a monolingual evaluation setting, Arabic is no exception. However, we found that previously released Arabic BERT models were significantly under-trained. In this technical report, we present JABER and SABER, Junior and Senior Arabic BERt respectively, our pre-trained language model prototypes dedicated f…

    Submitted 9 January, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

    Comments: Technical Report; v2: add SABER and CAMeLBERT evaluation; v3: fix minor typos and grammatical errors

  18. arXiv:2110.07468  [pdf, other]

    eess.AS cs.MM cs.SD

    SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation

    Authors: Rongjie Huang, Chenye Cui, Feiyang Chen, Yi Ren, Jinglin Liu, Zhou Zhao, Baoxing Huai, Zhefeng Wang

    Abstract: Deep generative models have achieved significant progress in speech synthesis to date, while high-fidelity singing voice synthesis is still an open problem for its long continuous pronunciation, rich high-frequency parts, and strong expressiveness. Existing neural vocoders designed for text-to-speech cannot directly be applied to singing voice synthesis because they result in glitches and poor hig…

    Submitted 5 August, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

    Comments: Accepted by ACM Multimedia 2022

  19. arXiv:2107.06831  [pdf, other]

    cs.MM cs.CV

    Parallel and High-Fidelity Text-to-Lip Generation

    Authors: Jinglin Liu, Zhiying Zhu, Yi Ren, Wencan Huang, Baoxing Huai, Nicholas Yuan, Zhou Zhao

    Abstract: As a key component of talking face generation, lip movements generation determines the naturalness and coherence of the generated talking face video. Prior literature mainly focuses on speech-to-lip generation while there is a paucity in text-to-lip (T2L) generation. T2L is a challenging task and existing end-to-end works depend on the attention mechanism and autoregressive (AR) decoding manner. H…

    Submitted 20 December, 2021; v1 submitted 14 July, 2021; originally announced July 2021.

    Comments: Author draft

  20. arXiv:2106.00334  [pdf, ps, other]

    cs.CL

    An In-depth Study on Internal Structure of Chinese Words

    Authors: Chen Gong, Saihao Huang, Houquan Zhou, Zhenghua Li, Min Zhang, Zhefeng Wang, Baoxing Huai, Nicholas Jing Yuan

    Abstract: Unlike English letters, Chinese characters have rich and specific meanings. Usually, the meaning of a word can be derived from its constituent characters in some way. Several previous works on syntactic parsing propose to annotate shallow word-internal structures for better utilizing character-level information. This work proposes to model the deep internal structures of Chinese words as dependenc…

    Submitted 1 June, 2021; originally announced June 2021.

    Comments: Accepted by ACL-IJCNLP 2021 (long paper)

  21. arXiv:2102.03577  [pdf, other]

    cs.IR cs.AI cs.LG

    Drug Package Recommendation via Interaction-aware Graph Induction

    Authors: Zhi Zheng, Chao Wang, Tong Xu, Dazhong Shen, Penggang Qin, Baoxing Huai, Tongzhu Liu, Enhong Chen

    Abstract: Recent years have witnessed the rapid accumulation of massive electronic medical records (EMRs), which highly support the intelligent medical services such as drug recommendation. However, prior arts mainly follow the traditional recommendation strategies like collaborative filtering, which usually treat individual drugs as mutually independent, while the latent interactions among drugs, e.g., syn…

    Submitted 6 February, 2021; originally announced February 2021.

  22. arXiv:2101.02394  [pdf, other]

    cs.CL

    Read, Retrospect, Select: An MRC Framework to Short Text Entity Linking

    Authors: Yingjie Gu, Xiaoye Qu, Zhefeng Wang, Baoxing Huai, Nicholas Jing Yuan, Xiaolin Gui

    Abstract: Entity linking (EL) for the rapidly growing short text (e.g. search queries and news titles) is critical to industrial applications. Most existing approaches relying on adequate context for long text EL are not effective for the concise and sparse short text. In this paper, we propose a novel framework called Multi-turn Multiple-choice Machine reading comprehension (M3) to solve the short text EL…

    Submitted 7 January, 2021; originally announced January 2021.

    Comments: Accepted at AAAI 2021

  23. arXiv:2008.06941  [pdf, other]

    cs.CV cs.MM

    Object-Aware Multi-Branch Relation Networks for Spatio-Temporal Video Grounding

    Authors: Zhu Zhang, Zhou Zhao, Zhijie Lin, Baoxing Huai, Nicholas Jing Yuan

    Abstract: Spatio-temporal video grounding aims to retrieve the spatio-temporal tube of a queried object according to the given sentence. Currently, most existing grounding methods are restricted to well-aligned segment-sentence pairs. In this paper, we explore spatio-temporal video grounding on unaligned data and multi-form sentences. This challenging task requires to capture critical object relations to id…

    Submitted 22 August, 2020; v1 submitted 16 August, 2020; originally announced August 2020.

    Comments: Accepted by IJCAI 2020

  24. arXiv:2008.02516  [pdf, other]

    eess.AS cs.CL cs.CV cs.LG cs.SD

    FastLR: Non-Autoregressive Lipreading Model with Integrate-and-Fire

    Authors: Jinglin Liu, Yi Ren, Zhou Zhao, Chen Zhang, Baoxing Huai, Nicholas Jing Yuan

    Abstract: Lipreading is an impressive technique and there has been a definite improvement of accuracy in recent years. However, existing methods for lipreading mainly build on autoregressive (AR) model, which generate target tokens one by one and suffer from high inference latency. To break through this constraint, we propose FastLR, a non-autoregressive (NAR) lipreading model which generates all target toke…

    Submitted 15 March, 2021; v1 submitted 6 August, 2020; originally announced August 2020.

    Comments: Accepted by ACM MM 2020