Showing 1–20 of 20 results for author: Sha, N

Searching in archive cs.
  1. arXiv:2406.19959

    cs.SD eess.AS

    RealMAN: A Real-Recorded and Annotated Microphone Array Dataset for Dynamic Speech Enhancement and Localization

    Authors: Bing Yang, Changsheng Quan, Yabo Wang, Pengyu Wang, Yujie Yang, Ying Fang, Nian Shao, Hui Bu, Xin Xu, Xiaofei Li

    Abstract: The training of deep learning-based multichannel speech enhancement and source localization systems relies heavily on the simulation of room impulse response and multichannel diffuse noise, due to the lack of large-scale real-recorded datasets. However, the acoustic mismatch between simulated and real-world data could degrade the model performance when applying in real-world scenarios. To bridge t…

    Submitted 28 June, 2024; originally announced June 2024.

  2. arXiv:2405.16635

    cs.CL

    Compressing Lengthy Context With UltraGist

    Authors: Peitian Zhang, Zheng Liu, Shitao Xiao, Ninglu Shao, Qiwei Ye, Zhicheng Dou

    Abstract: Compressing lengthy context is a critical but technically challenging problem. In this paper, we propose a new method called UltraGist, which is distinguished for its high-quality compression of lengthy context due to the innovative design of the compression and learning algorithm. UltraGist brings forth the following important benefits. Firstly, it notably contributes to the flexibility of compre…

    Submitted 26 May, 2024; originally announced May 2024.

  3. arXiv:2404.19553

    cs.CL

    Extending Llama-3's Context Ten-Fold Overnight

    Authors: Peitian Zhang, Ninglu Shao, Zheng Liu, Shitao Xiao, Hong** Qian, Qiwei Ye, Zhicheng Dou

    Abstract: We extend the context length of Llama-3-8B-Instruct from 8K to 80K via QLoRA fine-tuning. The entire training cycle is super efficient, which takes 8 hours on one 8xA800 (80G) GPU machine. The resulted model exhibits superior performances across a broad range of evaluation tasks, such as NIHS, topic retrieval, and long-context language understanding; meanwhile, it also well preserves the original…

    Submitted 30 April, 2024; originally announced April 2024.

  4. arXiv:2404.16587

    cs.CL cs.AI

    Understanding Privacy Risks of Embeddings Induced by Large Language Models

    Authors: Zhihao Zhu, Ninglu Shao, Defu Lian, Chenwang Wu, Zheng Liu, Yi Yang, Enhong Chen

    Abstract: Large language models (LLMs) show early signs of artificial general intelligence but struggle with hallucinations. One promising solution to mitigate these hallucinations is to store external knowledge as embeddings, aiding LLMs in retrieval-augmented generation. However, such a solution risks compromising privacy, as recent studies experimentally showed that the original text can be partially rec…

    Submitted 25 April, 2024; originally announced April 2024.

  5. arXiv:2402.11577

    cs.CL

    Extensible Embedding: A Flexible Multipler For LLM's Context Length

    Authors: Ninglu Shao, Shitao Xiao, Zheng Liu, Peitian Zhang

    Abstract: Large language models (LLMs) call for extension of context to handle many critical applications. However, the existing approaches are prone to expensive costs and inferior quality of context extension. In this work, we propose Extensible Embedding, which realizes high-quality extension of LLM's context with strong flexibility and cost-effectiveness. Extensible embedding stand as an enhancement of…

    Submitted 18 February, 2024; originally announced February 2024.

  6. arXiv:2401.07793

    cs.CL

    Flexibly Scaling Large Language Models Contexts Through Extensible Tokenization

    Authors: Ninglu Shao, Shitao Xiao, Zheng Liu, Peitian Zhang

    Abstract: Large language models (LLMs) are in need of sufficient contexts to handle many critical applications, such as retrieval augmented generation and few-shot learning. However, due to the constrained window size, the LLMs can only access to the information within a limited context. Although the size of context window can be extended by fine-tuning, it will result in a substantial cost in both training…

    Submitted 15 January, 2024; originally announced January 2024.

  7. arXiv:2401.03462

    cs.CL cs.AI

    Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon

    Authors: Peitian Zhang, Zheng Liu, Shitao Xiao, Ninglu Shao, Qiwei Ye, Zhicheng Dou

    Abstract: The utilization of long contexts poses a big challenge for LLMs due to their limited context window size. Although the context window can be extended through fine-tuning, it will result in a considerable cost at both training and inference time, and exert an unfavorable impact to the LLM's original capabilities. In this work, we propose a new method called Activation Beacon, which condenses LLM's…

    Submitted 2 February, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

  8. arXiv:2309.13916

    eess.AS cs.SD

    Frame-wise streaming end-to-end speaker diarization with non-autoregressive self-attention-based attractors

    Authors: Di Liang, Nian Shao, Xiaofei Li

    Abstract: This work proposes a frame-wise online/streaming end-to-end neural diarization (FS-EEND) method in a frame-in-frame-out fashion. To frame-wisely detect a flexible number of speakers and extract/update their corresponding attractors, we propose to leverage a causal speaker embedding encoder and an online non-autoregressive self-attention-based attractor decoder. A look-ahead mechanism is adopted to…

    Submitted 25 September, 2023; originally announced September 2023.

  9. arXiv:2309.08153

    eess.AS cs.SD

    Fine-tune the pretrained ATST model for sound event detection

    Authors: Nian Shao, Xian Li, Xiaofei Li

    Abstract: Sound event detection (SED) often suffers from the data deficiency problem. The recent baseline system in the DCASE2023 challenge task 4 leverages the large pretrained self-supervised learning (SelfSL) models to mitigate such restriction, where the pretrained models help to produce more discriminative features for SED. However, the pretrained models are regarded as a frozen feature extractor in th…

    Submitted 29 December, 2023; v1 submitted 15 September, 2023; originally announced September 2023.

    Comments: 5 pages, 3 figures, camera-ready version for ICASSP 2024

  10. arXiv:2306.04186

    eess.AS cs.LG

    Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks

    Authors: Xian Li, Nian Shao, Xiaofei Li

    Abstract: Self-supervised learning (SSL) has emerged as a popular approach for learning audio representations. One goal of audio self-supervised pre-training is to transfer knowledge to downstream audio tasks, generally including clip-level and frame-level tasks. While frame-level tasks are important for fine-grained acoustic scene/event understanding, prior studies primarily evaluate on clip-level downstre…

    Submitted 7 November, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: Submitted to IEEE TASLP. arXiv admin note: text overlap with arXiv:2204.12076

  11. Uncovering ChatGPT's Capabilities in Recommender Systems

    Authors: Sunhao Dai, Ninglu Shao, Haiyuan Zhao, Weijie Yu, Zihua Si, Chen Xu, Zhongxiang Sun, Xiao Zhang, Jun Xu

    Abstract: The debut of ChatGPT has recently attracted the attention of the natural language processing (NLP) community and beyond. Existing studies have demonstrated that ChatGPT shows significant improvement in a range of downstream NLP tasks, but the capabilities and limitations of ChatGPT in terms of recommendations remain unclear. In this study, we aim to conduct an empirical analysis of ChatGPT's recom…

    Submitted 24 August, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

    Comments: Accepted by RecSys 2023

  12. arXiv:2210.17041

    cs.CL

    GPS: Genetic Prompt Search for Efficient Few-shot Learning

    Authors: Hanwei Xu, Yujun Chen, Yulun Du, Nan Shao, Yanggang Wang, Haiyu Li, Zhilin Yang

    Abstract: Prompt-based techniques have demonstrated great potential for improving the few-shot generalization of pretrained language models. However, their performance heavily relies on the manual design of prompts and thus requires a lot of human efforts. In this paper, we introduce Genetic Prompt Search (GPS) to improve few-shot learning with prompts, which utilizes a genetic algorithm to automatically sea…

    Submitted 30 October, 2022; originally announced October 2022.

    Comments: 10 pages

  13. arXiv:2210.06719

    cs.LG cs.AI

    Reward Imputation with Sketching for Contextual Batched Bandits

    Authors: Xiao Zhang, Ninglu Shao, Zihua Si, Jun Xu, Wenhan Wang, Han**g Su, Ji-Rong Wen

    Abstract: Contextual batched bandit (CBB) is a setting where a batch of rewards is observed from the environment at the end of each episode, but the rewards of the non-executed actions are unobserved, resulting in partial-information feedback. Existing approaches for CBB often ignore the rewards of the non-executed actions, leading to underutilization of feedback information. In this paper, we propose an ef…

    Submitted 7 October, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: Accepted by NeurIPS 2023

    ACM Class: I.2.6

  14. arXiv:2201.06910

    cs.LG cs.CL

    ZeroPrompt: Scaling Prompt-Based Pretraining to 1,000 Tasks Improves Zero-Shot Generalization

    Authors: Hanwei Xu, Yujun Chen, Yulun Du, Nan Shao, Yanggang Wang, Haiyu Li, Zhilin Yang

    Abstract: We propose a multitask pretraining approach ZeroPrompt for zero-shot generalization, focusing on task scaling and zero-shot prompting. While previous models are trained on only a few dozen tasks, we scale to 1,000 tasks for the first time using real-world data. This leads to a crucial discovery that task scaling can be an efficient alternative to model scaling; i.e., the model size has little impa…

    Submitted 30 October, 2022; v1 submitted 18 January, 2022; originally announced January 2022.

    Comments: 18 pages

  15. arXiv:2110.11144

    eess.AS cs.LG cs.SD

    RCT: Random Consistency Training for Semi-supervised Sound Event Detection

    Authors: Nian Shao, Erfan Loweimi, Xiaofei Li

    Abstract: Sound event detection (SED), as a core module of acoustic environmental analysis, suffers from the problem of data deficiency. The integration of semi-supervised learning (SSL) largely mitigates such problem while bringing no extra annotation budget. This paper researches on several core modules of SSL, and introduces a random consistency training (RCT) strategy. First, a self-consistency loss is…

    Submitted 27 March, 2022; v1 submitted 21 October, 2021; originally announced October 2021.

    Comments: Preprint for interspeech 2022

  16. arXiv:2102.03741

    cs.CL

    Memory Augmented Sequential Paragraph Retrieval for Multi-hop Question Answering

    Authors: Nan Shao, Yiming Cui, Ting Liu, Shi** Wang, Guo** Hu

    Abstract: Retrieving information from correlative paragraphs or documents to answer open-domain multi-hop questions is very challenging. To deal with this challenge, most of the existing works consider paragraphs as nodes in a graph and propose graph-based methods to retrieve them. However, in this paper, we point out the intrinsic defect of such methods. Instead, we propose a new architecture that models p…

    Submitted 7 February, 2021; originally announced February 2021.

    Comments: 10 pages

  17. Is Graph Structure Necessary for Multi-hop Question Answering?

    Authors: Nan Shao, Yiming Cui, Ting Liu, Shi** Wang, Guo** Hu

    Abstract: Recently, attempting to model texts as graph structure and introducing graph neural networks to deal with it has become a trend in many NLP research areas. In this paper, we investigate whether the graph structure is necessary for multi-hop question answering. Our analysis is centered on HotpotQA. We construct a strong baseline model to establish that, with the proper use of pre-trained models, gr…

    Submitted 29 October, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

    Comments: 6 pages, to appear at EMNLP 2020

  18. arXiv:1911.03831

    cs.LG stat.ML

    Manifold Denoising by Nonlinear Robust Principal Component Analysis

    Authors: He Lyu, Ningyu Sha, Shuyang Qin, Ming Yan, Yuying Xie, Rongrong Wang

    Abstract: This paper extends robust principal component analysis (RPCA) to nonlinear manifolds. Suppose that the observed data matrix is the sum of a sparse component and a component drawn from some low dimensional manifold. Is it possible to separate them by using similar ideas as RPCA? Is there any benefit in treating the manifold as a whole as opposed to treating each local region independently? We answe…

    Submitted 9 November, 2019; originally announced November 2019.

  19. TripleNet: Triple Attention Network for Multi-Turn Response Selection in Retrieval-based Chatbots

    Authors: Wentao Ma, Yiming Cui, Nan Shao, Su He, Wei-Nan Zhang, Ting Liu, Shi** Wang, Guo** Hu

    Abstract: We consider the importance of different utterances in the context for selecting the response usually depends on the current query. In this paper, we propose the model TripleNet to fully model the task with the triple <context, query, response> instead of <context, response> in previous works. The heart of TripleNet is a novel attention mechanism named triple attention to model the relationships wi…

    Submitted 29 September, 2019; v1 submitted 23 September, 2019; originally announced September 2019.

    Comments: 10 pages, accepted as a conference paper at CoNLL 2019

    Journal ref: CoNLL 2019 737-746

  20. arXiv:1905.10906

    cs.LG cs.CR stat.ML

    Non-Determinism in Neural Networks for Adversarial Robustness

    Authors: Daanish Ali Khan, Linhong Li, Ninghao Sha, Zhuoran Liu, Abelino Jimenez, Bhiksha Raj, Rita Singh

    Abstract: Recent breakthroughs in the field of deep learning have led to advancements in a broad spectrum of tasks in computer vision, audio processing, natural language processing and other areas. In most instances where these tasks are deployed in real-world scenarios, the models used in them have been shown to be susceptible to adversarial attacks, making it imperative for us to address the challenge of…

    Submitted 26 May, 2019; originally announced May 2019.