Showing 1–50 of 73 results for author: Ou, Z

  1. arXiv:2406.17519  [pdf, other]

    cs.CL

    Entropy-Based Decoding for Retrieval-Augmented Large Language Models

    Authors: Zexuan Qiu, Zijing Ou, Bin Wu, Jingjing Li, Aiwei Liu, Irwin King

    Abstract: Augmenting Large Language Models (LLMs) with retrieved external knowledge has proven effective for improving the factual accuracy of generated responses. Despite their success, retrieval-augmented LLMs still face the distractibility issue, where the generated responses are negatively influenced by noise from both external and internal knowledge sources. In this paper, we introduce a novel, trainin… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  2. arXiv:2406.10808  [pdf, other]

    cs.LG

    Diffusion Model With Optimal Covariance Matching

    Authors: Zijing Ou, Mingtian Zhang, Andi Zhang, Tim Z. Xiao, Yingzhen Li, David Barber

    Abstract: The probabilistic diffusion model has become highly effective across various domains. Typically, sampling from a diffusion model involves using a denoising distribution characterized by a Gaussian with a learned mean and either fixed or learned covariances. In this paper, we leverage the recently proposed full covariance moment matching technique and introduce a novel method for learning covarianc… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  3. arXiv:2406.09816  [pdf, other]

    math.OC cs.MA

    A Zeroth-Order Proximal Algorithm for Consensus Optimization

    Authors: Chengan Wang, Zichong Ou, Jie Lu

    Abstract: This paper considers a consensus optimization problem, where all the nodes in a network, with access to the zeroth-order information of its local objective function only, attempt to cooperatively achieve a common minimizer of the sum of their local objectives. To address this problem, we develop ZoPro, a zeroth-order proximal algorithm, which incorporates a zeroth-order oracle for approximating He… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 8 pages, 3 figures

  4. arXiv:2406.02166  [pdf, other]

    cs.SD cs.CL eess.AS

    Whistle: Data-Efficient Multilingual and Crosslingual Speech Recognition via Weakly Phonetic Supervision

    Authors: Saierdaer Yusuyin, Te Ma, Hao Huang, Wenbo Zhao, Zhijian Ou

    Abstract: There exist three approaches for multilingual and crosslingual automatic speech recognition (MCL-ASR) - supervised pre-training with phonetic or graphemic transcription, and self-supervised pre-training. We find that pre-training with phonetic supervision has been underappreciated so far for MCL-ASR, while conceptually it is more advantageous for information sharing between different languages. Th… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  5. arXiv:2405.13084  [pdf, other]

    cs.CL cs.AI

    The 2nd FutureDial Challenge: Dialog Systems with Retrieval Augmented Generation (FutureDial-RAG)

    Authors: Yucheng Cai, Si Chen, Yi Huang, Junlan Feng, Zhijian Ou

    Abstract: The 2nd FutureDial Challenge: Dialog Systems with Retrieval Augmented Generation (FutureDial-RAG), Co-located with SLT 2024

    Submitted 21 May, 2024; originally announced May 2024.

  6. arXiv:2404.11699  [pdf, other]

    cs.RO

    Retrieval-Augmented Embodied Agents

    Authors: Yichen Zhu, Zhicai Ou, Xiaofeng Mou, Jian Tang

    Abstract: Embodied agents operating in complex and uncertain environments face considerable challenges. While some advanced agents handle complex manipulation tasks with proficiency, their success often hinges on extensive training data to develop their capabilities. In contrast, humans typically rely on recalling past experiences and analogous situations to solve new problems. Aiming to emulate this human… ▽ More

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: CVPR2024

  7. arXiv:2403.10961  [pdf, other]

    cs.LG cs.CL cs.SD eess.AS

    Energy-Based Models with Applications to Speech and Language Processing

    Authors: Zhijian Ou

    Abstract: Energy-Based Models (EBMs) are an important class of probabilistic models, also known as random fields and undirected graphical models. EBMs are un-normalized and thus radically different from other popular self-normalized probabilistic models such as hidden Markov models (HMMs), autoregressive models, generative adversarial nets (GANs) and variational auto-encoders (VAEs). Over the past years, EB… ▽ More

    Submitted 16 March, 2024; originally announced March 2024.

    Comments: The version before publisher editing

    Journal ref: Foundations and Trends in Signal Processing: Vol. 18: No. 1-2, pp 1-199

  8. arXiv:2403.06199  [pdf, other]

    cs.CV cs.CL

    Mipha: A Comprehensive Overhaul of Multimodal Assistant with Small Language Models

    Authors: Minjie Zhu, Yichen Zhu, Xin Liu, Ning Liu, Zhiyuan Xu, Chaomin Shen, Yaxin Peng, Zhicai Ou, Feifei Feng, Jian Tang

    Abstract: Multimodal Large Language Models (MLLMs) have showcased impressive skills in tasks related to visual understanding and reasoning. Yet, their widespread application faces obstacles due to the high computational demands during both the training and inference phases, restricting their use to a limited audience within the research and user communities. In this paper, we investigate the design aspects… ▽ More

    Submitted 25 March, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

  9. arXiv:2401.02330  [pdf, other]

    cs.CV cs.CL

    LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model

    Authors: Yichen Zhu, Minjie Zhu, Ning Liu, Zhicai Ou, Xiaofeng Mou, Jian Tang

    Abstract: In this paper, we introduce LLaVA-$φ$ (LLaVA-Phi), an efficient multi-modal assistant that harnesses the power of the recently advanced small language model, Phi-2, to facilitate multi-modal dialogues. LLaVA-Phi marks a notable advancement in the realm of compact multi-modal models. It demonstrates that even smaller language models, with as few as 2.7B parameters, can effectively engage in intrica… ▽ More

    Submitted 22 February, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

    Comments: The datasets were incomplete as they did not include all the necessary copyrights

  10. arXiv:2311.10271  [pdf, other]

    cs.CL

    Prompt Pool based Class-Incremental Continual Learning for Dialog State Tracking

    Authors: Hong Liu, Yucheng Cai, Yuan Zhou, Zhijian Ou, Yi Huang, Junlan Feng

    Abstract: Continual learning is crucial for dialog state tracking (DST) in dialog systems, since requirements from users for new functionalities are often encountered. However, most existing continual learning methods for DST require task identities during testing, which is a severe limit in real-world applications. In this paper, we aim to address continual learning of DST in the class-incremental scena… ▽ More

    Submitted 16 November, 2023; originally announced November 2023.

  11. arXiv:2310.03262  [pdf, other]

    cs.CL

    Predicting Emergent Abilities with Infinite Resolution Evaluation

    Authors: Shengding Hu, Xin Liu, Xu Han, Xinrong Zhang, Chaoqun He, Weilin Zhao, Yankai Lin, Ning Ding, Zebin Ou, Guoyang Zeng, Zhiyuan Liu, Maosong Sun

    Abstract: The scientific scale-up of large language models (LLMs) necessitates a comprehensive understanding of their scaling properties. However, the existing literature on the scaling properties only yields an incomplete answer: optimization loss decreases predictably as the model size increases, in line with established scaling law; yet no scaling law for task has been established and the task performanc… ▽ More

    Submitted 17 April, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: After revision

  12. arXiv:2309.11065  [pdf, other]

    cs.CL

    UniPCM: Universal Pre-trained Conversation Model with Task-aware Automatic Prompt

    Authors: Yucheng Cai, Wentao Ma, Yuchuan Wu, Shuzheng Si, Yuan Shao, Zhijian Ou, Yongbin Li

    Abstract: Recent research has shown that multi-task pre-training greatly improves the model's robustness and transfer ability, which is crucial for building a high-quality dialog system. However, most previous works on multi-task pre-training rely heavily on human-defined input format or prompt, which is not optimal in quality and quantity. In this work, we propose to use Task-based Automatic Prompt generat… ▽ More

    Submitted 20 September, 2023; originally announced September 2023.

  13. arXiv:2307.07595  [pdf, other]

    stat.ML cs.LG

    Training Discrete Energy-Based Models with Energy Discrepancy

    Authors: Tobias Schröder, Zijing Ou, Yingzhen Li, Andrew B. Duncan

    Abstract: Training energy-based models (EBMs) on discrete spaces is challenging because sampling over such spaces can be difficult. We propose to train discrete EBMs with energy discrepancy (ED), a novel type of contrastive loss functional which only requires the evaluation of the energy function at data points and their perturbed counterparts, thus not relying on sampling strategies like Markov chain Mont… ▽ More

    Submitted 14 July, 2023; originally announced July 2023.

    Comments: Presented at ICML 2023 Workshop: Sampling and Optimization in Discrete Space (SODS 2023)

  14. arXiv:2307.06431  [pdf, other]

    stat.ML cs.LG

    Energy Discrepancies: A Score-Independent Loss for Energy-Based Models

    Authors: Tobias Schröder, Zijing Ou, Jen Ning Lim, Yingzhen Li, Sebastian J. Vollmer, Andrew B. Duncan

    Abstract: Energy-based models are a simple yet powerful class of probabilistic models, but their widespread adoption has been limited by the computational burden of training them. We propose a novel loss function called Energy Discrepancy (ED) which does not rely on the computation of scores or expensive Markov chain Monte Carlo. We show that ED approaches the explicit score matching and negative log-likeli… ▽ More

    Submitted 27 November, 2023; v1 submitted 12 July, 2023; originally announced July 2023.

    Comments: Camera Ready version for the 37th Conference on Neural Information Processing Systems (NeurIPS 2023). Changes in this revision: Appendix A1: Corrected proof of Theorem 1. Appendix D3: Added definition and numerical experiments for energy discrepancy on binary discrete spaces. Minor changes in the main text and correction of typos. Added new references

  15. arXiv:2305.13199  [pdf, other]

    cs.CL

    Knowledge-Retrieval Task-Oriented Dialog Systems with Semi-Supervision

    Authors: Yucheng Cai, Hong Liu, Zhijian Ou, Yi Huang, Junlan Feng

    Abstract: Most existing task-oriented dialog (TOD) systems track dialog states in terms of slots and values and use them to query a database to get relevant knowledge to generate responses. In real-life applications, user utterances are noisier, and thus it is more difficult to accurately track dialog states and correctly secure relevant knowledge. Recently, a progress in question answering and document-gro… ▽ More

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: 5 pages, accepted by INTERSPEECH2023

  16. arXiv:2305.12676  [pdf, other]

    cs.CL

    Exploring Energy-based Language Models with Different Architectures and Training Methods for Speech Recognition

    Authors: Hong Liu, Zhaobiao Lv, Zhijian Ou, Wenbo Zhao, Qing Xiao

    Abstract: Energy-based language models (ELMs) parameterize an unnormalized distribution for natural sentences and are radically different from popular autoregressive language models (ALMs). As an important application, ELMs have been successfully used as a means for calculating sentence scores in speech recognition, but they all use less-modern CNN or LSTM networks. The recent progress in Transformer networ… ▽ More

    Submitted 29 May, 2023; v1 submitted 21 May, 2023; originally announced May 2023.

    Comments: Accepted into INTERSPEECH 2023

  17. arXiv:2305.02139  [pdf, other]

    cs.LG cs.CL

    A Curriculum View of Robust Loss Functions

    Authors: Zebin Ou, Yue Zhang

    Abstract: Robust loss functions are designed to combat the adverse impacts of label noise, whose robustness is typically supported by theoretical bounds agnostic to the training dynamics. However, these bounds may fail to characterize the empirical performance as it remains unclear why robust loss functions can underfit. We show that most loss functions can be rewritten into a form with the same class-score… ▽ More

    Submitted 3 May, 2023; originally announced May 2023.

  18. arXiv:2304.10707  [pdf, other]

    stat.ML cs.LG

    Persistently Trained, Diffusion-assisted Energy-based Models

    Authors: Xinwei Zhang, Zhiqiang Tan, Zhijian Ou

    Abstract: Maximum likelihood (ML) learning for energy-based models (EBMs) is challenging, partly due to non-convergence of Markov chain Monte Carlo. Several variations of ML learning have been proposed, but existing methods all fail to achieve both post-training image generation and proper density estimation. We propose to introduce diffusion data and learn a joint EBM, called diffusion assisted-EBMs, throug… ▽ More

    Submitted 20 April, 2023; originally announced April 2023.

    Comments: main text 8 pages

  19. arXiv:2304.03548  [pdf, other]

    cs.CL

    GEMINI: Controlling the Sentence-level Writing Style for Abstractive Text Summarization

    Authors: Guangsheng Bao, Zebin Ou, Yue Zhang

    Abstract: Human experts write summaries using different techniques, including extracting a sentence from the document and rewriting it, or fusing various information from the document to abstract it. These techniques are flexible and thus difficult to be imitated by any single method. To address this issue, we propose an adaptive model, GEMINI, that integrates a rewriter and a generator to mimic the sentenc… ▽ More

    Submitted 9 December, 2023; v1 submitted 7 April, 2023; originally announced April 2023.

    Comments: EMNLP2023 camera-ready version. 8 pages, 5 figures, 6 tables

  20. arXiv:2211.02639  [pdf, other]

    eess.IV cs.CV cs.LG

    PIPPI2021: An Approach to Automated Diagnosis and Texture Analysis of the Fetal Liver & Placenta in Fetal Growth Restriction

    Authors: Aya Mutaz Zeidan, Paula Ramirez Gilliland, Ashay Patel, Zhanchong Ou, Dimitra Flouri, Nada Mufti, Kasia Maksym, Rosalind Aughwane, Sebastien Ourselin, Anna David, Andrew Melbourne

    Abstract: Fetal growth restriction (FGR) is a prevalent pregnancy condition characterised by failure of the fetus to reach its genetically predetermined growth potential. We explore the application of model fitting techniques, linear regression machine learning models, deep learning regression, and Haralick textured features from multi-contrast MRI for multi-fetal organ analysis of FGR. We employed T2 relax… ▽ More

    Submitted 1 November, 2022; originally announced November 2022.

  21. arXiv:2210.11720  [pdf, other]

    cs.CL

    MCSCSet: A Specialist-annotated Dataset for Medical-domain Chinese Spelling Correction

    Authors: Wangjie Jiang, Zhihao Ye, Zijing Ou, Ruihui Zhao, Jianguang Zheng, Yi Liu, Siheng Li, Bang Liu, Yujiu Yang, Yefeng Zheng

    Abstract: Chinese Spelling Correction (CSC) is gaining increasing attention due to its promise of automatically detecting and correcting spelling errors in Chinese texts. Despite its extensive use in many applications, like search engines and optical character recognition systems, little has been explored in medical scenarios in which complex and uncommon medical entities are easily misspelled. Correcting t… ▽ More

    Submitted 21 October, 2022; originally announced October 2022.

    Comments: The full version of CIKM 2022 accepted resource paper "MCSCSet: A Specialist-annotated Dataset for Medical-domain Chinese Spelling Correction". (https://dl.acm.org/doi/10.1145/3511808.3557636)

  22. arXiv:2210.08692  [pdf, other]

    cs.CL cs.AI

    A Generative User Simulator with GPT-based Architecture and Goal State Tracking for Reinforced Multi-Domain Dialog Systems

    Authors: Hong Liu, Yucheng Cai, Zhijian Ou, Yi Huang, Junlan Feng

    Abstract: Building user simulators (USs) for reinforcement learning (RL) of task-oriented dialog systems (DSs) has gained more and more attention, which, however, still faces several fundamental challenges. First, it is unclear whether we can leverage pretrained language models to design, for example, GPT-2 based USs, to catch up and interact with the recently advanced GPT-2 based DSs. Second, an important… ▽ More

    Submitted 18 October, 2022; v1 submitted 16 October, 2022; originally announced October 2022.

    Comments: Accepted by EMNLP 2022 SereTOD Workshop

  23. arXiv:2210.06706  [pdf, other]

    cs.CL cs.AI

    Jointly Reinforced User Simulator and Task-oriented Dialog System with Simplified Generative Architecture

    Authors: Hong Liu, Zhijian Ou, Yi Huang, Junlan Feng

    Abstract: Recently, there has been progress in supervised finetuning of pretrained GPT-2 to build end-to-end task-oriented dialog (TOD) systems. However, online reinforcement learning of a GPT-2 based dialog system (DS), together with an end-to-end user simulator (US), has not yet been explored. Moreover, a drawback with existing GPT-2 based TOD systems is that they mostly employ the whole dialog history as in… ▽ More

    Submitted 12 October, 2022; originally announced October 2022.

    Comments: An early version of Markovian Generative Architectures (MGA) and Generative User Simulator (GUS)

  24. arXiv:2209.13464  [pdf, other]

    cs.CL cs.AI

    Information Extraction and Human-Robot Dialogue towards Real-life Tasks: A Baseline Study with the MobileCS Dataset

    Authors: Hong Liu, Hao Peng, Zhijian Ou, Juanzi Li, Yi Huang, Junlan Feng

    Abstract: Recently, there has emerged a class of task-oriented dialogue (TOD) datasets collected through Wizard-of-Oz simulated games. However, the Wizard-of-Oz data are in fact simulated data and thus are fundamentally different from real-life conversations, which are noisier and more casual. Recently, the SereTOD challenge was organized and released the MobileCS dataset, which consists of real-world dialog t… ▽ More

    Submitted 18 October, 2022; v1 submitted 27 September, 2022; originally announced September 2022.

    Comments: Accepted by EMNLP 2022 SereTOD Workshop

  25. arXiv:2207.12235  [pdf, other]

    cs.CL

    Advancing Semi-Supervised Task Oriented Dialog Systems by JSA Learning of Discrete Latent Variable Models

    Authors: Yucheng Cai, Hong Liu, Zhijian Ou, Yi Huang, Junlan Feng

    Abstract: Developing semi-supervised task-oriented dialog (TOD) systems by leveraging unlabeled dialog data has attracted increasing interest. For semi-supervised learning of latent state TOD models, variational learning is often used, but suffers from the annoying high-variance of the gradients propagated through discrete latent variables and the drawback of indirectly optimizing the target log-likelihood… ▽ More

    Submitted 25 July, 2022; originally announced July 2022.

    Comments: Accepted into SIGDIAL 2022

  26. arXiv:2207.02657  [pdf, other]

    cs.CL

    A Challenge on Semi-Supervised and Reinforced Task-Oriented Dialog Systems

    Authors: Zhijian Ou, Junlan Feng, Juanzi Li, Yakun Li, Hong Liu, Hao Peng, Yi Huang, Jiangjiang Zhao

    Abstract: A challenge on Semi-Supervised and Reinforced Task-Oriented Dialog Systems, Co-located with EMNLP2022 SereTOD Workshop.

    Submitted 25 September, 2022; v1 submitted 6 July, 2022; originally announced July 2022.

    Comments: Version 2.1

  27. arXiv:2205.06530  [pdf, other]

    cs.CV

    Modeling Semantic Composition with Syntactic Hypergraph for Video Question Answering

    Authors: Zenan Xu, Wanjun Zhong, Qinliang Su, Zijing Ou, Fuwei Zhang

    Abstract: A key challenge in video question answering is how to realize the cross-modal semantic alignment between textual concepts and corresponding visual objects. Existing methods mostly seek to align the word representations with the video regions. However, word representations are often not able to convey a complete description of textual concepts, which are in general described by the compositions of… ▽ More

    Submitted 13 May, 2022; originally announced May 2022.

    Comments: 11 pages, 7 figures

  28. arXiv:2204.07367  [pdf, other]

    cs.CL

    On the Role of Pre-trained Language Models in Word Ordering: A Case Study with BART

    Authors: Zebin Ou, Meishan Zhang, Yue Zhang

    Abstract: Word ordering is a constrained language generation task taking unordered words as input. Existing work uses linear models and neural networks for the task, yet pre-trained language models have not been studied in word ordering, let alone why they help. We use BART as an instance and show its effectiveness in the task. To explain why BART helps word ordering, we extend analysis with probing and emp… ▽ More

    Submitted 28 October, 2022; v1 submitted 15 April, 2022; originally announced April 2022.

    Comments: COLING 2022

  29. arXiv:2204.06452  [pdf, other]

    cs.CL cs.HC

    Building Markovian Generative Architectures over Pretrained LM Backbones for Efficient Task-Oriented Dialog Systems

    Authors: Hong Liu, Yucheng Cai, Zhijian Ou, Yi Huang, Junlan Feng

    Abstract: Recently, Transformer based pretrained language models (PLMs), such as GPT2 and T5, have been leveraged to build generative task-oriented dialog (TOD) systems. A drawback of existing PLM-based models is their non-Markov architectures across turns, i.e., the whole history is used as the conditioning input at each turn. First, this brings inefficiencies in memory and computation. Furthermore, using… ▽ More

    Submitted 13 October, 2022; v1 submitted 13 April, 2022; originally announced April 2022.

    Comments: Accepted by SLT 2022

  30. arXiv:2203.16776  [pdf, ps, other]

    eess.AS cs.CL cs.LG

    An Empirical Study of Language Model Integration for Transducer based Speech Recognition

    Authors: Huahuan Zheng, Keyu An, Zhijian Ou, Chen Huang, Ke Ding, Guanglu Wan

    Abstract: Utilizing text-only data with an external language model (ELM) in end-to-end RNN-Transducer (RNN-T) for speech recognition is challenging. Recently, a class of methods such as density ratio (DR) and internal language model estimation (ILME) have been developed, outperforming the classic shallow fusion (SF) method. The basic idea behind these methods is that RNN-T posterior should first subtract th… ▽ More

    Submitted 3 August, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

    Comments: Accepted into INTERSPEECH 2022

  31. arXiv:2203.16758  [pdf, other]

    eess.AS cs.CL

    CUSIDE: Chunking, Simulating Future Context and Decoding for Streaming ASR

    Authors: Keyu An, Huahuan Zheng, Zhijian Ou, Hongyu Xiang, Ke Ding, Guanglu Wan

    Abstract: History and future contextual information are known to be important for accurate acoustic modeling. However, acquiring future context brings latency for streaming ASR. In this paper, we propose a new framework - Chunking, Simulating Future Context and Decoding (CUSIDE) for streaming speech recognition. A new simulation module is introduced to recursively simulate the future contextual frames, with… ▽ More

    Submitted 2 August, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

    Comments: Accepted into INTERSPEECH 2022

  32. arXiv:2203.16757  [pdf, ps, other]

    eess.AS cs.CL

    Exploiting Single-Channel Speech for Multi-Channel End-to-End Speech Recognition: A Comparative Study

    Authors: Keyu An, Ji Xiao, Zhijian Ou

    Abstract: Recently, the end-to-end training approach for multi-channel ASR has shown its effectiveness, which usually consists of a beamforming front-end and a recognition back-end. However, the end-to-end training becomes more difficult due to the integration of multiple modules, particularly considering that multi-channel speech data recorded in real environments are limited in size. This raises the deman… ▽ More

    Submitted 8 October, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

    Comments: Accepted by ISCSLP 2022. arXiv admin note: substantial text overlap with arXiv:2107.02670

  33. arXiv:2203.01693  [pdf, other]

    cs.LG

    Learning Neural Set Functions Under the Optimal Subset Oracle

    Authors: Zijing Ou, Tingyang Xu, Qinliang Su, Yingzhen Li, Peilin Zhao, Yatao Bian

    Abstract: Learning neural set functions becomes increasingly more important in many applications like product recommendation and compound selection in AI-aided drug discovery. The majority of existing works study methodologies of set function learning under the function value oracle, which, however, requires expensive supervision signals. This renders it impractical for applications with only weak supervisi… ▽ More

    Submitted 23 May, 2023; v1 submitted 3 March, 2022; originally announced March 2022.

  34. arXiv:2112.01686  [pdf, other]

    cs.CV

    Make A Long Image Short: Adaptive Token Length for Vision Transformers

    Authors: Yichen Zhu, Yuqin Zhu, Jie Du, Yi Wang, Zhicai Ou, Feifei Feng, Jian Tang

    Abstract: The vision transformer splits each image into a sequence of tokens with fixed length and processes the tokens in the same way as words in natural language processing. More tokens normally lead to better performance but considerably increased computational cost. Motivated by the proverb "A picture is worth a thousand words" we aim to accelerate the ViT model by making a long image short. To this en… ▽ More

    Submitted 5 December, 2021; v1 submitted 2 December, 2021; originally announced December 2021.

    Comments: 10 pages, Technical report

  35. arXiv:2112.00265  [pdf, other]

    cs.LG cs.CV

    Training BatchNorm Only in Neural Architecture Search and Beyond

    Authors: Yichen Zhu, Jie Du, Yuqin Zhu, Yi Wang, Zhicai Ou, Feifei Feng, Jian Tang

    Abstract: This work investigates the usage of batch normalization in neural architecture search (NAS). Specifically, Frankle et al. find that training BatchNorm only can achieve nontrivial performance. Furthermore, Chen et al. claim that training BatchNorm only can speed up the training of the one-shot NAS supernet over ten times. Critically, there is no effort to understand 1) why training BatchNorm only c… ▽ More

    Submitted 30 November, 2021; originally announced December 2021.

    Comments: 11 pages Technical report

  36. arXiv:2111.01415  [pdf, other]

    cs.SE cs.AI cs.CR

    Callee: Recovering Call Graphs for Binaries with Transfer and Contrastive Learning

    Authors: Wenyu Zhu, Zhiyao Feng, Zihan Zhang, Jianjun Chen, Zhijian Ou, Min Yang, Chao Zhang

    Abstract: Recovering binary programs' call graphs is crucial for inter-procedural analysis tasks and applications based on them. One of the core challenges is recognizing targets of indirect calls (i.e., indirect callees). Existing solutions all have high false positives and negatives, making call graphs inaccurate. In this paper, we propose a new solution Callee combining transfer learning and cont… ▽ More

    Submitted 23 December, 2022; v1 submitted 2 November, 2021; originally announced November 2021.

  37. arXiv:2110.06354  [pdf, other]

    cs.CL cs.IR cs.LG

    Tell Me How to Survey: Literature Review Made Simple with Automatic Reading Path Generation

    Authors: Jiayuan Ding, Tong Xiang, Zijing Ou, Wangyang Zuo, Ruihui Zhao, Chenghua Lin, Yefeng Zheng, Bang Liu

    Abstract: Recent years have witnessed the dramatic growth of paper volumes with plenty of new research papers published every day, especially in the area of computer science. How to glean papers worth reading from the massive literature to do a quick survey or keep up with the latest advancement about a specific research topic has become a challenging task. Existing academic search engines such as Google Sc… ▽ More

    Submitted 25 April, 2022; v1 submitted 12 October, 2021; originally announced October 2021.

    Comments: 16 pages, 12 figures

    Journal ref: ICDE 2022

  38. arXiv:2109.04314  [pdf, other]

    cs.CL

    Variational Latent-State GPT for Semi-Supervised Task-Oriented Dialog Systems

    Authors: Hong Liu, Yucheng Cai, Zhenru Lin, Zhijian Ou, Yi Huang, Junlan Feng

    Abstract: Recently, two approaches, fine-tuning large pre-trained language models and variational training, have attracted significant interest, separately, for semi-supervised end-to-end task-oriented dialog (TOD) systems. In this paper, we propose Variational Latent-State GPT model (VLS-GPT), which is the first to combine the strengths of the two approaches. Among many options of models, we propose the g… ▽ More

    Submitted 26 January, 2023; v1 submitted 9 September, 2021; originally announced September 2021.

    Comments: Accepted into IEEE/ACM Transactions on Audio, Speech and Language Processing

  39. arXiv:2109.02867  [pdf, other]

    cs.IR

    Refining BERT Embeddings for Document Hashing via Mutual Information Maximization

    Authors: Zijing Ou, Qinliang Su, Jianxing Yu, Ruihui Zhao, Yefeng Zheng, Bang Liu

    Abstract: Existing unsupervised document hashing methods are mostly established on generative models. Due to the difficulties of capturing long dependency structures, these methods rarely model the raw documents directly, but instead model the features extracted from them (e.g. bag-of-words (BOW), TFIDF). In this paper, we propose to learn hash codes from BERT embeddings after observing their tremendous… ▽ More

    Submitted 7 September, 2021; originally announced September 2021.

  40. arXiv:2107.05038  [pdf, other]

    cs.CL cs.SD eess.AS

    Multilingual and crosslingual speech recognition using phonological-vector based phone embeddings

    Authors: Chengrui Zhu, Keyu An, Huahuan Zheng, Zhijian Ou

    Abstract: The use of phonological features (PFs) potentially allows language-specific phones to remain linked in training, which is highly desirable for information sharing for multilingual and crosslingual speech recognition methods for low-resourced languages. A drawback suffered by previous methods in using phonological features is that the acoustic-to-PF extraction in a bottom-up way is itself difficult… ▽ More

    Submitted 30 October, 2021; v1 submitted 11 July, 2021; originally announced July 2021.

    Comments: ASRU2021

  41. arXiv:2107.03007  [pdf, other]

    eess.AS cs.CL cs.SD

    Advancing CTC-CRF Based End-to-End Speech Recognition with Wordpieces and Conformers

    Authors: Huahuan Zheng, Wenjie Peng, Zhijian Ou, Jinsong Zhang

    Abstract: Automatic speech recognition systems have been largely improved in the past few decades and current systems are mainly hybrid-based and end-to-end-based. The recently proposed CTC-CRF framework inherits the data-efficiency of the hybrid approach and the simplicity of the end-to-end approach. In this paper, we further advance CTC-CRF based ASR technique with explorations on modeling units and neura… ▽ More

    Submitted 8 July, 2021; v1 submitted 7 July, 2021; originally announced July 2021.

    Comments: Submitted to ASRU 2021

  42. arXiv:2105.13066  [pdf, other]

    cs.IR cs.AI

    Integrating Semantics and Neighborhood Information with Graph-Driven Generative Models for Document Retrieval

    Authors: Zijing Ou, Qinliang Su, Jianxing Yu, Bang Liu, Jingwen Wang, Ruihui Zhao, Changyou Chen, Yefeng Zheng

    Abstract: With the need of fast retrieval speed and small memory footprint, document hashing has been playing a crucial role in large-scale information retrieval. To generate high-quality hashing code, both semantics and neighborhood information are crucial. However, most existing methods leverage only one of them or simply combine them via some intuitive criteria, lacking a theoretical principle to guide t… ▽ More

    Submitted 27 May, 2021; originally announced May 2021.

    Journal ref: ACL2021

  43. arXiv:2105.06138  [pdf, other]

    cs.CV

    Unsupervised Hashing with Contrastive Information Bottleneck

    Authors: Zexuan Qiu, Qinliang Su, Zijing Ou, Jianxing Yu, Changyou Chen

    Abstract: Many unsupervised hashing methods are implicitly established on the idea of reconstructing the input data, which basically encourages the hashing codes to retain as much information of original data as possible. However, this requirement may force the models spending lots of their effort on reconstructing the unuseful background information, while ignoring to preserve the discriminative semantic i…

    Submitted 18 May, 2021; v1 submitted 13 May, 2021; originally announced May 2021.

    Comments: IJCAI 2021

  44. arXiv:2104.14791  [pdf, other]

    eess.AS cs.CL cs.SD

    Deformable TDNN with adaptive receptive fields for speech recognition

    Authors: Keyu An, Yi Zhang, Zhijian Ou

    Abstract: Time Delay Neural Networks (TDNNs) are widely used in both DNN-HMM based hybrid speech recognition systems and recent end-to-end systems. Nevertheless, the receptive fields of TDNNs are limited and fixed, which is not desirable for tasks like speech recognition, where the temporal dynamics of speech are varied and affected by many factors. This paper proposes to use deformable TDNNs for adaptive t…

    Submitted 30 April, 2021; originally announced April 2021.

    Comments: 5 pages. submitted to Interspeech 2021

  45. arXiv:2104.04748  [pdf, other]

    cs.CL cs.AI cs.LG

    Imperfect also Deserves Reward: Multi-Level and Sequential Reward Modeling for Better Dialog Management

    Authors: Zhengxu Hou, Bang Liu, Ruihui Zhao, Zijing Ou, Yafei Liu, Xi Chen, Yefeng Zheng

    Abstract: For task-oriented dialog systems, training a Reinforcement Learning (RL) based Dialog Management module suffers from low sample efficiency and slow convergence speed due to the sparse rewards in RL. To solve this problem, many strategies have been proposed to give proper rewards when training RL, but their rewards lack interpretability and cannot accurately estimate the distribution of state-action…

    Submitted 10 April, 2021; originally announced April 2021.

    Comments: 9 pages

    Journal ref: NAACL 2021

  46. arXiv:2011.06724  [pdf, other]

    cs.SD eess.AS

    The SLT 2021 children speech recognition challenge: Open datasets, rules and baselines

    Authors: Fan Yu, Zhuoyuan Yao, Xiong Wang, Keyu An, Lei Xie, Zhijian Ou, Bo Liu, Xiulin Li, Guanqiong Miao

    Abstract: Automatic speech recognition (ASR) has been significantly advanced with the use of deep learning and big data. However improving robustness, including achieving equally good performance on diverse speakers and accents, is still a challenging problem. In particular, the performance of children speech recognition (CSR) still lags behind due to 1) the speech and language characteristics of children's…

    Submitted 16 November, 2020; v1 submitted 12 November, 2020; originally announced November 2020.

    Comments: 7 pages, 3 figures, 3 tables

  47. arXiv:2011.05649  [pdf, other]

    eess.AS cs.LG

    Efficient Neural Architecture Search for End-to-end Speech Recognition via Straight-Through Gradients

    Authors: Huahuan Zheng, Keyu An, Zhijian Ou

    Abstract: Neural Architecture Search (NAS), the process of automating architecture engineering, is an appealing next step to advancing end-to-end Automatic Speech Recognition (ASR), replacing expert-designed networks with learned, task-specific architectures. In contrast to early computational-demanding NAS methods, recent gradient-based NAS methods, e.g., DARTS (Differentiable ARchiTecture Search), SNAS (S…

    Submitted 11 November, 2020; originally announced November 2020.

    Comments: Accepted by IEEE SLT 2021

  48. arXiv:2010.14047  [pdf, other]

    cs.SI

    Embedding Dynamic Attributed Networks by Modeling the Evolution Processes

    Authors: Zenan Xu, Zijing Ou, Qinliang Su, Jianxing Yu, Xiaojun Quan, Zhenkun Lin

    Abstract: Network embedding has recently emerged as a promising technique to embed nodes of a network into low-dimensional vectors. While fairly successful, most existing works focus on the embedding techniques for static networks. But in practice, there are many networks that are evolving over time and hence are dynamic, e.g., the social networks. To address this issue, a high-order spatio-temporal embeddi…

    Submitted 27 October, 2020; originally announced October 2020.

    Comments: Accepted by COLING 2020 : The 28th International Conference on Computational Linguistics

  49. arXiv:2010.13116  [pdf, other]

    cs.LG

    An empirical study of domain-agnostic semi-supervised learning via energy-based models: joint-training and pre-training

    Authors: Yunfu Song, Huahuan Zheng, Zhijian Ou

    Abstract: A class of recent semi-supervised learning (SSL) methods heavily rely on domain-specific data augmentations. In contrast, generative SSL methods involve unsupervised learning based on generative models by either joint-training or pre-training, and are more appealing from the perspective of being domain-agnostic, since they do not inherently require data augmentations. Joint-training estimates the…

    Submitted 25 October, 2020; originally announced October 2020.

  50. arXiv:2009.08115  [pdf, other]

    cs.CL cs.AI

    A Probabilistic End-To-End Task-Oriented Dialog Model with Latent Belief States towards Semi-Supervised Learning

    Authors: Yichi Zhang, Zhijian Ou, Huixin Wang, Junlan Feng

    Abstract: Structured belief states are crucial for user goal tracking and database query in task-oriented dialog systems. However, training belief trackers often requires expensive turn-level annotations of every user utterance. In this paper we aim at alleviating the reliance on belief state labels in building end-to-end dialog systems, by leveraging unlabeled dialog data towards semi-supervised learning.…

    Submitted 13 October, 2020; v1 submitted 17 September, 2020; originally announced September 2020.

    Comments: Accepted by EMNLP 2020