Showing 1–42 of 42 results for author: Mi, H

Searching in archive cs.
  1. arXiv:2407.00617  [pdf, other]

    cs.LG cs.AI cs.CL cs.GT

    Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning

    Authors: Yuheng Zhang, Dian Yu, Baolin Peng, Linfeng Song, Ye Tian, Mingyue Huo, Nan Jiang, Haitao Mi, Dong Yu

    Abstract: Reinforcement Learning with Human Feedback (RLHF) has achieved great success in aligning large language models (LLMs) with human preferences. Prevalent RLHF approaches are reward-based, following the Bradley-Terry (BT) model assumption, which may not fully capture the complexity of human preferences. In this paper, we explore RLHF under a general preference framework and approach it from a game-th…

    Submitted 30 June, 2024; originally announced July 2024.

  2. arXiv:2407.00320  [pdf, other]

    cs.CL cs.AI cs.LG

    LiteSearch: Efficacious Tree Search for LLM

    Authors: Ante Wang, Linfeng Song, Ye Tian, Baolin Peng, Dian Yu, Haitao Mi, Jinsong Su, Dong Yu

    Abstract: Recent research suggests that tree search algorithms (e.g. Monte Carlo Tree Search) can dramatically boost LLM performance on complex mathematical reasoning tasks. However, they often require more than 10 times the computational resources of greedy decoding due to wasteful search strategies, making them difficult to deploy in practical applications. This study introduces a novel guided tree s…

    Submitted 29 June, 2024; originally announced July 2024.

  3. arXiv:2406.20094  [pdf, other]

    cs.CL cs.LG

    Scaling Synthetic Data Creation with 1,000,000,000 Personas

    Authors: Xin Chan, Xiaoyang Wang, Dian Yu, Haitao Mi, Dong Yu

    Abstract: We propose a novel persona-driven data synthesis methodology that leverages various perspectives within a large language model (LLM) to create diverse synthetic data. To fully exploit this methodology at scale, we introduce Persona Hub -- a collection of 1 billion diverse personas automatically curated from web data. These 1 billion personas (~13% of the world's total population), acting as distri…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Work in progress

  4. arXiv:2406.11698  [pdf, other]

    cs.CL

    Meta Reasoning for Large Language Models

    Authors: Peizhong Gao, Ao Xie, Shaoguang Mao, Wenshan Wu, Yan Xia, Haipeng Mi, Furu Wei

    Abstract: We introduce Meta-Reasoning Prompting (MRP), a novel and efficient system prompting method for large language models (LLMs) inspired by human meta-reasoning. Traditional in-context learning-based reasoning techniques, such as Tree-of-Thoughts, show promise but lack consistent state-of-the-art performance across diverse tasks due to their specialized nature. MRP addresses this limitation by guiding…

    Submitted 17 June, 2024; originally announced June 2024.

  5. arXiv:2406.06326  [pdf, other]

    cs.CL

    Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching

    Authors: Xiaoying Zhang, Baolin Peng, Ye Tian, Jingyan Zhou, Yipeng Zhang, Haitao Mi, Helen Meng

    Abstract: Large language models (LLMs) often struggle to provide up-to-date information due to their one-time training and the constantly evolving nature of the world. To keep LLMs current, existing approaches typically involve continued pre-training on new documents. However, they frequently face difficulties in extracting stored knowledge. Motivated by the remarkable success of the Feynman Technique in ef…

    Submitted 15 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: 30 pages

  6. arXiv:2406.02866  [pdf]

    cs.HC

    A Design Experience for Interactive Narrative Based on The User Behavior

    Authors: Yuan Yao, Haipeng Mi

    Abstract: Research on interactive narrative experiences in physical spaces is becoming more popular, growing into an established new media art format with the development of technology and evolution of audience aesthetics. However, the methods of designing interactive narratives are still similar to the basic video narratology of traditional designers, directors, and producers. This paper provides a design…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: to appear at Cumulus Conference Proceedings Roma 2021

    Journal ref: Cumulus Conference Proceedings Roma 2021

  7. arXiv:2405.17837  [pdf, other]

    cs.HC

    Enabling Generative Design Tools with LLM Agents for Building Novel Devices: A Case Study on Fluidic Computation Interfaces

    Authors: Qiuyu Lu, Jiawei Fang, Zhihao Yao, Yue Yang, Shiqing Lyu, Haipeng Mi, Lining Yao

    Abstract: In the field of Human-Computer Interaction (HCI), the development of interactive devices represents a significant area of focus. The advent of novel hardware and advanced fabrication techniques has underscored the demand for specialized design tools that democratize the prototyping process for such cutting-edge devices. While these tools simplify the process through parametric design and simulatio…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 25 pages, 12 figures

  8. arXiv:2404.12253  [pdf, other]

    cs.CL cs.LG

    Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

    Authors: Ye Tian, Baolin Peng, Linfeng Song, Lifeng Jin, Dian Yu, Haitao Mi, Dong Yu

    Abstract: Despite the impressive capabilities of Large Language Models (LLMs) on various tasks, they still struggle with scenarios that involve complex reasoning and planning. Recent work proposed advanced prompting techniques and the necessity of fine-tuning with high-quality data to augment LLMs' reasoning abilities. However, these approaches are inherently constrained by data availability and quality. I…

    Submitted 18 April, 2024; originally announced April 2024.

  9. arXiv:2404.09338  [pdf, other]

    cs.CL

    Entropy Guided Extrapolative Decoding to Improve Factuality in Large Language Models

    Authors: Souvik Das, Lifeng Jin, Linfeng Song, Haitao Mi, Baolin Peng, Dong Yu

    Abstract: Large language models (LLMs) exhibit impressive natural language capabilities but suffer from hallucination -- generating content ungrounded in the realities of training data. Recent work has focused on decoding techniques to improve factuality during inference by leveraging LLMs' hierarchical representation of factual knowledge, manipulating the predicted distributions at inference time. Current…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: Work in Progress

  10. arXiv:2403.09849  [pdf, other]

    cs.CL cs.AI

    Self-Consistency Boosts Calibration for Math Reasoning

    Authors: Ante Wang, Linfeng Song, Ye Tian, Baolin Peng, Lifeng Jin, Haitao Mi, Jinsong Su, Dong Yu

    Abstract: Calibration, which establishes the correlation between accuracy and model confidence, is important for LLM development. We design three off-the-shelf calibration methods based on self-consistency (Wang et al., 2022) for math reasoning tasks. Evaluation on two popular benchmarks (GSM8K and MathQA) using strong open-source LLMs (Mistral and LLaMA2), our methods better bridge model confidence and acc…

    Submitted 14 March, 2024; originally announced March 2024.

  11. arXiv:2403.03496  [pdf, ps, other]

    cs.CL

    A Knowledge Plug-and-Play Test Bed for Open-domain Dialogue Generation

    Authors: Xiangci Li, Linfeng Song, Lifeng Jin, Haitao Mi, Jessica Ouyang, Dong Yu

    Abstract: Knowledge-based, open-domain dialogue generation aims to build chit-chat systems that talk to humans using mined support knowledge. Many types and sources of knowledge have previously been shown to be useful as support knowledge. Even in the era of large language models, response generation grounded in knowledge retrieved from additional up-to-date sources remains a practically important approach.…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: Accepted by LREC-COLING 2024

  12. arXiv:2402.17982  [pdf, other]

    cs.CL

    Collaborative decoding of critical tokens for boosting factuality of large language models

    Authors: Lifeng Jin, Baolin Peng, Linfeng Song, Haitao Mi, Ye Tian, Dong Yu

    Abstract: The most common training pipeline for large language models includes pretraining, finetuning and aligning phases, with their respective resulting models, such as the pretrained model and the finetuned model. Finetuned and aligned models show improved abilities of instruction following and safe generation; however, their abilities to stay factual about the world are impacted by the finetuning proces…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: work in progress

  13. arXiv:2402.15631  [pdf, other]

    cs.CL cs.AI

    Fine-Grained Self-Endorsement Improves Factuality and Reasoning

    Authors: Ante Wang, Linfeng Song, Baolin Peng, Ye Tian, Lifeng Jin, Haitao Mi, Jinsong Su, Dong Yu

    Abstract: This work studies improving large language model (LLM) generations at inference time by mitigating fact-conflicting hallucinations. Particularly, we propose a self-endorsement framework that leverages the fine-grained fact-level comparisons across multiple sampled responses. Compared with prior ensemble methods (Wang et al., 2022; Chen et al., 2023) that perform response-level selection, our appro…

    Submitted 23 February, 2024; originally announced February 2024.

  14. arXiv:2402.09267  [pdf, other]

    cs.CL cs.AI

    Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation

    Authors: Xiaoying Zhang, Baolin Peng, Ye Tian, Jingyan Zhou, Lifeng Jin, Linfeng Song, Haitao Mi, Helen Meng

    Abstract: Despite showing increasingly human-like abilities, large language models (LLMs) often struggle with factual inaccuracies, i.e. "hallucinations", even when they hold relevant knowledge. To address these hallucinations, current approaches typically necessitate high-quality human factuality annotations. In this work, we explore Self-Alignment for Factuality, where we leverage the self-evaluation capa…

    Submitted 11 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: 20 pages

    Journal ref: ACL2024 Main

  15. arXiv:2402.09199  [pdf, other]

    cs.CL cs.AI cs.LG

    Ten Words Only Still Help: Improving Black-Box AI-Generated Text Detection via Proxy-Guided Efficient Re-Sampling

    Authors: Yuhui Shi, Qiang Sheng, Juan Cao, Hao Mi, Beizhe Hu, Danding Wang

    Abstract: With the rapidly increasing application of large language models (LLMs), their abuse has caused many undesirable societal problems such as fake news, academic dishonesty, and information pollution. This makes AI-generated text (AIGT) detection of great importance. Among existing methods, white-box methods are generally superior to black-box methods in terms of performance and generalizability, but…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: 13 pages, 6 figures, 7 tables

    Journal ref: IJCAI 2024

  16. arXiv:2401.10353  [pdf, other]

    cs.CL

    Inconsistent dialogue responses and how to recover from them

    Authors: Mian Zhang, Lifeng Jin, Linfeng Song, Haitao Mi, Dong Yu

    Abstract: One critical issue for chat systems is to stay consistent about their own preferences, opinions, beliefs and facts, which has been shown to be a difficult problem. In this work, we study methods to assess and bolster utterance consistency of chat systems. A dataset is first developed for studying the inconsistencies, where inconsistent dialogue responses, explanations of the inconsistencies, and recover…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: Accepted in EACL 2024. Code and dataset available at https://github.com/mianzhang/CIDER

  17. arXiv:2309.16155  [pdf, other]

    cs.CL cs.LG

    The Trickle-down Impact of Reward (In-)consistency on RLHF

    Authors: Lingfeng Shen, Sihao Chen, Linfeng Song, Lifeng Jin, Baolin Peng, Haitao Mi, Daniel Khashabi, Dong Yu

    Abstract: Standard practice within Reinforcement Learning from Human Feedback (RLHF) involves optimizing against a Reward Model (RM), which itself is trained to reflect human preferences for desirable generations. A notable subject that is understudied is the (in-)consistency of RMs -- whether they can recognize the semantic changes to different prompts and appropriately adapt their reward assignments -- an…

    Submitted 28 September, 2023; originally announced September 2023.

  18. arXiv:2309.10202  [pdf, other]

    cs.CL cs.AI

    Stabilizing RLHF through Advantage Model and Selective Rehearsal

    Authors: Baolin Peng, Linfeng Song, Ye Tian, Lifeng Jin, Haitao Mi, Dong Yu

    Abstract: Large Language Models (LLMs) have revolutionized natural language processing, yet aligning these models with human values and preferences using RLHF remains a significant challenge. This challenge is characterized by various instabilities, such as reward hacking and catastrophic forgetting. In this technical report, we propose two innovations to stabilize RLHF training: 1) Advantage Model, which d…

    Submitted 18 September, 2023; originally announced September 2023.

    Comments: 9 pages, work in progress

  19. arXiv:2303.00865  [pdf, other]

    cs.CV

    AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images

    Authors: Ramin Nakhli, Puria Azadi Moghadam, Haoyang Mi, Hossein Farahani, Alexander Baras, Blake Gilks, Ali Bashashati

    Abstract: Processing giga-pixel whole slide histopathology images (WSI) is a computationally expensive task. Multiple instance learning (MIL) has become the conventional approach to process WSIs, in which these images are split into smaller patches for further processing. However, MIL-based techniques ignore explicit information about the individual cells within a patch. In this paper, by defining the novel…

    Submitted 5 July, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

    Comments: Accepted at CVPR 2023

  20. Search-Engine-augmented Dialogue Response Generation with Cheaply Supervised Query Production

    Authors: Ante Wang, Linfeng Song, Qi Liu, Haitao Mi, Longyue Wang, Zhaopeng Tu, Jinsong Su, Dong Yu

    Abstract: Knowledge-aided dialogue response generation aims at augmenting chatbots with relevant external knowledge in the hope of generating more informative responses. The majority of previous work assumes that the relevant knowledge is given as input or retrieved from a static pool of knowledge. However, this assumption violates the real-world situation, where knowledge is continually updated and a chatb…

    Submitted 15 February, 2023; originally announced February 2023.

    Comments: Artificial Intelligence 2023

  21. arXiv:2301.13683  [pdf, other]

    cs.CL

    Friend-training: Learning from Models of Different but Related Tasks

    Authors: Mian Zhang, Lifeng Jin, Linfeng Song, Haitao Mi, Xiabing Zhou, Dong Yu

    Abstract: Current self-training methods such as standard self-training, co-training, tri-training, and others often focus on improving model performance on a single task, utilizing differences in input features, model architectures, and training processes. However, many tasks in natural language processing are about different but related aspects of language, and models trained for one task can be great teac…

    Submitted 31 January, 2023; originally announced January 2023.

    Comments: Accepted by EACL2023

  22. arXiv:2211.04476  [pdf, other]

    cs.CL cs.AI cs.LG

    Discover, Explanation, Improvement: An Automatic Slice Detection Framework for Natural Language Processing

    Authors: Wenyue Hua, Lifeng Jin, Linfeng Song, Haitao Mi, Yongfeng Zhang, Dong Yu

    Abstract: Pretrained natural language processing (NLP) models have achieved high overall performance, but they still make systematic errors. Instead of manual error analysis, research on slice detection models (SDM), which automatically identify underperforming groups of datapoints, has attracted growing attention in Computer Vision for both understanding model behaviors and providing insights for future mod…

    Submitted 10 September, 2023; v1 submitted 8 November, 2022; originally announced November 2022.

    Comments: 15 pages, 5 figures, accepted by Transactions of the Association for Computational Linguistics

  23. arXiv:2210.12309  [pdf, other]

    cs.CL cs.CV cs.MM

    Learning a Grammar Inducer from Massive Uncurated Instructional Videos

    Authors: Songyang Zhang, Linfeng Song, Lifeng Jin, Haitao Mi, Kun Xu, Dong Yu, Jiebo Luo

    Abstract: Video-aided grammar induction aims to leverage video information for finding more accurate syntactic grammars for accompanying text. While previous work focuses on building systems for inducing grammars on text that is well-aligned with video content, we investigate the scenario in which text and video are only in loose correspondence. Such data can be found in abundance online, and the weak cor…

    Submitted 21 October, 2022; originally announced October 2022.

    Comments: Accepted by EMNLP 2022

  24. arXiv:2203.04045  [pdf, other]

    cs.CL

    Towards Generalized Models for Task-oriented Dialogue Modeling on Spoken Conversations

    Authors: Ruijie Yan, Shuang Peng, Haitao Mi, Liang Jiang, Shihui Yang, Yuchi Zhang, Jiajun Li, Liangrui Peng, Yongliang Wang, Zujie Wen

    Abstract: Building robust and general dialogue models for spoken conversations is challenging due to the gap in distributions of spoken and written data. This paper presents our approach to build generalized models for the Knowledge-grounded Task-oriented Dialogue Modeling on Spoken Conversations Challenge of DSTC-10. In order to mitigate the discrepancies between spoken and written text, we mainly employ e…

    Submitted 8 March, 2022; originally announced March 2022.

  25. arXiv:2203.00281  [pdf, other]

    cs.CL

    Fast-R2D2: A Pretrained Recursive Neural Network based on Pruned CKY for Grammar Induction and Text Representation

    Authors: Xiang Hu, Haitao Mi, Liang Li, Gerard de Melo

    Abstract: Recently CKY-based models show great potential in unsupervised grammar induction thanks to their human-like encoding paradigm, which runs recursively and hierarchically, but requires $O(n^3)$ time-complexity. Recursive Transformer based on Differentiable Trees (R2D2) makes it possible to scale to large language model pre-training even with complex tree encoder by introducing a heuristic pruning me…

    Submitted 2 November, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

    Comments: EMNLP 2022

  26. arXiv:2112.14430  [pdf, other]

    cs.LG cs.CR

    DP-FP: Differentially Private Forward Propagation for Large Models

    Authors: Jian Du, Haitao Mi

    Abstract: When applied to large-scale learning problems, the conventional wisdom on privacy-preserving deep learning, known as Differential Private Stochastic Gradient Descent (DP-SGD), has met with limited success due to significant performance degradation and high memory overhead when compared to the non-privacy counterpart. We show how to mitigate the performance drop by replacing the DP-SGD with a novel…

    Submitted 29 December, 2021; originally announced December 2021.

    Comments: 12 pages

  27. arXiv:2107.05866  [pdf, other]

    cs.CL cs.HC

    A Dialogue-based Information Extraction System for Medical Insurance Assessment

    Authors: Shuang Peng, Mengdi Zhou, Minghui Yang, Haitao Mi, Shaosheng Cao, Zujie Wen, Teng Xu, Hongbin Wang, Lei Liu

    Abstract: In the Chinese medical insurance industry, the assessor's role is essential and requires significant efforts to converse with the claimant. This is a highly professional job that involves many parts, such as identifying personal information, collecting related evidence, and making a final insurance report. Due to the coronavirus (COVID-19) pandemic, the previous offline insurance assessment has to…

    Submitted 13 July, 2021; originally announced July 2021.

    Comments: To be published in the Findings of ACL 2021

  28. R2D2: Recursive Transformer based on Differentiable Tree for Interpretable Hierarchical Language Modeling

    Authors: Xiang Hu, Haitao Mi, Zujie Wen, Yafang Wang, Yi Su, Jing Zheng, Gerard de Melo

    Abstract: Human language understanding operates at multiple levels of granularity (e.g., words, phrases, and sentences) with increasing levels of abstraction that can be hierarchically combined. However, existing deep models with stacked layers do not explicitly model any sort of hierarchical process. This paper proposes a recursive Transformer model based on differentiable CKY style binary trees to emulate…

    Submitted 3 March, 2022; v1 submitted 2 July, 2021; originally announced July 2021.

    Comments: ACL-IJCNLP 2021

  29. arXiv:2105.01511  [pdf]

    cs.NI eess.SY

    Radio Communication Scenarios in 5G-Railways

    Authors: Ruisi He, Bo Ai, Zhangdui Zhong, Mi Yang, Chen Huang, Ruifeng Chen, Jianwen Ding, Hang Mi, Zhangfeng Ma, Guiqi Sun, Changzhu Liu

    Abstract: With the rapid development of railways, especially high-speed railways, there is an increasingly urgent demand for new wireless communication system for railways. Taking the mature 5G technology as an opportunity, 5G-railways (5G-R) have been widely regarded as a solution to meet the diversified demands of railway wireless communications. For the design, deployment and improvement of 5G-R networks…

    Submitted 6 April, 2021; originally announced May 2021.

    Comments: 7 pages

  30. arXiv:2101.11296  [pdf, other]

    cs.LG cs.AI

    FedH2L: Federated Learning with Model and Statistical Heterogeneity

    Authors: Yiying Li, Wei Zhou, Huaimin Wang, Haibo Mi, Timothy M. Hospedales

    Abstract: Federated learning (FL) enables distributed participants to collectively learn a strong global model without sacrificing their individual data privacy. Mainstream FL approaches require each participant to share a common network architecture and further assume that data are sampled IID across participants. However, in real-world deployments participants may require heterogeneous network archite…

    Submitted 27 July, 2021; v1 submitted 27 January, 2021; originally announced January 2021.

  31. General audio tagging with ensembling convolutional neural network and statistical features

    Authors: Kele Xu, Boqing Zhu, Qiuqiang Kong, Haibo Mi, Bo Ding, Dezhi Wang, Huaimin Wang

    Abstract: Audio tagging aims to infer descriptive labels from audio clips. Audio tagging is challenging due to the limited size of data and noisy labels. In this paper, we describe our solution for the DCASE 2018 Task 2 general audio tagging challenge. The contributions of our solution include: We investigated a variety of convolutional neural network architectures to solve the audio tagging task. Statistic…

    Submitted 30 October, 2018; originally announced October 2018.

    Comments: Submitted to ICASSP

  32. arXiv:1810.06877  [pdf, other]

    cs.LG cs.CV stat.ML

    Collaborative Deep Learning Across Multiple Data Centers

    Authors: Kele Xu, Haibo Mi, Dawei Feng, Huaimin Wang, Chuan Chen, Zibin Zheng, Xu Lan

    Abstract: Valuable training data is often owned by independent organizations and located in multiple data centers. Most deep learning approaches require centralizing the multi-datacenter data for performance purposes. In practice, however, it is often infeasible to transfer all data to a centralized data center due to not only bandwidth limitation but also the constraints of privacy regulations. Model avera…

    Submitted 16 October, 2018; originally announced October 2018.

    Comments: Submitted to AAAI 2019

  33. arXiv:1806.04422  [pdf, other]

    cs.CV

    Sample Dropout for Audio Scene Classification Using Multi-Scale Dense Connected Convolutional Neural Network

    Authors: Dawei Feng, Kele Xu, Haibo Mi, Feifan Liao, Yan Zhou

    Abstract: Acoustic scene classification is an intricate problem for a machine. As an emerging field of research, deep Convolutional Neural Networks (CNN) achieve convincing results. In this paper, we explore the use of multi-scale Dense connected convolutional neural network (DenseNet) for the classification task, with the goal to improve the classification performance as multi-scale features can be extract…

    Submitted 12 June, 2018; originally announced June 2018.

    Comments: Accepted to 2018 Pacific Rim Knowledge Acquisition Workshop (PKAW)

  34. arXiv:1805.07319  [pdf, other]

    cs.CV

    Mixup-Based Acoustic Scene Classification Using Multi-Channel Convolutional Neural Network

    Authors: Kele Xu, Dawei Feng, Haibo Mi, Boqing Zhu, Dezhi Wang, Lilun Zhang, Hengxing Cai, Shuwen Liu

    Abstract: Audio scene classification, the problem of predicting class labels of audio scenes, has drawn lots of attention during the last several years. However, it remains challenging and falls short of accuracy and efficiency. Recently, Convolutional Neural Network (CNN)-based methods have achieved better performance compared with traditional methods. Nevertheless, conventional single channel CNN…

    Submitted 18 May, 2018; originally announced May 2018.

  35. arXiv:1612.04211  [pdf, other]

    cs.CL

    Multi-Perspective Context Matching for Machine Comprehension

    Authors: Zhiguo Wang, Haitao Mi, Wael Hamza, Radu Florian

    Abstract: Previous machine comprehension (MC) datasets are either too small to train end-to-end deep learning models, or not difficult enough to evaluate the ability of current MC techniques. The newly released SQuAD dataset alleviates these limitations, and gives us a chance to develop more realistic MC models. Based on this dataset, we propose a Multi-Perspective Context Matching (MPCM) model, which is an…

    Submitted 13 December, 2016; originally announced December 2016.

    Comments: 8

  36. arXiv:1608.02927  [pdf, other]

    cs.CL

    Temporal Attention Model for Neural Machine Translation

    Authors: Baskaran Sankaran, Haitao Mi, Yaser Al-Onaizan, Abe Ittycheriah

    Abstract: Attention-based Neural Machine Translation (NMT) models suffer from attention deficiency issues as has been observed in recent research. We propose a novel mechanism to address some of these limitations and improve the NMT attention. Specifically, our approach memorizes the alignments temporally (within each sentence) and modulates the attention with the accumulated temporal memory, as the decoder…

    Submitted 9 August, 2016; originally announced August 2016.

    Comments: 8 pages

  37. arXiv:1608.00112  [pdf, other]

    cs.CL

    Supervised Attentions for Neural Machine Translation

    Authors: Haitao Mi, Zhiguo Wang, Abe Ittycheriah

    Abstract: In this paper, we improve the attention or alignment accuracy of neural machine translation by utilizing the alignments of training sentence pairs. We simply compute the distance between the machine attentions and the "true" alignments, and minimize this cost in the training procedure. Our experiments on large-scale Chinese-to-English task show that our model improves both translation and alignmen…

    Submitted 30 July, 2016; originally announced August 2016.

    Comments: 6 pages. In Proceedings of EMNLP 2016. arXiv admin note: text overlap with arXiv:1605.03148

  38. arXiv:1606.05409  [pdf, ps, other]

    cs.CL

    Sense Embedding Learning for Word Sense Induction

    Authors: Linfeng Song, Zhiguo Wang, Haitao Mi, Daniel Gildea

    Abstract: Conventional word sense induction (WSI) methods usually represent each instance with discrete linguistic features or cooccurrence features, and train a model for each polysemous word individually. In this work, we propose to learn sense embeddings for the WSI task. In the training stage, our method induces several sense centroids (embedding) for each polysemous word. In the testing stage, our meth…

    Submitted 22 June, 2016; v1 submitted 16 June, 2016; originally announced June 2016.

    Comments: 6 pages, no figures in *SEM 2016

  39. arXiv:1605.03209  [pdf, other]

    cs.CL

    Vocabulary Manipulation for Neural Machine Translation

    Authors: Haitao Mi, Zhiguo Wang, Abe Ittycheriah

    Abstract: In order to capture rich language phenomena, neural machine translation models have to use a large vocabulary size, which requires high computing time and large memory usage. In this paper, we alleviate this issue by introducing a sentence-level or batch-level vocabulary, which is only a very small sub-set of the full output vocabulary. For each sentence or batch, we only predict the target words…

    Submitted 10 May, 2016; originally announced May 2016.

    Comments: 6 pages

  40. arXiv:1605.03148  [pdf, other]

    cs.CL

    Coverage Embedding Models for Neural Machine Translation

    Authors: Haitao Mi, Baskaran Sankaran, Zhiguo Wang, Abe Ittycheriah

    Abstract: In this paper, we enhance the attention-based neural machine translation (NMT) by adding explicit coverage embedding models to alleviate issues of repeating and dropping translations in NMT. For each source word, our model starts with a full coverage embedding vector to track the coverage status, and then keeps updating it with neural networks as the translation goes. Experiments on the large-scal…

    Submitted 29 August, 2016; v1 submitted 10 May, 2016; originally announced May 2016.

    Comments: 6 pages; In Proceedings of EMNLP 2016

  41. arXiv:1602.07019  [pdf, other]

    cs.CL

    Sentence Similarity Learning by Lexical Decomposition and Composition

    Authors: Zhiguo Wang, Haitao Mi, Abraham Ittycheriah

    Abstract: Most conventional sentence similarity methods only focus on similar parts of two input sentences, and simply ignore the dissimilar parts, which usually give us some clues and semantic meanings about the sentences. In this work, we propose a model to take into account both the similarities and dissimilarities by decomposing and composing lexical semantics over sentences. The model represents each w…

    Submitted 14 July, 2017; v1 submitted 22 February, 2016; originally announced February 2016.

    Comments: In Proceedings of Coling 2016

  42. arXiv:1602.06797  [pdf, other]

    cs.CL

    Semi-supervised Clustering for Short Text via Deep Representation Learning

    Authors: Zhiguo Wang, Haitao Mi, Abraham Ittycheriah

    Abstract: In this work, we propose a semi-supervised method for short text clustering, where we represent texts as distributed vectors with neural networks, and use a small amount of labeled data to specify our intention for clustering. We design a novel objective to combine the representation learning process and the k-means clustering process together, and optimize the objective with both labeled data and…

    Submitted 14 July, 2017; v1 submitted 22 February, 2016; originally announced February 2016.

    Comments: In Proceedings of CoNLL 2016