Search | arXiv e-print repository

arXiv:2406.12023 [pdf, other]

LiLiuM: eBay's Large Language Models for e-commerce

Authors: Christian Herold, Michael Kozielski, Leonid Ekimov, Pavel Petrushkov, Pierre-Yves Vandenbussche, Shahram Khadivi

Abstract: We introduce the LiLiuM series of large language models (LLMs): 1B, 7B, and 13B parameter models developed 100% in-house to fit eBay's specific needs in the e-commerce domain. This gives eBay full control over all aspects of the models including license, data, vocabulary, and architecture. We expect these models to be used as a foundation for fine-tuning and instruction-tuning, eliminating dependencies to external models. The LiLiuM LLMs have been trained on 3 trillion tokens of multilingual text from general and e-commerce domain. They perform similar to the popular LLaMA-2 models on English natural language understanding (NLU) benchmarks. At the same time, we outperform LLaMA-2 on non-English NLU tasks, machine translation and on e-commerce specific downstream tasks. As part of our data mixture, we utilize the newly released RedPajama-V2 dataset for training and share our insights regarding data filtering and deduplication. We also discuss in detail how to serialize structured data for use in autoregressive language modeling. We provide insights on the effects of including code and parallel machine translation data in pre-training. Furthermore, we develop our own tokenizer and model vocabulary, customized towards e-commerce. This way, we can achieve up to 34% speed-up in text generation on eBay-specific downstream tasks compared to LLaMA-2. Finally, in relation to LLM pretraining, we show that checkpoint averaging can further improve over the best individual model checkpoint.

Submitted 17 June, 2024; originally announced June 2024.
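
Note on checkpoint averaging: the LiLiuM abstract above mentions that averaging checkpoints can improve over the best single checkpoint, but does not say how the averaging is done. The sketch below shows only the generic technique (a uniform element-wise mean over parameters with matching names), not the authors' procedure. The function name `average_checkpoints` and the representation of a checkpoint as a dict of flat float lists are assumptions made for illustration.

```python
from typing import Dict, List

Checkpoint = Dict[str, List[float]]  # hypothetical: parameter name -> flat list of weights


def average_checkpoints(checkpoints: List[Checkpoint]) -> Checkpoint:
    """Return the uniform element-wise mean of several checkpoints."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")

    names = checkpoints[0].keys()
    for ckpt in checkpoints[1:]:
        if ckpt.keys() != names:
            raise ValueError("checkpoints must contain the same parameter names")

    count = len(checkpoints)
    averaged: Checkpoint = {}
    for name in names:
        tensors = [ckpt[name] for ckpt in checkpoints]
        if len({len(t) for t in tensors}) != 1:
            raise ValueError(f"parameter {name!r} has inconsistent sizes")
        # zip(*tensors) groups the i-th weight of every checkpoint together.
        averaged[name] = [sum(values) / count for values in zip(*tensors)]
    return averaged


if __name__ == "__main__":
    first = {"w": [1.0, 2.0]}
    second = {"w": [3.0, 4.0]}
    print(average_checkpoints([first, second]))
```

With a real framework the same idea would be applied to the tensors in the saved model state; which checkpoints to include is a separate choice that the abstract does not specify.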

arXiv:2402.05147 [pdf, other]

ApiQ: Finetuning of 2-Bit Quantized Large Language Model

Authors: Baohao Liao, Christian Herold, Shahram Khadivi, Christof Monz

Abstract: Memory-efficient finetuning of large language models (LLMs) has recently attracted huge attention with the increasing size of LLMs, primarily due to the constraints posed by GPU memory limitations and the effectiveness of these methods compared to full finetuning. Despite the advancements, current strategies for memory-efficient finetuning, such as QLoRA, exhibit inconsistent performance across diverse bit-width quantizations and multifaceted tasks. This inconsistency largely stems from the detrimental impact of the quantization process on preserved knowledge, leading to catastrophic forgetting and undermining the utilization of pretrained models for finetuning purposes. In this work, we introduce a novel quantization framework, ApiQ, designed to restore the lost information from quantization by concurrently initializing the LoRA components and quantizing the weights of LLMs. This approach ensures the maintenance of the original LLM's activation precision while mitigating the error propagation from shallower into deeper layers. Through comprehensive evaluations conducted on a spectrum of language tasks with various LLMs, ApiQ demonstrably minimizes activation error during quantization. Consequently, it consistently achieves superior finetuning results across various bit-widths.

Submitted 21 June, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

Comments: more benchmarks and new method, block-wise ApiQ. code: https://github.com/BaohaoLiao/ApiQ

arXiv:2311.02084 [pdf, other]

ITEm: Unsupervised Image-Text Embedding Learning for eCommerce

Authors: Baohao Liao, Michael Kozielski, Sanjika Hewavitharana, Jiangbo Yuan, Shahram Khadivi, Tomer Lancewicki

Abstract: Product embedding serves as a cornerstone for a wide range of applications in eCommerce. The product embedding learned from multiple modalities shows significant improvement over that from a single modality, since different modalities provide complementary information. However, some modalities are more informatively dominant than others. How to teach a model to learn embedding from different modalities without neglecting information from the less dominant modality is challenging. We present an image-text embedding model (ITEm), an unsupervised learning method that is designed to better attend to image and text modalities. We extend BERT by (1) learning an embedding from text and image without knowing the regions of interest; (2) training a global representation to predict masked words and to construct masked image patches without their individual representations. We evaluate the pre-trained ITEm on two tasks: the search for extremely similar products and the prediction of product categories, showing substantial gains compared to strong baseline models.

Submitted 26 February, 2024; v1 submitted 22 October, 2023; originally announced November 2023.

arXiv:2310.12303 [pdf, other]

Document-Level Language Models for Machine Translation

Authors: Frithjof Petrick, Christian Herold, Pavel Petrushkov, Shahram Khadivi, Hermann Ney

Abstract: Despite the known limitations, most machine translation systems today still operate on the sentence-level. One reason for this is, that most parallel training data is only sentence-level aligned, without document-level meta information available. In this work, we set out to build context-aware translation systems utilizing document-level monolingual data instead. This can be achieved by combining any existing sentence-level translation model with a document-level language model. We improve existing approaches by leveraging recent advancements in model combination. Additionally, we propose novel weighting techniques that make the system combination more flexible and significantly reduce computational overhead. In a comprehensive evaluation on four diverse translation tasks, we show that our extensions improve document-targeted scores substantially and are also computationally more efficient. However, we also find that in most scenarios, back-translation gives even better results, at the cost of having to re-train the translation system. Finally, we explore language model fusion in the light of recent advancements in large language models. Our findings suggest that there might be strong potential in utilizing large language models via model combination.

Submitted 18 October, 2023; originally announced October 2023.

Comments: accepted at WMT 2023

arXiv:2305.03923 [pdf, other]

Active Continual Learning: On Balancing Knowledge Retention and Learnability

Authors: Thuy-Trang Vu, Shahram Khadivi, Mahsa Ghorbanali, Dinh Phung, Gholamreza Haffari

Abstract: Acquiring new knowledge without forgetting what has been learned in a sequence of tasks is the central focus of continual learning (CL). While tasks arrive sequentially, the training data are often prepared and annotated independently, leading to the CL of incoming supervised learning tasks. This paper considers the under-explored problem of active continual learning (ACL) for a sequence of active learning (AL) tasks, where each incoming task includes a pool of unlabelled data and an annotation budget. We investigate the effectiveness and interplay between several AL and CL algorithms in the domain, class and task-incremental scenarios. Our experiments reveal the trade-off between two contrasting goals of not forgetting the old knowledge and the ability to quickly learn new knowledge in CL and AL, respectively. While conditioning the AL query strategy on the annotations collected for the previous tasks leads to improved task performance on the domain and task incremental learning, our proposed forgetting-learning profile suggests a gap in balancing the effect of AL and CL for the class-incremental scenario.

Submitted 30 January, 2024; v1 submitted 6 May, 2023; originally announced May 2023.

arXiv:2210.11628 [pdf, other]

Can Domains Be Transferred Across Languages in Multi-Domain Multilingual Neural Machine Translation?

Authors: Thuy-Trang Vu, Shahram Khadivi, Xuanli He, Dinh Phung, Gholamreza Haffari

Abstract: Previous works mostly focus on either multilingual or multi-domain aspects of neural machine translation (NMT). This paper investigates whether the domain information can be transferred across languages on the composition of multi-domain and multilingual NMT, particularly for the incomplete data condition where in-domain bitext is missing for some language pairs. Our results in the curated leave-one-domain-out experiments show that multi-domain multilingual (MDML) NMT can boost zero-shot translation performance up to +10 gains on BLEU, as well as aid the generalisation of multi-domain NMT to the missing domain. We also explore strategies for effective integration of multilingual and multi-domain NMT, including language and domain tag combination and auxiliary task training. We find that learning domain-aware representations and adding target-language tags to the encoder leads to effective MDML-NMT.

Submitted 20 October, 2022; originally announced October 2022.

Comments: WMT2022

arXiv:2203.13151 [pdf, other]

Multi-armed bandits for resource efficient, online optimization of language model pre-training: the use case of dynamic masking

Authors: Iñigo Urteaga, Moulay-Zaïdane Draïdia, Tomer Lancewicki, Shahram Khadivi

Abstract: We design and evaluate a Bayesian optimization framework for resource efficient pre-training of Transformer-based language models (TLMs). TLM pre-training requires high computational resources and introduces many unresolved design choices, such as selecting its pre-training hyperparameters. We propose a multi-armed bandit framework for the sequential selection of TLM pre-training hyperparameters, aimed at optimizing language model performance, in a resource efficient manner. We design a Thompson sampling algorithm, with a surrogate Gaussian process reward model of the Masked Language Model (MLM) pre-training objective, for its sequential minimization. Instead of MLM pre-training with fixed masking probabilities, the proposed Gaussian process-based Thompson sampling (GP-TS) accelerates pre-training by sequentially selecting masking hyperparameters that improve performance. We empirically demonstrate how GP-TS pre-trains language models efficiently, i.e., it achieves lower MLM loss in fewer epochs, across a variety of settings. In addition, GP-TS pre-trained TLMs attain competitive downstream performance, while avoiding expensive hyperparameter grid search. GP-TS provides an interactive framework for efficient and optimized TLM pre-training that, by circumventing costly hyperparameter selection, enables substantial computational savings.

Submitted 30 May, 2023; v1 submitted 24 March, 2022; originally announced March 2022.

Comments: Work accepted for publication at ACL Findings 2023. The code used for this study is publicly available at https://github.com/iurteaga/gp_ts_nlp

arXiv:2112.13960 [pdf, ps, other]

A Preordered RNN Layer Boosts Neural Machine Translation in Low Resource Settings

Authors: Mohaddeseh Bastan, Shahram Khadivi

Abstract: Neural Machine Translation (NMT) models are strong enough to convey semantic and syntactic information from the source language to the target language. However, these models are suffering from the need for a large amount of data to learn the parameters. As a result, for languages with scarce data, these models are at risk of underperforming. We propose to augment attention based neural network with reordering information to alleviate the lack of data. This augmentation improves the translation quality for both English to Persian and Persian to English by up to 6% BLEU absolute over the baseline models.

Submitted 27 December, 2021; originally announced December 2021.

arXiv:2109.13097 [pdf, other]

Towards Reinforcement Learning for Pivot-based Neural Machine Translation with Non-autoregressive Transformer

Authors: Evgeniia Tokarchuk, Jan Rosendahl, Weiyue Wang, Pavel Petrushkov, Tomer Lancewicki, Shahram Khadivi, Hermann Ney

Abstract: Pivot-based neural machine translation (NMT) is commonly used in low-resource setups, especially for translation between non-English language pairs. It benefits from using high resource source-pivot and pivot-target language pairs and an individual system is trained for both sub-tasks. However, these models have no connection during training, and the source-pivot model is not optimized to produce the best translation for the source-target task. In this work, we propose to train a pivot-based NMT system with the reinforcement learning (RL) approach, which has been investigated for various text generation tasks, including machine translation (MT). We utilize a non-autoregressive transformer and present an end-to-end pivot-based integrated model, enabling training on source-target data.

Submitted 27 September, 2021; originally announced September 2021.

Comments: RL4RealLife Workshop 2021 camera-ready

arXiv:2109.12950 [pdf, other]

doi 10.18653/v1/2021.iwslt-1.32

Integrated Training for Sequence-to-Sequence Models Using Non-Autoregressive Transformer

Authors: Evgeniia Tokarchuk, Jan Rosendahl, Weiyue Wang, Pavel Petrushkov, Tomer Lancewicki, Shahram Khadivi, Hermann Ney

Abstract: Complex natural language applications such as speech translation or pivot translation traditionally rely on cascaded models. However, cascaded models are known to be prone to error propagation and model discrepancy problems. Furthermore, there is no possibility of using end-to-end training data in conventional cascaded systems, meaning that the training data most suited for the task cannot be used. Previous studies suggested several approaches for integrated end-to-end training to overcome those problems, however they mostly rely on (synthetic or natural) three-way data. We propose a cascaded model based on the non-autoregressive Transformer that enables end-to-end training without the need for an explicit intermediate representation. This new architecture (i) avoids unnecessary early decisions that can cause errors which are then propagated throughout the cascaded models and (ii) utilizes the end-to-end training data directly. We conduct an evaluation on two pivot-based machine translation tasks, namely French-German and German-Czech. Our experimental results show that the proposed architecture yields an improvement of more than 2 BLEU for French-German over the cascaded baseline.

Submitted 27 September, 2021; originally announced September 2021.

Comments: IWSLT 2021 camera-ready

arXiv:2109.08712 [pdf, other]

Back-translation for Large-Scale Multilingual Machine Translation

Authors: Baohao Liao, Shahram Khadivi, Sanjika Hewavitharana

Abstract: This paper illustrates our approach to the shared task on large-scale multilingual machine translation in the sixth conference on machine translation (WMT-21). This work aims to build a single multilingual translation system with a hypothesis that a universal cross-language representation leads to better multilingual translation performance. We extend the exploration of different back-translation methods from bilingual translation to multilingual translation. Better performance is obtained by the constrained sampling method, which is different from the finding of the bilingual translation. Besides, we also explore the effect of vocabularies and the amount of synthetic data. Surprisingly, the smaller size of vocabularies perform better, and the extensive monolingual English data offers a modest improvement. We submitted to both the small tasks and achieved the second place.

Submitted 17 September, 2021; originally announced September 2021.

arXiv:2010.09482 [pdf, other]

Diving Deep into Context-Aware Neural Machine Translation

Authors: Jingjing Huo, Christian Herold, Yingbo Gao, Leonard Dahlmann, Shahram Khadivi, Hermann Ney

Abstract: Context-aware neural machine translation (NMT) is a promising direction to improve the translation quality by making use of the additional context, e.g., document-level translation, or having meta-information. Although there exist various architectures and analyses, the effectiveness of different context-aware NMT models is not well explored yet. This paper analyzes the performance of document-level NMT models on four diverse domains with a varied amount of parallel document-level bilingual data. We conduct a comprehensive set of experiments to investigate the impact of document-level NMT. We find that there is no single best approach to document-level NMT, but rather that different architectures come out on top on different tasks. Looking at task-specific problems, such as pronoun resolution or headline translation, we find improvements in the context-aware systems, even in cases where the corpus-level metrics like BLEU show no significant improvement. We also show that document-level back-translation significantly helps to compensate for the lack of document-level bi-texts.

Submitted 19 October, 2020; originally announced October 2020.

Comments: Accepted at 5th Conference on Machine Translation (WMT20)

arXiv:1909.09524 [pdf, other]

Pivot-based Transfer Learning for Neural Machine Translation between Non-English Languages

Authors: Yunsu Kim, Petre Petrov, Pavel Petrushkov, Shahram Khadivi, Hermann Ney

Abstract: We present effective pre-training strategies for neural machine translation (NMT) using parallel corpora involving a pivot language, i.e., source-pivot and pivot-target, leading to a significant improvement in source-target translation. We propose three methods to increase the relation among source, pivot, and target languages in the pre-training: 1) step-wise training of a single model for different language pairs, 2) additional adapter component to smoothly connect pre-trained encoder and decoder, and 3) cross-lingual encoder training via autoencoding of the pivot language. Our methods greatly outperform multilingual models up to +2.6% BLEU in WMT 2019 French-German and German-Czech tasks. We show that our improvements are valid also in zero-shot/zero-resource scenarios.

Submitted 20 September, 2019; originally announced September 2019.

Comments: EMNLP 2019 camera-ready

arXiv:1906.07286 [pdf, other]

Generalizing Back-Translation in Neural Machine Translation

Authors: Miguel Graça, Yunsu Kim, Julian Schamper, Shahram Khadivi, Hermann Ney

Abstract: Back-translation - data augmentation by translating target monolingual data - is a crucial component in modern neural machine translation (NMT). In this work, we reformulate back-translation in the scope of cross-entropy optimization of an NMT model, clarifying its underlying mathematical assumptions and approximations beyond its heuristic usage. Our formulation covers broader synthetic data generation schemes, including sampling from a target-to-source NMT model. With this formulation, we point out fundamental problems of the sampling-based approaches and propose to remedy them by (i) disabling label smoothing for the target-to-source model and (ii) sampling from a restricted search space. Our statements are investigated on the WMT 2018 German - English news translation task.

Submitted 17 June, 2019; originally announced June 2019.

Comments: 4th Conference on Machine Translation (WMT 2019) camera-ready

arXiv:1906.03129 [pdf, other]

Word-based Domain Adaptation for Neural Machine Translation

Authors: Shen Yan, Leonard Dahlmann, Pavel Petrushkov, Sanjika Hewavitharana, Shahram Khadivi

Abstract: In this paper, we empirically investigate applying word-level weights to adapt neural machine translation to e-commerce domains, where small e-commerce datasets and large out-of-domain datasets are available. In order to mine in-domain like words in the out-of-domain datasets, we compute word weights by using a domain-specific and a non-domain-specific language model followed by smoothing and binary quantization. The baseline model is trained on mixed in-domain and out-of-domain datasets. Experimental results on English to Chinese e-commerce domain translation show that compared to continuing training without word weights, it improves MT quality by up to 2.11% BLEU absolute and 1.59% TER. We have also trained models using fine-tuning on the in-domain data. Pre-training a model with word weights improves fine-tuning up to 1.24% BLEU absolute and 1.64% TER, respectively.

Submitted 7 June, 2019; originally announced June 2019.

Comments: Published on the proceedings of the International Workshop on Spoken Language Translation (IWSLT), 2018

Journal ref: Proceedings of the 15th International Workshop on Spoken Language Translation, Bruges, Belgium, October 29-30, 2018
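
Note on word-level weights: the abstract above describes computing word weights from a domain-specific and a non-domain-specific language model, followed by smoothing and binary quantization, without giving formulas. The sketch below is one plausible reading and not the paper's method. It assumes the raw score is the per-word log-probability difference between the two models, that smoothing is a centered moving average, and that quantization is a fixed threshold. The name `word_weights` and the parameters `window` and `threshold` are hypothetical.

```python
from typing import List


def word_weights(
    in_domain_logp: List[float],
    out_domain_logp: List[float],
    window: int = 3,        # assumed smoothing width, not from the source
    threshold: float = 0.0  # assumed quantization cut-off, not from the source
) -> List[int]:
    """Return a 0/1 weight per word: 1 marks words that look in-domain."""
    if len(in_domain_logp) != len(out_domain_logp):
        raise ValueError("both models must score the same number of words")

    # Raw score: how much more likely the in-domain model finds each word.
    diff = [a - b for a, b in zip(in_domain_logp, out_domain_logp)]

    # Smoothing: centered moving average, truncated at sentence boundaries.
    half = window // 2
    smoothed = []
    for i in range(len(diff)):
        lo = max(0, i - half)
        hi = min(len(diff), i + half + 1)
        smoothed.append(sum(diff[lo:hi]) / (hi - lo))

    # Binary quantization: keep only the sign relative to the threshold.
    return [1 if score > threshold else 0 for score in smoothed]


if __name__ == "__main__":
    in_lm = [-1.0, -1.0, -5.0, -5.0]
    out_lm = [-3.0, -3.0, -2.0, -2.0]
    print(word_weights(in_lm, out_lm))
```

In training, weights of this kind would typically scale each target word's loss term, so that out-of-domain sentences contribute mainly through their in-domain-like words.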

arXiv:1906.01942 [pdf, other]

Learning Bilingual Sentence Embeddings via Autoencoding and Computing Similarities with a Multilayer Perceptron

Authors: Yunsu Kim, Hendrik Rosendahl, Nick Rossenbach, Jan Rosendahl, Shahram Khadivi, Hermann Ney

Abstract: We propose a novel model architecture and training algorithm to learn bilingual sentence embeddings from a combination of parallel and monolingual data. Our method connects autoencoding and neural machine translation to force the source and target sentence embeddings to share the same space without the help of a pivot language or an additional transformation. We train a multilayer perceptron on top of the sentence embeddings to extract good bilingual sentence pairs from nonparallel or noisy parallel data. Our approach shows promising performance on sentence alignment recovery and the WMT 2018 parallel corpus filtering tasks with only a single model.

Submitted 5 June, 2019; originally announced June 2019.

Comments: ACL 2019 Repl4NLP camera-ready

arXiv:1806.07169 [pdf, ps, other]

Learning from Chunk-based Feedback in Neural Machine Translation

Authors: Pavel Petrushkov, Shahram Khadivi, Evgeny Matusov

Abstract: We empirically investigate learning from partial feedback in neural machine translation (NMT), when partial feedback is collected by asking users to highlight a correct chunk of a translation. We propose a simple and effective way of utilizing such feedback in NMT training. We demonstrate how the common machine translation problem of domain mismatch between training and deployment can be reduced solely based on chunk-level user feedback. We conduct a series of simulation experiments to test the effectiveness of the proposed method. Our results show that chunk-level feedback outperforms sentence based feedback by up to 2.61% BLEU absolute.

Submitted 19 June, 2018; originally announced June 2018.

Comments: the paper accepted in ACL 2018 Conference, Melbourne, Australia

arXiv:1804.05958 [pdf, other]

Can Neural Machine Translation be Improved with User Feedback?

Authors: Julia Kreutzer, Shahram Khadivi, Evgeny Matusov, Stefan Riezler

Abstract: We present the first real-world application of methods for improving neural machine translation (NMT) with human reinforcement, based on explicit and implicit user feedback collected on the eBay e-commerce platform. Previous work has been confined to simulation experiments, whereas in this paper we work with real logged feedback for offline bandit learning of NMT parameters. We conduct a thorough analysis of the available explicit user judgments---five-star ratings of translation quality---and show that they are not reliable enough to yield significant improvements in bandit learning. In contrast, we successfully utilize implicit task-based feedback collected in a cross-lingual search task to improve task-specific and machine translation quality metrics.

Submitted 16 April, 2018; originally announced April 2018.

Comments: Accepted at NAACL-HLT 2018 (Industry Track)

arXiv:1711.02053 [pdf, other]

doi 10.1007/978-3-319-74412-4_7

Detecting Community Structure in Dynamic Social Networks Using the Concept of Leadership

Authors: Saeed Haji Seyed Javadi, Pedram Gharani, Shahram Khadivi

Abstract: Detecting community structure in social networks is a fundamental problem empowering us to identify groups of actors with similar interests. There have been extensive works focusing on finding communities in static networks, however, in reality, due to dynamic nature of social networks, they are evolving continuously. Ignoring the dynamic aspect of social networks, neither allows us to capture evolutionary behavior of the network nor to predict the future status of individuals. Aside from being dynamic, another significant characteristic of real-world social networks is the presence of leaders, i.e. nodes with high degree centrality having a high attraction to absorb other members and hence to form a local community. In this paper, we devised an efficient method to incrementally detect communities in highly dynamic social networks using the intuitive idea of importance and persistence of community leaders over time. Our proposed method is able to find new communities based on the previous structure of the network without recomputing them from scratch. This unique feature, enables us to efficiently detect and track communities over time rapidly. Experimental results on the synthetic and real-world social networks demonstrate that our method is both effective and efficient in discovering communities in dynamic social networks.

Submitted 6 November, 2017; originally announced November 2017.

arXiv:1708.03271 [pdf, other]

Neural Machine Translation Leveraging Phrase-based Models in a Hybrid Search

Authors: Leonard Dahlmann, Evgeny Matusov, Pavel Petrushkov, Shahram Khadivi

Abstract: In this paper, we introduce a hybrid search for attention-based neural machine translation (NMT). A target phrase learned with statistical MT models extends a hypothesis in the NMT beam search when the attention of the NMT model focuses on the source words translated by this phrase. Phrases added in this way are scored with the NMT model, but also with SMT features including phrase-level translation probabilities and a target language model. Experimental results on German->English news domain and English->Russian e-commerce domain translation tasks show that using phrase-based models in NMT search improves MT quality by up to 2.3% BLEU absolute as compared to a strong NMT baseline.

Submitted 10 August, 2017; originally announced August 2017.

Comments: To appear in Proceedings of EMNLP 2017

arXiv:1708.03186 [pdf, other]

Neural and Statistical Methods for Leveraging Meta-information in Machine Translation

Authors: Shahram Khadivi, Patrick Wilken, Leonard Dahlmann, Evgeny Matusov

Abstract: In this paper, we discuss different methods which use meta information and richer context that may accompany source language input to improve machine translation quality. We focus on category information of input text as meta information, but the proposed methods can be extended to all textual and non-textual meta information that might be available for the input text or automatically predicted using the text content. The main novelty of this work is to use state-of-the-art neural network methods to tackle this problem within a statistical machine translation (SMT) framework. We observe translation quality improvements up to 3% in terms of BLEU score in some text categories.

Submitted 10 August, 2017; originally announced August 2017.

Comments: To appear in MT Summit 2017

arXiv:1701.08533 [pdf, ps, other]

Graph-Based Semi-Supervised Conditional Random Fields For Spoken Language Understanding Using Unaligned Data

Authors: Mohammad Aliannejadi, Masoud Kiaeeha, Shahram Khadivi, Saeed Shiry Ghidary

Abstract: We experiment graph-based Semi-Supervised Learning (SSL) of Conditional Random Fields (CRF) for the application of Spoken Language Understanding (SLU) on unaligned data. The aligned labels for examples are obtained using IBM Model. We adapt a baseline semi-supervised CRF by defining new feature set and altering the label propagation algorithm. Our results demonstrate that our proposed approach significantly improves the performance of the supervised model by utilizing the knowledge gained from the graph.

Submitted 30 January, 2017; originally announced January 2017.

Comments: Workshop of The Australasian Language Technology Association

arXiv:1701.01854 [pdf]

Neural Machine Translation on Scarce-Resource Condition: A case-study on Persian-English

Authors: Mohaddeseh Bastan, Shahram Khadivi, Mohammad Mehdi Homayounpour

Abstract: Neural Machine Translation (NMT) is a new approach for Machine Translation (MT), and due to its success, it has absorbed the attention of many researchers in the field. In this paper, we study NMT model on Persian-English language pairs, to analyze the model and investigate the appropriateness of the model for scarce-resourced scenarios, the situation that exists for Persian-centered translation systems. We adjust the model for the Persian language and find the best parameters and hyper parameters for two tasks: translation and transliteration. We also apply some preprocessing task on the Persian dataset which yields to increase for about one point in terms of BLEU score. Also, we have modified the loss function to enhance the word alignment of the model. This new loss function yields a total of 1.87 point improvements in terms of BLEU score in the translation quality.

Submitted 7 January, 2017; originally announced January 2017.

Comments: 6 pages, Submitted in ICEE 2017

arXiv:1607.01628 [pdf, other]

Guided Alignment Training for Topic-Aware Neural Machine Translation

Authors: Wenhu Chen, Evgeny Matusov, Shahram Khadivi, Jan-Thorsten Peter

Abstract: In this paper, we propose an effective way for biasing the attention mechanism of a sequence-to-sequence neural machine translation (NMT) model towards the well-studied statistical word alignment models. We show that our novel guided alignment training approach improves translation quality on real-life e-commerce texts consisting of product titles and descriptions, overcoming the problems posed by many unknown words and a large type/token ratio. We also show that meta-data associated with input texts such as topic or category information can significantly improve translation quality when used as an additional signal to the decoder part of the network. With both novel features, the BLEU score of the NMT system on a product title set improves from 18.6 to 21.3%. Even larger MT quality gains are obtained through domain adaptation of a general domain NMT system to e-commerce data. The developed NMT system also performs well on the IWSLT speech translation task, where an ensemble of four variant systems outperforms the phrase-based baseline by 2.1% BLEU absolute.

Submitted 6 July, 2016; originally announced July 2016.

Comments: 11 pages

Showing 1–24 of 24 results for author: Khadivi, S