Showing 1–24 of 24 results for author: Khadivi, S

  1. arXiv:2406.12023  [pdf, other]

    cs.CL cs.LG

    LiLiuM: eBay's Large Language Models for e-commerce

    Authors: Christian Herold, Michael Kozielski, Leonid Ekimov, Pavel Petrushkov, Pierre-Yves Vandenbussche, Shahram Khadivi

    Abstract: We introduce the LiLiuM series of large language models (LLMs): 1B, 7B, and 13B parameter models developed 100% in-house to fit eBay's specific needs in the e-commerce domain. This gives eBay full control over all aspects of the models including license, data, vocabulary, and architecture. We expect these models to be used as a foundation for fine-tuning and instruction-tuning, eliminating depende…

    Submitted 17 June, 2024; originally announced June 2024.

  2. arXiv:2402.05147  [pdf, other]

    cs.LG cs.CL

    ApiQ: Finetuning of 2-Bit Quantized Large Language Model

    Authors: Baohao Liao, Christian Herold, Shahram Khadivi, Christof Monz

    Abstract: Memory-efficient finetuning of large language models (LLMs) has recently attracted huge attention with the increasing size of LLMs, primarily due to the constraints posed by GPU memory limitations and the effectiveness of these methods compared to full finetuning. Despite the advancements, current strategies for memory-efficient finetuning, such as QLoRA, exhibit inconsistent performance across di…

    Submitted 21 June, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: more benchmarks and new method, block-wise ApiQ. code: https://github.com/BaohaoLiao/ApiQ

  3. arXiv:2311.02084  [pdf, other]

    cs.CV cs.CL cs.IR

    ITEm: Unsupervised Image-Text Embedding Learning for eCommerce

    Authors: Baohao Liao, Michael Kozielski, Sanjika Hewavitharana, Jiangbo Yuan, Shahram Khadivi, Tomer Lancewicki

    Abstract: Product embedding serves as a cornerstone for a wide range of applications in eCommerce. The product embedding learned from multiple modalities shows significant improvement over that from a single modality, since different modalities provide complementary information. However, some modalities are more informatively dominant than others. How to teach a model to learn embedding from different modal…

    Submitted 26 February, 2024; v1 submitted 22 October, 2023; originally announced November 2023.

  4. arXiv:2310.12303  [pdf, other]

    cs.CL cs.AI cs.LG

    Document-Level Language Models for Machine Translation

    Authors: Frithjof Petrick, Christian Herold, Pavel Petrushkov, Shahram Khadivi, Hermann Ney

    Abstract: Despite the known limitations, most machine translation systems today still operate on the sentence-level. One reason for this is, that most parallel training data is only sentence-level aligned, without document-level meta information available. In this work, we set out to build context-aware translation systems utilizing document-level monolingual data instead. This can be achieved by combining…

    Submitted 18 October, 2023; originally announced October 2023.

    Comments: accepted at WMT 2023

  5. arXiv:2305.03923  [pdf, other]

    cs.LG cs.CL

    Active Continual Learning: On Balancing Knowledge Retention and Learnability

    Authors: Thuy-Trang Vu, Shahram Khadivi, Mahsa Ghorbanali, Dinh Phung, Gholamreza Haffari

    Abstract: Acquiring new knowledge without forgetting what has been learned in a sequence of tasks is the central focus of continual learning (CL). While tasks arrive sequentially, the training data are often prepared and annotated independently, leading to the CL of incoming supervised learning tasks. This paper considers the under-explored problem of active continual learning (ACL) for a sequence of active…

    Submitted 30 January, 2024; v1 submitted 6 May, 2023; originally announced May 2023.

  6. arXiv:2210.11628  [pdf, other]

    cs.CL

    Can Domains Be Transferred Across Languages in Multi-Domain Multilingual Neural Machine Translation?

    Authors: Thuy-Trang Vu, Shahram Khadivi, Xuanli He, Dinh Phung, Gholamreza Haffari

    Abstract: Previous works mostly focus on either multilingual or multi-domain aspects of neural machine translation (NMT). This paper investigates whether the domain information can be transferred across languages on the composition of multi-domain and multilingual NMT, particularly for the incomplete data condition where in-domain bitext is missing for some language pairs. Our results in the curated leave-o…

    Submitted 20 October, 2022; originally announced October 2022.

    Comments: WMT2022

  7. arXiv:2203.13151  [pdf, other]

    cs.CL cs.LG stat.ML

    Multi-armed bandits for resource efficient, online optimization of language model pre-training: the use case of dynamic masking

    Authors: Iñigo Urteaga, Moulay-Zaïdane Draïdia, Tomer Lancewicki, Shahram Khadivi

    Abstract: We design and evaluate a Bayesian optimization framework for resource efficient pre-training of Transformer-based language models (TLMs). TLM pre-training requires high computational resources and introduces many unresolved design choices, such as selecting its pre-training hyperparameters. We propose a multi-armed bandit framework for the sequential selection of TLM pre-training hyperparameters,…

    Submitted 30 May, 2023; v1 submitted 24 March, 2022; originally announced March 2022.

    Comments: Work accepted for publication at ACL Findings 2023. The code used for this study is publicly available at https://github.com/iurteaga/gp_ts_nlp

  8. arXiv:2112.13960  [pdf, ps, other]

    cs.CL

    A Preordered RNN Layer Boosts Neural Machine Translation in Low Resource Settings

    Authors: Mohaddeseh Bastan, Shahram Khadivi

    Abstract: Neural Machine Translation (NMT) models are strong enough to convey semantic and syntactic information from the source language to the target language. However, these models are suffering from the need for a large amount of data to learn the parameters. As a result, for languages with scarce data, these models are at risk of underperforming. We propose to augment attention based neural network wit…

    Submitted 27 December, 2021; originally announced December 2021.

  9. arXiv:2109.13097  [pdf, other]

    cs.CL

    Towards Reinforcement Learning for Pivot-based Neural Machine Translation with Non-autoregressive Transformer

    Authors: Evgeniia Tokarchuk, Jan Rosendahl, Weiyue Wang, Pavel Petrushkov, Tomer Lancewicki, Shahram Khadivi, Hermann Ney

    Abstract: Pivot-based neural machine translation (NMT) is commonly used in low-resource setups, especially for translation between non-English language pairs. It benefits from using high resource source-pivot and pivot-target language pairs and an individual system is trained for both sub-tasks. However, these models have no connection during training, and the source-pivot model is not optimized to produce…

    Submitted 27 September, 2021; originally announced September 2021.

    Comments: RL4RealLife Workshop 2021 camera-ready

  10. Integrated Training for Sequence-to-Sequence Models Using Non-Autoregressive Transformer

    Authors: Evgeniia Tokarchuk, Jan Rosendahl, Weiyue Wang, Pavel Petrushkov, Tomer Lancewicki, Shahram Khadivi, Hermann Ney

    Abstract: Complex natural language applications such as speech translation or pivot translation traditionally rely on cascaded models. However, cascaded models are known to be prone to error propagation and model discrepancy problems. Furthermore, there is no possibility of using end-to-end training data in conventional cascaded systems, meaning that the training data most suited for the task cannot be used…

    Submitted 27 September, 2021; originally announced September 2021.

    Comments: IWSLT 2021 camera-ready

  11. arXiv:2109.08712  [pdf, other]

    cs.CL cs.AI cs.LG

    Back-translation for Large-Scale Multilingual Machine Translation

    Authors: Baohao Liao, Shahram Khadivi, Sanjika Hewavitharana

    Abstract: This paper illustrates our approach to the shared task on large-scale multilingual machine translation in the sixth conference on machine translation (WMT-21). This work aims to build a single multilingual translation system with a hypothesis that a universal cross-language representation leads to better multilingual translation performance. We extend the exploration of different back-translation…

    Submitted 17 September, 2021; originally announced September 2021.

  12. arXiv:2010.09482  [pdf, other]

    cs.CL cs.AI

    Diving Deep into Context-Aware Neural Machine Translation

    Authors: Jingjing Huo, Christian Herold, Yingbo Gao, Leonard Dahlmann, Shahram Khadivi, Hermann Ney

    Abstract: Context-aware neural machine translation (NMT) is a promising direction to improve the translation quality by making use of the additional context, e.g., document-level translation, or having meta-information. Although there exist various architectures and analyses, the effectiveness of different context-aware NMT models is not well explored yet. This paper analyzes the performance of document-lev…

    Submitted 19 October, 2020; originally announced October 2020.

    Comments: Accepted at 5th Conference on Machine Translation (WMT20)

  13. arXiv:1909.09524  [pdf, other]

    cs.CL cs.LG

    Pivot-based Transfer Learning for Neural Machine Translation between Non-English Languages

    Authors: Yunsu Kim, Petre Petrov, Pavel Petrushkov, Shahram Khadivi, Hermann Ney

    Abstract: We present effective pre-training strategies for neural machine translation (NMT) using parallel corpora involving a pivot language, i.e., source-pivot and pivot-target, leading to a significant improvement in source-target translation. We propose three methods to increase the relation among source, pivot, and target languages in the pre-training: 1) step-wise training of a single model for differ…

    Submitted 20 September, 2019; originally announced September 2019.

    Comments: EMNLP 2019 camera-ready

  14. arXiv:1906.07286  [pdf, other]

    cs.CL cs.LG

    Generalizing Back-Translation in Neural Machine Translation

    Authors: Miguel Graça, Yunsu Kim, Julian Schamper, Shahram Khadivi, Hermann Ney

    Abstract: Back-translation - data augmentation by translating target monolingual data - is a crucial component in modern neural machine translation (NMT). In this work, we reformulate back-translation in the scope of cross-entropy optimization of an NMT model, clarifying its underlying mathematical assumptions and approximations beyond its heuristic usage. Our formulation covers broader synthetic data gener…

    Submitted 17 June, 2019; originally announced June 2019.

    Comments: 4th Conference on Machine Translation (WMT 2019) camera-ready

  15. arXiv:1906.03129  [pdf, other]

    cs.CL cs.AI

    Word-based Domain Adaptation for Neural Machine Translation

    Authors: Shen Yan, Leonard Dahlmann, Pavel Petrushkov, Sanjika Hewavitharana, Shahram Khadivi

    Abstract: In this paper, we empirically investigate applying word-level weights to adapt neural machine translation to e-commerce domains, where small e-commerce datasets and large out-of-domain datasets are available. In order to mine in-domain like words in the out-of-domain datasets, we compute word weights by using a domain-specific and a non-domain-specific language model followed by smoothing and bina…

    Submitted 7 June, 2019; originally announced June 2019.

    Comments: Published in the proceedings of the International Workshop on Spoken Language Translation (IWSLT), 2018

    Journal ref: Proceedings of the 15th International Workshop on Spoken Language Translation, Bruges, Belgium, October 29-30, 2018

  16. arXiv:1906.01942  [pdf, other]

    cs.CL cs.LG

    Learning Bilingual Sentence Embeddings via Autoencoding and Computing Similarities with a Multilayer Perceptron

    Authors: Yunsu Kim, Hendrik Rosendahl, Nick Rossenbach, Jan Rosendahl, Shahram Khadivi, Hermann Ney

    Abstract: We propose a novel model architecture and training algorithm to learn bilingual sentence embeddings from a combination of parallel and monolingual data. Our method connects autoencoding and neural machine translation to force the source and target sentence embeddings to share the same space without the help of a pivot language or an additional transformation. We train a multilayer perceptron on to…

    Submitted 5 June, 2019; originally announced June 2019.

    Comments: ACL 2019 Repl4NLP camera-ready

  17. arXiv:1806.07169  [pdf, ps, other]

    cs.CL

    Learning from Chunk-based Feedback in Neural Machine Translation

    Authors: Pavel Petrushkov, Shahram Khadivi, Evgeny Matusov

    Abstract: We empirically investigate learning from partial feedback in neural machine translation (NMT), when partial feedback is collected by asking users to highlight a correct chunk of a translation. We propose a simple and effective way of utilizing such feedback in NMT training. We demonstrate how the common machine translation problem of domain mismatch between training and deployment can be reduced s…

    Submitted 19 June, 2018; originally announced June 2018.

    Comments: Accepted at the ACL 2018 conference, Melbourne, Australia

  18. arXiv:1804.05958  [pdf, other]

    cs.CL stat.ML

    Can Neural Machine Translation be Improved with User Feedback?

    Authors: Julia Kreutzer, Shahram Khadivi, Evgeny Matusov, Stefan Riezler

    Abstract: We present the first real-world application of methods for improving neural machine translation (NMT) with human reinforcement, based on explicit and implicit user feedback collected on the eBay e-commerce platform. Previous work has been confined to simulation experiments, whereas in this paper we work with real logged feedback for offline bandit learning of NMT parameters. We conduct a thorough…

    Submitted 16 April, 2018; originally announced April 2018.

    Comments: Accepted at NAACL-HLT 2018 (Industry Track)

  19. arXiv:1711.02053  [pdf, other]

    cs.SI physics.soc-ph

    Detecting Community Structure in Dynamic Social Networks Using the Concept of Leadership

    Authors: Saeed Haji Seyed Javadi, Pedram Gharani, Shahram Khadivi

    Abstract: Detecting community structure in social networks is a fundamental problem empowering us to identify groups of actors with similar interests. There have been extensive works focusing on finding communities in static networks, however, in reality, due to dynamic nature of social networks, they are evolving continuously. Ignoring the dynamic aspect of social networks, neither allows us to capture evo…

    Submitted 6 November, 2017; originally announced November 2017.

  20. arXiv:1708.03271  [pdf, other]

    cs.CL

    Neural Machine Translation Leveraging Phrase-based Models in a Hybrid Search

    Authors: Leonard Dahlmann, Evgeny Matusov, Pavel Petrushkov, Shahram Khadivi

    Abstract: In this paper, we introduce a hybrid search for attention-based neural machine translation (NMT). A target phrase learned with statistical MT models extends a hypothesis in the NMT beam search when the attention of the NMT model focuses on the source words translated by this phrase. Phrases added in this way are scored with the NMT model, but also with SMT features including phrase-level translati…

    Submitted 10 August, 2017; originally announced August 2017.

    Comments: To appear in Proceedings of EMNLP 2017

  21. arXiv:1708.03186  [pdf, other]

    cs.CL

    Neural and Statistical Methods for Leveraging Meta-information in Machine Translation

    Authors: Shahram Khadivi, Patrick Wilken, Leonard Dahlmann, Evgeny Matusov

    Abstract: In this paper, we discuss different methods which use meta information and richer context that may accompany source language input to improve machine translation quality. We focus on category information of input text as meta information, but the proposed methods can be extended to all textual and non-textual meta information that might be available for the input text or automatically predicted us…

    Submitted 10 August, 2017; originally announced August 2017.

    Comments: To appear in MT Summit 2017

  22. arXiv:1701.08533  [pdf, ps, other]

    cs.CL

    Graph-Based Semi-Supervised Conditional Random Fields For Spoken Language Understanding Using Unaligned Data

    Authors: Mohammad Aliannejadi, Masoud Kiaeeha, Shahram Khadivi, Saeed Shiry Ghidary

    Abstract: We experiment graph-based Semi-Supervised Learning (SSL) of Conditional Random Fields (CRF) for the application of Spoken Language Understanding (SLU) on unaligned data. The aligned labels for examples are obtained using IBM Model. We adapt a baseline semi-supervised CRF by defining new feature set and altering the label propagation algorithm. Our results demonstrate that our proposed approach sig…

    Submitted 30 January, 2017; originally announced January 2017.

    Comments: Workshop of The Australasian Language Technology Association

  23. arXiv:1701.01854  [pdf]

    cs.CL

    Neural Machine Translation on Scarce-Resource Condition: A case-study on Persian-English

    Authors: Mohaddeseh Bastan, Shahram Khadivi, Mohammad Mehdi Homayounpour

    Abstract: Neural Machine Translation (NMT) is a new approach for Machine Translation (MT), and due to its success, it has absorbed the attention of many researchers in the field. In this paper, we study NMT model on Persian-English language pairs, to analyze the model and investigate the appropriateness of the model for scarce-resourced scenarios, the situation that exists for Persian-centered translation s…

    Submitted 7 January, 2017; originally announced January 2017.

    Comments: 6 pages, submitted to ICEE 2017

  24. arXiv:1607.01628  [pdf, other]

    cs.CL cs.NE

    Guided Alignment Training for Topic-Aware Neural Machine Translation

    Authors: Wenhu Chen, Evgeny Matusov, Shahram Khadivi, Jan-Thorsten Peter

    Abstract: In this paper, we propose an effective way for biasing the attention mechanism of a sequence-to-sequence neural machine translation (NMT) model towards the well-studied statistical word alignment models. We show that our novel guided alignment training approach improves translation quality on real-life e-commerce texts consisting of product titles and descriptions, overcoming the problems posed by…

    Submitted 6 July, 2016; originally announced July 2016.

    Comments: 11 pages