Showing 1–16 of 16 results for author: Lancewicki, T

  1. arXiv:2406.16093  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Towards Natural Language-Driven Assembly Using Foundation Models

    Authors: Omkar Joglekar, Tal Lancewicki, Shir Kozlovsky, Vladimir Tchuiev, Zohar Feldman, Dotan Di Castro

    Abstract: Large Language Models (LLMs) and strong vision models have enabled rapid research and development in the field of Vision-Language-Action models that enable robotic control. The main objective of these methods is to develop a generalist policy that can control robots with various embodiments. However, in industrial robotic applications such as automated assembly and disassembly, some tasks, such as…

    Submitted 23 June, 2024; originally announced June 2024.

  2. arXiv:2311.02084  [pdf, other]

    cs.CV cs.CL cs.IR

    ITEm: Unsupervised Image-Text Embedding Learning for eCommerce

    Authors: Baohao Liao, Michael Kozielski, Sanjika Hewavitharana, Jiangbo Yuan, Shahram Khadivi, Tomer Lancewicki

    Abstract: Product embedding serves as a cornerstone for a wide range of applications in eCommerce. The product embedding learned from multiple modalities shows significant improvement over that from a single modality, since different modalities provide complementary information. However, some modalities are more informatively dominant than others. How to teach a model to learn embedding from different modal…

    Submitted 26 February, 2024; v1 submitted 22 October, 2023; originally announced November 2023.

  3. arXiv:2305.08629  [pdf, ps, other]

    cs.LG

    A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs

    Authors: Dirk van der Hoeven, Lukas Zierahn, Tal Lancewicki, Aviv Rosenberg, Nicolò Cesa-Bianchi

    Abstract: We derive a new analysis of Follow The Regularized Leader (FTRL) for online learning with delayed bandit feedback. By separating the cost of delayed feedback from that of bandit feedback, our analysis allows us to obtain new results in three important settings. On the one hand, we derive the first optimal (up to logarithmic factors) regret bounds for combinatorial semi-bandits with delay and adver…

    Submitted 15 May, 2023; originally announced May 2023.

  4. arXiv:2305.07911  [pdf, other]

    cs.LG

    Delay-Adapted Policy Optimization and Improved Regret for Adversarial MDP with Delayed Bandit Feedback

    Authors: Tal Lancewicki, Aviv Rosenberg, Dmitry Sotnikov

    Abstract: Policy Optimization (PO) is one of the most popular methods in Reinforcement Learning (RL). Thus, theoretical guarantees for PO algorithms have become especially important to the RL community. In this paper, we study PO in adversarial MDPs with a challenge that arises in almost every real-world application: delayed bandit feedback. We give the first near-optimal regret bounds for PO in…

    Submitted 13 May, 2023; originally announced May 2023.

    Comments: ICML 2023

  5. arXiv:2207.14211  [pdf, ps, other]

    cs.LG cs.AI cs.GT stat.ML

    Regret Minimization and Convergence to Equilibria in General-sum Markov Games

    Authors: Liad Erez, Tal Lancewicki, Uri Sherman, Tomer Koren, Yishay Mansour

    Abstract: An abundance of recent impossibility results establish that regret minimization in Markov games with adversarial opponents is both statistically and computationally intractable. Nevertheless, none of these results preclude the possibility of regret minimization under the assumption that all parties adopt the same learning procedure. In this work, we present the first (to our knowledge) algorithm f…

    Submitted 8 August, 2022; v1 submitted 28 July, 2022; originally announced July 2022.

  6. arXiv:2203.13151  [pdf, other]

    cs.CL cs.LG stat.ML

    Multi-armed bandits for resource efficient, online optimization of language model pre-training: the use case of dynamic masking

    Authors: Iñigo Urteaga, Moulay-Zaïdane Draïdia, Tomer Lancewicki, Shahram Khadivi

    Abstract: We design and evaluate a Bayesian optimization framework for resource efficient pre-training of Transformer-based language models (TLMs). TLM pre-training requires high computational resources and introduces many unresolved design choices, such as selecting its pre-training hyperparameters. We propose a multi-armed bandit framework for the sequential selection of TLM pre-training hyperparameters,…

    Submitted 30 May, 2023; v1 submitted 24 March, 2022; originally announced March 2022.

    Comments: Work accepted for publication at ACL Findings 2023. The code used for this study is publicly available at https://github.com/iurteaga/gp_ts_nlp

  7. arXiv:2201.13172  [pdf, ps, other]

    cs.LG

    Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback

    Authors: Tiancheng Jin, Tal Lancewicki, Haipeng Luo, Yishay Mansour, Aviv Rosenberg

    Abstract: The standard assumption in reinforcement learning (RL) is that agents observe feedback for their actions immediately. However, in practice feedback is often observed in delay. This paper studies online learning in episodic Markov decision process (MDP) with unknown transitions, adversarially changing costs, and unrestricted delayed bandit feedback. More precisely, the feedback for the agent in epi…

    Submitted 21 January, 2023; v1 submitted 31 January, 2022; originally announced January 2022.

    Comments: NeurIPS 2022

  8. arXiv:2201.13170  [pdf, ps, other]

    cs.LG

    Cooperative Online Learning in Stochastic and Adversarial MDPs

    Authors: Tal Lancewicki, Aviv Rosenberg, Yishay Mansour

    Abstract: We study cooperative online learning in stochastic and adversarial Markov decision process (MDP). That is, in each episode, $m$ agents interact with an MDP simultaneously and share information in order to minimize their individual regret. We consider environments with two types of randomness: fresh, where each agent's trajectory is sampled i.i.d., and non-fresh, where the realizat…

    Submitted 1 September, 2022; v1 submitted 31 January, 2022; originally announced January 2022.

  9. arXiv:2109.13097  [pdf, other]

    cs.CL

    Towards Reinforcement Learning for Pivot-based Neural Machine Translation with Non-autoregressive Transformer

    Authors: Evgeniia Tokarchuk, Jan Rosendahl, Weiyue Wang, Pavel Petrushkov, Tomer Lancewicki, Shahram Khadivi, Hermann Ney

    Abstract: Pivot-based neural machine translation (NMT) is commonly used in low-resource setups, especially for translation between non-English language pairs. It benefits from using high resource source-pivot and pivot-target language pairs and an individual system is trained for both sub-tasks. However, these models have no connection during training, and the source-pivot model is not optimized to produce…

    Submitted 27 September, 2021; originally announced September 2021.

    Comments: RL4RealLife Workshop 2021 camera-ready

  10. Integrated Training for Sequence-to-Sequence Models Using Non-Autoregressive Transformer

    Authors: Evgeniia Tokarchuk, Jan Rosendahl, Weiyue Wang, Pavel Petrushkov, Tomer Lancewicki, Shahram Khadivi, Hermann Ney

    Abstract: Complex natural language applications such as speech translation or pivot translation traditionally rely on cascaded models. However, cascaded models are known to be prone to error propagation and model discrepancy problems. Furthermore, there is no possibility of using end-to-end training data in conventional cascaded systems, meaning that the training data most suited for the task cannot be used…

    Submitted 27 September, 2021; originally announced September 2021.

    Comments: IWSLT 2021 camera-ready

  11. arXiv:2108.10197  [pdf, other]

    cs.CL stat.CO

    Deploying a BERT-based Query-Title Relevance Classifier in a Production System: a View from the Trenches

    Authors: Leonard Dahlmann, Tomer Lancewicki

    Abstract: The Bidirectional Encoder Representations from Transformers (BERT) model has been radically improving the performance of many Natural Language Processing (NLP) tasks such as Text Classification and Named Entity Recognition (NER) applications. However, it is challenging to scale BERT for low-latency and high-throughput industrial use cases due to its enormous size. We successfully optimize a Query-…

    Submitted 23 August, 2021; originally announced August 2021.

  12. arXiv:2106.02436  [pdf, other]

    cs.LG

    Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions

    Authors: Tal Lancewicki, Shahar Segal, Tomer Koren, Yishay Mansour

    Abstract: We study the stochastic Multi-Armed Bandit (MAB) problem with random delays in the feedback received by the algorithm. We consider two settings: the reward-dependent delay setting, where realized delays may depend on the stochastic rewards, and the reward-independent delay setting. Our main contribution is algorithms that achieve near-optimal regret in each of the settings, with an additional addi…

    Submitted 4 June, 2021; originally announced June 2021.

    Comments: 33 pages, 5 figures, ICML 2021

  13. arXiv:2012.14843  [pdf, other]

    cs.LG

    Learning Adversarial Markov Decision Processes with Delayed Feedback

    Authors: Tal Lancewicki, Aviv Rosenberg, Yishay Mansour

    Abstract: Reinforcement learning typically assumes that agents observe feedback for their actions immediately, but in many real-world applications (like recommendation systems) feedback is observed in delay. This paper studies online learning in episodic Markov decision processes (MDPs) with unknown transitions, adversarially changing costs and unrestricted delayed feedback. That is, the costs and trajector…

    Submitted 15 December, 2021; v1 submitted 29 December, 2020; originally announced December 2020.

    Comments: AAAI 2022

  14. arXiv:1908.07607  [pdf, other]

    stat.ML cs.LG stat.CO

    Automatic and Simultaneous Adjustment of Learning Rate and Momentum for Stochastic Gradient Descent

    Authors: Tomer Lancewicki, Selcuk Kopru

    Abstract: Stochastic Gradient Descent (SGD) methods are prominent for training machine learning and deep learning models. The performance of these techniques depends on their hyperparameter tuning over time and varies for different models and problems. Manual adjustment of hyperparameters is very costly and time-consuming, and even if done correctly, it lacks theoretical justification which inevitably leads…

    Submitted 20 August, 2019; originally announced August 2019.

  15. arXiv:1707.08885  [pdf, ps, other]

    stat.CO

    Sequential Inverse Approximation of a Regularized Sample Covariance Matrix

    Authors: Tomer Lancewicki

    Abstract: One of the goals in scaling sequential machine learning methods pertains to dealing with high-dimensional data spaces. A key related challenge is that many methods heavily depend on obtaining the inverse covariance matrix of the data. It is well known that covariance matrix estimation is problematic when the number of observations is relatively small compared to the number of variables. A common w…

    Submitted 27 July, 2017; originally announced July 2017.

    Comments: 5 pages

  16. arXiv:1707.06156  [pdf, ps, other]

    stat.CO

    Regularization of the Kernel Matrix via Covariance Matrix Shrinkage Estimation

    Authors: Tomer Lancewicki

    Abstract: The kernel trick concept, formulated as an inner product in a feature space, facilitates powerful extensions to many well-known algorithms. While the kernel matrix involves inner products in the feature space, the sample covariance matrix of the data requires outer products. Therefore, their spectral properties are tightly connected. This allows us to examine the kernel matrix through the sample c…

    Submitted 19 July, 2017; originally announced July 2017.

    Comments: 7 pages, 4 figures