Enhancing Context Through Contrast
Authors:
Kshitij Ambilduke,
Aneesh Shetye,
Diksha Bagade,
Rishika Bhagwatkar,
Khurshed Fitter,
Prasad Vagdargi,
Shital Chiddarwar
Abstract:
Neural machine translation benefits from semantically rich representations. Considerable progress in learning such representations has been achieved by language modelling and mutual information maximization objectives using contrastive learning. The language-dependent nature of language modelling introduces a trade-off between the universality of the learned representations and the model's performance on the language modelling tasks. Although contrastive learning improves performance, its success cannot be attributed to mutual information alone. We propose a novel Context Enhancement step to improve performance on neural machine translation by maximizing mutual information using the Barlow Twins loss. Unlike other approaches, we do not explicitly augment the data but view languages as implicit augmentations, eradicating the risk of disrupting semantic information. Further, our method does not learn embeddings from scratch and can be generalised to any set of pre-trained embeddings. Finally, we evaluate the language-agnosticism of our embeddings through language classification and use them for neural machine translation to compare with state-of-the-art approaches.
Submitted 6 January, 2024;
originally announced January 2024.
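As a rough illustration of the Barlow Twins loss named in the abstract above, the following minimal sketch computes it with NumPy for two batches of embeddings. It assumes, hypothetically, that `z_a` holds embeddings of source-language sentences and `z_b` holds embeddings of their translations, in line with the abstract's view of languages as implicit augmentations. The function name, the off-diagonal weight `lambda_offdiag`, and the `eps` constant are illustrative choices, not details taken from the paper.

```python
import numpy as np


def barlow_twins_loss(z_a, z_b, lambda_offdiag=5e-3, eps=1e-9):
    """Barlow Twins loss for two (batch, dim) embedding matrices.

    lambda_offdiag and eps are assumed values chosen for illustration.
    """
    z_a = np.asarray(z_a, dtype=float)
    z_b = np.asarray(z_b, dtype=float)
    if z_a.shape != z_b.shape or z_a.ndim != 2:
        raise ValueError("z_a and z_b must be 2-D arrays of equal shape")

    n = z_a.shape[0]
    # Standardize every embedding dimension across the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)

    # Cross-correlation matrix between the two views, shape (dim, dim).
    c = z_a.T @ z_b / n

    diag = np.diag(c)
    # Invariance term: pull matching dimensions towards correlation 1.
    on_diag = np.sum((diag - 1.0) ** 2)
    # Redundancy-reduction term: push different dimensions towards 0.
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)
    return float(on_diag + lambda_offdiag * off_diag)
```

In this sketch, two views that agree dimension by dimension yield a loss close to zero, while unrelated views are penalised mainly through the diagonal term.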
Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting
Authors:
Kashif Rasul,
Arjun Ashok,
Andrew Robert Williams,
Hena Ghonia,
Rishika Bhagwatkar,
Arian Khorasani,
Mohammad Javad Darvishi Bayazi,
George Adamopoulos,
Roland Riachi,
Nadhir Hassen,
Marin Biloš,
Sahil Garg,
Anderson Schneider,
Nicolas Chapados,
Alexandre Drouin,
Valentina Zantedeschi,
Yuriy Nevmyvaka,
Irina Rish
Abstract:
Over the past few years, foundation models have caused a paradigm shift in machine learning due to their unprecedented capabilities for zero-shot and few-shot generalization. However, despite the success of foundation models in modalities such as natural language processing and computer vision, the development of foundation models for time series forecasting has lagged behind. We present Lag-Llama, a general-purpose foundation model for univariate probabilistic time series forecasting based on a decoder-only transformer architecture that uses lags as covariates. Lag-Llama is pretrained on a large corpus of diverse time series data from several domains, and demonstrates strong zero-shot generalization capabilities compared to a wide range of forecasting models on downstream datasets across domains. Moreover, when fine-tuned on relatively small fractions of such previously unseen datasets, Lag-Llama achieves state-of-the-art performance, outperforming prior deep learning approaches and emerging as the best general-purpose model on average. Lag-Llama serves as a strong contender to the current state of the art in time series forecasting and paves the way for future advancements in foundation models tailored to time series data.
Submitted 8 February, 2024; v1 submitted 12 October, 2023;
originally announced October 2023.
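To make the phrase "uses lags as covariates" from the Lag-Llama abstract above more concrete, here is a minimal sketch of building lagged features from a univariate series with NumPy. The function name `lag_features` and the example lag set are hypothetical; the abstract does not state which lags the model uses or how they are fed to the transformer, so this shows only the general idea.

```python
import numpy as np


def lag_features(series, lags):
    """Return (features, targets) for a 1-D series.

    Row t of `features` holds the values observed `lag` steps before the
    matching entry of `targets`, one column per lag. The lag set is an
    illustrative assumption, not the one used by Lag-Llama.
    """
    series = np.asarray(series, dtype=float)
    lags = list(lags)
    if series.ndim != 1:
        raise ValueError("series must be one-dimensional")
    if not lags or min(lags) < 1:
        raise ValueError("lags must be positive integers")

    max_lag = max(lags)
    if len(series) <= max_lag:
        raise ValueError("series is too short for the largest lag")

    # Only time steps with a full lag history can serve as targets.
    targets = series[max_lag:]
    columns = [series[max_lag - lag: len(series) - lag] for lag in lags]
    features = np.stack(columns, axis=1)
    return features, targets
```

For example, `lag_features(np.arange(10), [1, 2, 7])` pairs each target from index 7 onwards with the values 1, 2, and 7 steps earlier.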