Showing 1–2 of 2 results for author: Rahimikia, E

Search v0.5.6 released 2020-02-24

arXiv:2404.18543 [pdf, other]

cs.CL cs.CE cs.LG

Time Machine GPT

Authors: Felix Drinkall, Eghbal Rahimikia, Janet B. Pierrehumbert, Stefan Zohren

Abstract: Large language models (LLMs) are often trained on extensive, temporally indiscriminate text corpora, reflecting the lack of datasets with temporal metadata. This approach is not aligned with the evolving nature of language. Conventional methods for creating temporally adapted language models often depend on further pre-training static models on time-specific data. This paper presents a new approach: a series of point-in-time LLMs called Time Machine GPT (TiMaGPT), specifically designed to be nonprognosticative. This ensures they remain uninformed about future factual information and linguistic changes. This strategy is beneficial for understanding language evolution and is of critical importance when applying models in dynamic contexts, such as time-series forecasting, where foresight of future information can prove problematic. We provide access to both the models and training datasets.

Submitted 29 April, 2024; originally announced April 2024.

Comments: NAACL Findings 2024

MSC Class: I.2.1; I.2.7

arXiv:2108.00480 [pdf, other]

q-fin.CP cs.CL cs.LG

doi 10.2139/ssrn.3895272

Realised Volatility Forecasting: Machine Learning via Financial Word Embedding

Authors: Eghbal Rahimikia, Stefan Zohren, Ser-Huang Poon

Abstract: This study develops FinText, a financial word embedding compiled from 15 years of business news archives. The results show that FinText produces substantially more accurate results than general word embeddings based on the gold-standard financial benchmark we introduced. In contrast to well-known econometric models, and over the sample period from 27 July 2007 to 27 January 2022 for 23 NASDAQ stocks, using stock-related news, our simple natural language processing model supported by different word embeddings improves realised volatility forecasts on high volatility days. This improvement in realised volatility forecasting performance switches to normal volatility days when general hot news is used. By utilising SHAP, an Explainable AI method, we also identify and classify key phrases in stock-related and general hot news that moved volatility.

Submitted 1 March, 2023; v1 submitted 1 August, 2021; originally announced August 2021.
