Search | arXiv e-print repository

Industry Classification Using a Novel Financial Time-Series Case Representation

Authors: Rian Dolphin, Barry Smyth, Ruihai Dong

Abstract: The financial domain has proven to be a fertile source of challenging machine learning problems across a variety of tasks including prediction, clustering, and classification. Researchers can access an abundance of time-series data and even modest performance improvements can be translated into significant additional value. In this work, we consider the use of case-based reasoning for an important task in this domain, by using historical stock returns time-series data for industry sector classification. We discuss why time-series data can present some significant representational challenges for conventional case-based reasoning approaches, and in response, we propose a novel representation based on stock returns embeddings, which can be readily calculated from raw stock returns data. We argue that this representation is well suited to case-based reasoning and evaluate our approach using a large-scale public dataset for the industry sector classification task, demonstrating substantial performance improvements over several baselines using more conventional representations.

Submitted 29 April, 2023; originally announced May 2023.

Comments: 15 pages

arXiv:2211.06378 [pdf, other]

A Multimodal Embedding-Based Approach to Industry Classification in Financial Markets

Authors: Rian Dolphin, Barry Smyth, Ruihai Dong

Abstract: Industry classification schemes provide a taxonomy for segmenting companies based on their business activities. They are relied upon in industry and academia as an integral component of many types of financial and economic analysis. However, even modern classification schemes have failed to embrace the era of big data and remain a largely subjective undertaking prone to inconsistency and misclassification. To address this, we propose a multimodal neural model for training company embeddings, which harnesses the dynamics of both historical pricing data and financial news to learn objective company representations that capture nuanced relationships. We explain our approach in detail and highlight the utility of the embeddings through several case studies and application to the downstream task of industry classification.

Submitted 11 November, 2022; originally announced November 2022.

Comments: 8 pages. Accepted at AICS 2022 under title "A Machine Learning Approach to Industry Classification in Financial Markets". Preliminary version under this title was discussed at ICAIF '22 Workshop on NLP and Network Analysis in Financial Applications. arXiv admin note: text overlap with arXiv:2202.08968

arXiv:2202.08968 [pdf, other]

Stock Embeddings: Learning Distributed Representations for Financial Assets

Authors: Rian Dolphin, Barry Smyth, Ruihai Dong

Abstract: Identifying meaningful relationships between the price movements of financial assets is a challenging but important problem in a variety of financial applications. However, with recent research, particularly that using machine learning and deep learning techniques, focused mostly on price forecasting, the literature investigating the modelling of asset correlations has lagged somewhat. To address this, inspired by recent successes in natural language processing, we propose a neural model for training stock embeddings, which harnesses the dynamics of historical returns data in order to learn the nuanced relationships that exist between financial assets. We describe our approach in detail and discuss a number of ways that it can be used in the financial domain. Furthermore, we present the evaluation results to demonstrate the utility of this approach, compared to several important benchmarks, in two real-world financial analytics tasks.

Submitted 14 February, 2022; originally announced February 2022.

Comments: Currently under review. 9 pages, 4 figures

arXiv:2107.03926 [pdf, other]

doi: 10.1007/978-3-030-86957-1_5

Measuring Financial Time Series Similarity With a View to Identifying Profitable Stock Market Opportunities

Authors: Rian Dolphin, Barry Smyth, Yang Xu, Ruihai Dong

Abstract: Forecasting stock returns is a challenging problem due to the highly stochastic nature of the market and the vast array of factors and events that can influence trading volume and prices. Nevertheless, it has proven to be an attractive target for machine learning research because of the potential for even modest levels of prediction accuracy to deliver significant benefits. In this paper, we describe a case-based reasoning approach to predicting stock market returns using only historical pricing data. We argue that one of the impediments for case-based stock prediction has been the lack of a suitable similarity metric when it comes to identifying similar pricing histories as the basis for a future prediction (traditional Euclidean and correlation-based approaches are not effective for a variety of reasons), and in this regard, a key contribution of this work is the development of a novel similarity metric for comparing historical pricing data. We demonstrate the benefits of this metric and the case-based approach in a real-world application in comparison to a variety of conventional benchmarks.

Submitted 7 July, 2021; originally announced July 2021.

Comments: 15 pages. Accepted for presentation at the International Conference on Case-Based Reasoning 2021 (ICCBR)

Showing 1–4 of 4 results for author: Smyth, B