Synthetic Data Applications in Finance
Authors:
Vamsi K. Potluru,
Daniel Borrajo,
Andrea Coletta,
Niccolò Dalmasso,
Yousef El-Laham,
Elizabeth Fons,
Mohsen Ghassemi,
Sriram Gopalakrishnan,
Vikesh Gosai,
Eleonora Kreačić,
Ganapathy Mani,
Saheed Obitayo,
Deepak Paramanand,
Natraj Raman,
Mikhail Solonin,
Srijan Sood,
Svitlana Vyetrenko,
Haibei Zhu,
Manuela Veloso,
Tucker Balch
Abstract:
Synthetic data has made tremendous strides in various commercial settings including finance, healthcare, and virtual reality. We present a broad overview of prototypical applications of synthetic data in the financial sector and provide richer details for a few select ones. These cover a wide variety of data modalities, including tabular, time-series, event-series, and unstructured data, arising from both markets and retail financial applications. Since finance is a highly regulated industry, synthetic data is a potential approach for dealing with issues related to privacy, fairness, and explainability. Various metrics are used to evaluate the quality and effectiveness of our approaches in these applications. We conclude with open directions for synthetic data in the financial domain.
Submitted 20 March, 2024; v1 submitted 29 December, 2023;
originally announced January 2024.
A supervised generative optimization approach for tabular data
Authors:
Shinpei Nakamura-Sakai,
Fadi Hamad,
Saheed Obitayo,
Vamsi K. Potluru
Abstract:
Synthetic data generation has emerged as a crucial topic for financial institutions, driven by multiple factors such as privacy protection and data augmentation. Many algorithms have been proposed for synthetic data generation, but reaching a consensus on which method to use for a specific data set and use case remains challenging. Moreover, the majority of existing approaches are "unsupervised" in the sense that they do not take the downstream task into account. To address these issues, this work presents a novel synthetic data generation framework. The framework integrates a supervised component tailored to the specific downstream task and employs a meta-learning approach to learn the optimal mixture distribution of existing synthetic distributions.
Submitted 9 May, 2024; v1 submitted 10 September, 2023;
originally announced September 2023.
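The idea of learning a mixture over existing synthetic distributions, scored on a downstream task, can be illustrated with a toy sketch. This is not the authors' method: plain random search stands in for the meta-learning step, a one-feature nearest-centroid classifier stands in for the downstream model, and every name here (`sample_mixture`, `fit_centroids`, `accuracy`, `search_mixture`) and every data set is hypothetical.

```python
import random

def sample_mixture(sources, weights, n, rng):
    """Draw n rows: pick a source by weight, then a row uniformly from it."""
    picks = rng.choices(range(len(sources)), weights=weights, k=n)
    return [rng.choice(sources[i]) for i in picks]

def fit_centroids(rows):
    """Toy downstream model (assumption): per-class mean of one feature x."""
    sums, counts = {}, {}
    for x, y in rows:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def accuracy(centroids, rows):
    """Fraction of rows whose nearest class centroid matches the label."""
    if not centroids or not rows:
        return 0.0
    hits = 0
    for x, y in rows:
        predicted = min(centroids, key=lambda c: abs(x - centroids[c]))
        hits += predicted == y
    return hits / len(rows)

def search_mixture(sources, real_val, n_trials=50, n_samples=200, seed=0):
    """Random search over mixture weights, scored on real validation rows.

    Random search is a stand-in for the meta-learning step; it only shows
    how the downstream score can supervise the choice of mixture.
    """
    rng = random.Random(seed)
    best_weights, best_score = None, -1.0
    for _ in range(n_trials):
        # Normalized exponential draws give a uniform sample on the simplex.
        raw = [rng.expovariate(1.0) for _ in sources]
        total = sum(raw)
        weights = [r / total for r in raw]
        train = sample_mixture(sources, weights, n_samples, rng)
        score = accuracy(fit_centroids(train), real_val)
        if score > best_score:
            best_weights, best_score = weights, score
    return best_weights, best_score

# Hypothetical synthetic sources as (x, label) rows: one faithful to the
# real data, one with the labels flipped.
good_source = [(0.0, 0), (1.0, 1)]
bad_source = [(1.0, 0), (0.0, 1)]
real_validation = [(0.0, 0), (1.0, 1)]
```

Calling `search_mixture([good_source, bad_source], real_validation)` returns the best weight vector found and its validation accuracy. Because the score comes from the downstream task, mixtures dominated by the flipped-label source score poorly, which is the sense in which the search is "supervised".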