-
Mixed-type Distance Shrinkage and Selection for Clustering via Kernel Metric Learning
Authors:
Jesse S. Ghashti,
John R. J. Thompson
Abstract:
Distance-based clustering and classification are widely used in various fields to group mixed numeric and categorical data. In many algorithms, a predefined distance measurement is used to cluster data points based on their dissimilarity. While there exist numerous distance-based measures for data with pure numerical attributes and several ordered and unordered categorical metrics, an efficient and accurate distance for mixed-type data that utilizes the continuous and discrete properties simultaneously is an open problem. Many metrics convert numerical attributes to categorical ones or vice versa. They handle the data points as a single attribute type or calculate a distance between each attribute separately and add them up. We propose a metric called KDSUM that uses mixed kernels to measure dissimilarity, with cross-validated optimal bandwidth selection. We demonstrate that KDSUM is a shrinkage method from existing mixed-type metrics to a uniform dissimilarity metric, and improves clustering accuracy when utilized in existing distance-based clustering algorithms on simulated and real-world datasets containing continuous-only, categorical-only, and mixed-type data.
Submitted 30 August, 2023; v1 submitted 2 June, 2023;
originally announced June 2023.
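As a rough illustration of the idea of measuring dissimilarity with mixed kernels, the following is a minimal sketch and not the authors' KDSUM definition. It assumes a Gaussian kernel for continuous attributes, an Aitchison-Aitken kernel for unordered categorical attributes, a simple sum over attributes, and fixed, hand-picked bandwidths in place of the cross-validated ones the abstract describes. All function and parameter names are hypothetical.

```python
import math

def gaussian_kernel(a, b, h):
    """Gaussian kernel for one continuous attribute, bandwidth h > 0."""
    u = (a - b) / h
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def aitchison_aitken_kernel(a, b, lam, n_levels):
    """Aitchison-Aitken kernel for one unordered categorical attribute.
    lam lies in [0, (c - 1) / c] for c = n_levels; lam = 0 is an exact-match indicator."""
    return 1.0 - lam if a == b else lam / (n_levels - 1)

def mixed_similarity(x, y, cont_idx, cat_idx, h, lam, n_levels):
    """Assumed form: sum of per-attribute kernel values (not taken from the paper)."""
    s = 0.0
    for k, i in enumerate(cont_idx):
        s += gaussian_kernel(x[i], y[i], h[k])
    for k, i in enumerate(cat_idx):
        s += aitchison_aitken_kernel(x[i], y[i], lam[k], n_levels[k])
    return s

def kernel_dissimilarity(x, y, **params):
    """Kernel-induced dissimilarity: S(x, x) + S(y, y) - 2 S(x, y)."""
    return (mixed_similarity(x, x, **params)
            + mixed_similarity(y, y, **params)
            - 2.0 * mixed_similarity(x, y, **params))

# Illustrative settings: one continuous and one 3-level categorical attribute.
params = dict(cont_idx=[0], cat_idx=[1], h=[1.0], lam=[0.2], n_levels=[3])
d_same = kernel_dissimilarity((1.0, "a"), (1.0, "a"), **params)
d_diff = kernel_dissimilarity((1.0, "a"), (3.0, "b"), **params)
```

In this sketch, larger bandwidths flatten the kernels, so every pair of points looks about equally dissimilar; that is one informal way to picture the shrinkage toward a uniform dissimilarity that the abstract mentions.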
-
Measuring Financial Advice: aligning client elicited and revealed risk
Authors:
John R. J. Thompson,
Longlong Feng,
R. Mark Reesor,
Chuck Grace,
Adam Metzler
Abstract:
Financial advisors use questionnaires and discussions with clients to determine a suitable portfolio of assets that will allow clients to reach their investment objectives. Financial institutions assign risk ratings to each security they offer, and those ratings are used to guide clients and advisors to choose an investment portfolio risk that suits their stated risk tolerance. This paper compares client Know Your Client (KYC) profile risk allocations to their investment portfolio risk selections using a value-at-risk discrepancy methodology. Value-at-risk is used to measure elicited and revealed risk to show whether clients are over-risked or under-risked, whether changes in KYC risk lead to changes in portfolio configuration, and whether cash flow affects a client's portfolio risk. We demonstrate the effectiveness of value-at-risk at measuring clients' elicited and revealed risk on a dataset provided by a private Canadian financial dealership of over 50,000 accounts for over 27,000 clients and 300 advisors. By measuring both elicited and revealed risk using the same measure, we can determine how well a client's portfolio aligns with their stated goals. We believe that using value-at-risk to measure client risk provides valuable insight to advisors to ensure that their practice is KYC compliant, to better tailor their client portfolios to stated goals, to communicate advice to clients to either align their portfolios to stated goals or refresh their goals, and to monitor changes to the clients' risk positions across their practice.
Submitted 25 May, 2021;
originally announced May 2021.
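To make the idea of a value-at-risk discrepancy concrete, here is a minimal sketch under strong assumptions that are not from the paper: returns are treated as normally distributed, each risk profile is summarized by a hypothetical (mean, standard deviation) pair, and the discrepancy is taken as revealed minus elicited VaR, so a positive value reads as over-risked. The numbers are made up for illustration.

```python
from statistics import NormalDist

def normal_var(mu, sigma, level=0.95):
    """Parametric VaR under a normal-returns assumption, reported as a positive loss."""
    z = NormalDist().inv_cdf(1.0 - level)  # negative when level > 0.5
    return -(mu + sigma * z)

def var_discrepancy(revealed, elicited, level=0.95):
    """Revealed (portfolio) VaR minus elicited (KYC) VaR.
    Sign convention is an assumption: positive means the portfolio carries
    more risk than the stated tolerance implies."""
    return normal_var(*revealed, level) - normal_var(*elicited, level)

# Illustrative (mean, standard deviation) pairs, not real client data.
elicited_profile = (0.05, 0.10)
revealed_profile = (0.05, 0.15)
gap = var_discrepancy(revealed_profile, elicited_profile)
```

Using the same measure on both sides is what makes the two quantities directly comparable, which is the point the abstract stresses.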
-
Anisotropic local constant smoothing for change-point regression function estimation
Authors:
John R. J. Thompson,
W. John Braun
Abstract:
Understanding forest fire spread in any region of Canada is critical to promoting forest health, and protecting human life and infrastructure. Quantifying fire spread from noisy images, where regions of a fire are separated by change-point boundaries, is critical to faithfully estimating fire spread rates. In this research, we develop a statistically consistent smooth estimator that allows us to denoise fire spread imagery from micro-fire experiments. We develop an anisotropic smoothing method for change-point data that uses estimates of the underlying data generating process to inform smoothing. We show that the anisotropic local constant regression estimator is consistent with convergence rate $O\left(n^{-1/{(q+2)}}\right)$. We demonstrate its effectiveness on simulated one- and two-dimensional change-point data and fire spread imagery from micro-fire experiments.
Submitted 7 May, 2024; v1 submitted 30 November, 2020;
originally announced December 2020.
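For readers unfamiliar with local constant regression, the sketch below shows the standard isotropic (Nadaraya-Watson) estimator in one dimension with a Gaussian kernel. It is a baseline only and does not implement the anisotropic, change-point-aware smoothing the paper develops; the function name and the toy step data are illustrative assumptions.

```python
import math

def local_constant_estimate(x0, xs, ys, h):
    """Nadaraya-Watson estimate at x0: a kernel-weighted average of ys.
    h > 0 is the bandwidth; a Gaussian kernel is assumed."""
    weights = [math.exp(-0.5 * ((x0 - x) / h) ** 2) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase h")
    return sum(w * y for w, y in zip(weights, ys)) / total

# Toy step function with a change point between x = 4 and x = 5.
xs = [float(i) for i in range(10)]
ys = [0.0 if x < 5 else 1.0 for x in xs]
at_left_edge = local_constant_estimate(0.0, xs, ys, h=0.5)
at_change_point = local_constant_estimate(4.5, xs, ys, h=0.5)
```

At the change point the isotropic estimator averages across the jump and blurs it, which is the kind of behaviour a change-point-aware smoother is meant to avoid.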
-
Know Your Clients' behaviours: a cluster analysis of financial transactions
Authors:
John R. J. Thompson,
Longlong Feng,
R. Mark Reesor,
Chuck Grace
Abstract:
In Canada, financial advisors and dealers are required by provincial securities commissions and self-regulatory organizations--charged with direct regulation over investment dealers and mutual fund dealers--to respectively collect and maintain Know Your Client (KYC) information, such as their age or risk tolerance, for investor accounts. With this information, investors, under their advisor's guidance, make decisions on their investments which are presumed to be beneficial to their investment goals. Our unique dataset is provided by a financial investment dealer with over 50,000 accounts for over 23,000 clients. We use a modified behavioural finance recency, frequency, monetary model for engineering features that quantify investor behaviours, and machine learning clustering algorithms to find groups of investors that behave similarly. We show that the KYC information collected does not explain client behaviours, whereas trade and transaction frequency and volume are most informative. We believe the results shown herein encourage financial regulators and advisors to use more advanced metrics to better understand and predict investor behaviours.
Submitted 14 May, 2020; v1 submitted 7 May, 2020;
originally announced May 2020.
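The feature-engineering step can be pictured with a plain recency, frequency, monetary (RFM) computation. This is a minimal sketch of textbook RFM, not the modified model used in the paper; the record layout `(client_id, date, amount)`, the use of absolute amounts, and the sample transactions are all assumptions made for illustration.

```python
from datetime import date

def rfm_features(transactions, as_of):
    """Map each client to (recency_days, frequency, monetary).
    transactions: iterable of (client_id, date, amount) tuples (assumed layout).
    Monetary sums absolute amounts so buys and sells both count as volume."""
    acc = {}
    for client, day, amount in transactions:
        last, count, total = acc.get(client, (None, 0, 0.0))
        if last is None or day > last:
            last = day
        acc[client] = (last, count + 1, total + abs(amount))
    return {c: ((as_of - last).days, n, total) for c, (last, n, total) in acc.items()}

# Made-up transactions for two hypothetical clients.
sample = [
    ("a", date(2020, 1, 1), 100.0),
    ("a", date(2020, 1, 11), -50.0),
    ("b", date(2020, 1, 5), 20.0),
]
features = rfm_features(sample, as_of=date(2020, 1, 21))
```

The resulting per-client vectors are the kind of input a clustering algorithm would then group by behavioural similarity.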