
Showing 1–13 of 13 results for author: Valdez, E A

  1. arXiv:2406.16206

    cs.LG stat.ML

    Zero-Inflated Tweedie Boosted Trees with CatBoost for Insurance Loss Analytics

    Authors: Banghee So, Emiliano A. Valdez

    Abstract: In this paper, we explore advanced modifications to the Tweedie regression model in order to address its limitations in modeling aggregate claims for various types of insurance such as automobile, health, and liability. Traditional Tweedie models, while effective in capturing the probability and magnitude of claims, usually fall short in accurately representing the large incidence of zero claims.…

    Submitted 23 June, 2024; originally announced June 2024.

  2. arXiv:2401.16723

    q-fin.RM

    Improving Business Insurance Loss Models by Leveraging InsurTech Innovation

    Authors: Zhiyu Quan, Changyue Hu, Panyi Dong, Emiliano A. Valdez

    Abstract: Recent transformative and disruptive advancements in the insurance industry have embraced various InsurTech innovations. In particular, with the rapid progress in data science and computational capabilities, InsurTech is able to integrate a multitude of emerging data sources, shedding light on opportunities to enhance risk classification and claims management. This paper presents a groundbreaking…

    Submitted 29 January, 2024; originally announced January 2024.

  3. arXiv:2112.14868

    stat.ML cs.LG

    The SAMME.C2 algorithm for severely imbalanced multi-class classification

    Authors: Banghee So, Emiliano A. Valdez

    Abstract: Classification predictive modeling involves the accurate assignment of observations in a dataset to target classes or categories. Real-world classification problems with severely imbalanced class distributions are increasingly common. In such problems, minority classes have far fewer observations to learn from than majority classes. Despite this sparsity, a minority class is often…

    Submitted 29 December, 2021; originally announced December 2021.

    Comments: 25 pages, 8 figures, algorithms

    MSC Class: 62P99

  4. arXiv:2112.14865

    stat.AP

    Compositional Data Regression in Insurance with Exponential Family PCA

    Authors: Guojun Gan, Emiliano A. Valdez

    Abstract: Compositional data are multivariate observations that carry only relative information between components. Applying standard multivariate statistical methodology directly to compositional data can lead to paradoxes and misinterpretations. Compositional data also frequently appear in insurance, especially in telematics information. However, this type of data does not receive deserved speci…

    Submitted 29 December, 2021; originally announced December 2021.

    Comments: 21 pages, 5 figures, 10 tables

    MSC Class: 62P05

  5. arXiv:2102.00252

    stat.ML cs.LG

    Synthetic Dataset Generation of Driver Telematics

    Authors: Banghee So, Jean-Philippe Boucher, Emiliano A. Valdez

    Abstract: This article describes techniques employed in the production of a synthetic dataset of driver telematics emulated from a similar real insurance dataset. The generated synthetic dataset contains 100,000 policies, with observations on drivers' claims experience together with associated classical risk variables and telematics-related variables. This work aims to produce a resource that can…

    Submitted 30 January, 2021; originally announced February 2021.

    Comments: 24 pages, 11 figures, 6 tables

    MSC Class: 62P05

  6. arXiv:2101.10896

    stat.AP

    Applications of Clustering with Mixed Type Data in Life Insurance

    Authors: Shuang Yin, Guojun Gan, Emiliano A. Valdez, Jeyaraj Vadiveloo

    Abstract: Death benefits are generally the largest cash flow item affecting the financial statements of life insurers, yet some insurers still lack a systematic process to track and monitor death claims experience. In this article, we explore data clustering to examine and understand how actual death claims differ from expected claims, an early stage of developing a monitoring system crucial for risk management. We…

    Submitted 26 January, 2021; originally announced January 2021.

    Comments: 25 pages, 6 figures, 5 tables

    MSC Class: 62P05

  7. arXiv:2008.05968

    stat.AP

    Flexible Modeling of Hurdle Conway-Maxwell-Poisson Distributions with Application to Mining Injuries

    Authors: Shuang Yin, Dipak K. Dey, Emiliano A. Valdez, Xiaomeng Li

    Abstract: While hurdle Poisson regression is a popular class of models for count data with excessive zeros, the link function in the binary component may be unsuitable for highly imbalanced cases. Ordinary Poisson regression is unable to handle the presence of dispersion. In this paper, we introduce the Conway-Maxwell-Poisson (CMP) distribution and integrate the use of flexible skewed Weibull link functions as…

    Submitted 13 August, 2020; originally announced August 2020.

    Comments: 23 pages, 7 Tables, 3 Figures

    MSC Class: 62P99

  8. arXiv:2008.00048

    stat.AP

    Analysis of Prescription Drug Utilization with Beta Regression Models

    Authors: Guojun Gan, Emiliano A. Valdez

    Abstract: The healthcare sector in the U.S. is complex and large, generating about 20% of the country's gross domestic product. Researchers and practitioners have used healthcare analytics to better understand the industry. In this paper, we examine and demonstrate the use of Beta regression models to study the utilization of brand name drugs in the U.S. to understand the variabil…

    Submitted 31 July, 2020; originally announced August 2020.

    Comments: 26 pages, 10 Figures, 11 Tables

    MSC Class: 91G05

  9. arXiv:2007.15172

    stat.AP

    Skewed link regression models for imbalanced binary response with applications to life insurance

    Authors: Shuang Yin, Dipak K. Dey, Emiliano A. Valdez, Guojun Gan, Jeyaraj Vadiveloo

    Abstract: For a portfolio of life insurance policies observed for a stated period of time, e.g., one year, mortality is typically a rare event. When we examine the outcome of dying or not from such portfolios, we have an imbalanced binary response. The popular logistic and probit regression models can be inappropriate for imbalanced binary response as model estimates may be biased, and if not addressed prop…

    Submitted 29 July, 2020; originally announced July 2020.

    Comments: 25 pages, 7 Tables, 2 Figures

    MSC Class: 62P05

  10. arXiv:2007.03100

    stat.AP cs.LG stat.ML

    Cost-sensitive Multi-class AdaBoost for Understanding Driving Behavior with Telematics

    Authors: Banghee So, Jean-Philippe Boucher, Emiliano A. Valdez

    Abstract: Powered by telematics technology, insurers can now capture a wide range of data, such as distance traveled, how drivers brake, accelerate or make turns, and travel frequency on each day of the week, to better decode drivers' behavior. Such additional information helps insurers improve risk assessments for usage-based insurance (UBI), an increasingly popular industry innovation. In this article, we…

    Submitted 6 July, 2020; originally announced July 2020.

    Comments: 27 pages, 9 figures, 10 tables

    MSC Class: 62P05

  11. arXiv:2006.06151

    stat.AP

    On a Multi-Year Microlevel Collective Risk Model

    Authors: Rosy Oh, Himchan Jeong, Jae Youn Ahn, Emiliano A. Valdez

    Abstract: For a typical insurance portfolio, the claims process over a short period, typically one year, is characterized by observing the frequency of claims together with the associated claim severities. The collective risk model describes this portfolio's aggregate loss as a random sum of the claim amounts. In the classical framework, for simplicity, the claim frequency and claim severities are assumed to…

    Submitted 10 June, 2020; originally announced June 2020.

  12. arXiv:2006.05617

    stat.AP

    Hybrid Tree-based Models for Insurance Claims

    Authors: Zhiyu Quan, Zhiguo Wang, Guojun Gan, Emiliano A. Valdez

    Abstract: Two-part models and Tweedie generalized linear models (GLMs) have been used to model loss costs for short-term insurance contracts. For most portfolios of insurance claims, there is typically a large proportion of zero claims, which leads to imbalances resulting in inferior prediction accuracy of these traditional approaches. This article proposes the use of tree-based models with a hybrid structure…

    Submitted 9 June, 2020; originally announced June 2020.

    Comments: 24 pages, 6 figures

    MSC Class: 62P05

  13. arXiv:2004.08032

    stat.ME stat.AP

    A non-convex regularization approach for stable estimation of loss development factors

    Authors: Himchan Jeong, Hyunwoong Chang, Emiliano A. Valdez

    Abstract: In this article, we apply non-convex regularization methods to obtain stable estimates of loss development factors in insurance claims reserving. Among the non-convex regularization methods, we focus on the log-adjusted absolute deviation (LAAD) penalty and discuss the optimization of the LAAD-penalized regression model, which we prove to converge with a coordinate desce…

    Submitted 6 December, 2020; v1 submitted 16 April, 2020; originally announced April 2020.

    Comments: 23 pages, 11 Tables, 6 Figures

    MSC Class: 62P05