Risk Segmentation Using Bayesian Quantile Regression with Natural Cubic Splines

Review Article

Austin Stat. 2014;1(1): 7.


Xia M*

Department of Statistics, Northern Illinois University, USA

*Corresponding author: Xia M, Division of Statistics, Northern Illinois University, 300 Normal Road, DeKalb, IL 60115, USA

Received: May 29, 2014; Accepted: July 11, 2014; Published: July 16, 2014

Abstract

An insurance claims department is often interested in the possible distribution of a claim for the purpose of risk management. Information such as the probability that a claim will exceed a certain amount is helpful for matching the claim complexity with the specialty of the claim adjusters. Using information available on the claim and the parties involved, we propose a Bayesian quantile regression model for the purpose of risk identification and segmentation in the claims department. Natural cubic splines are used in order to estimate a smooth relationship between the expected quantiles and continuous explanatory variables such as the age of the claimant. A case study is conducted using the Medical Large Claims Experience Study data from the Society of Actuaries. For the claimant age factor that we study, we observe that the high-risk groups, such as infants and the elderly, exhibit a much higher risk in terms of high quantiles (such as the 99% and 99.5% quantiles) than is revealed by the mean or the median. Particularly for claims data where various characteristics are available on the claimant and other parties involved, our model may reveal helpful information on the possible extremal risk that may be overlooked in traditional claims modeling.

Keywords: Risk segmentation; Bayesian quantile regression; Natural cubic splines; Markov chain Monte Carlo; predictive modeling of claims

Introduction

Insurance companies are often interested in assessing the risks associated with an insurance claim before it is finally settled. For the financial industry, a high risk not only means a high average amount, but also the possibility of an extremely large loss. For example, the claims department may be interested in the expected quantiles of the claim distribution (e.g., what is the probability that the claim will exceed a certain amount), once information is available on the claim and the parties involved. This information can be used to match the claim complexity with the specialty and experience of the claim adjusters. Therefore it is helpful to model different quantiles of the loss distribution, given certain characteristics of the claim and the parties involved.

Regression methods have proven useful for the predictive modeling of insurance claims, particularly when information is available on the claim characteristics. Proposed in earlier papers such as [1], generalized linear models (GLMs) have now become popular in nonlife rate-making and reserving. In recent decades, more sophisticated regression models such as generalized additive models (GAMs) [2], Bayesian GAMs [3], generalized linear mixed models [4], and quantile regression [5] have been proposed for rate-making and stochastic reserving. In recent papers such as [6,7], Bayesian generalized linear models were used to predict the outstanding claims for different combinations of loss and accident years. The earlier works on claims and reserve modeling, however, only involve regression on location parameters such as the mean and median of the loss distribution. In this paper, we propose to use Bayesian quantile regression [8] with natural cubic splines [9-13] for the purpose of risk identification and segmentation in the claims department. Quantile regression [14] has become popular in predictive modeling in econometrics and social science, as it provides a more complete picture of the distribution. Recent developments in quantile regression have focused on regularization [15-17]. Under the Bayesian framework, [18] developed Gibbs samplers for Bayesian regularized quantile regression with lasso [19], group lasso [20], and elastic net penalties [21]. [22] improved the work of [18] by allowing different penalization parameters for different regression coefficients. Bayesian methods have the advantage of incorporating expert knowledge through priors. In addition, posterior samples from Markov chain Monte Carlo (MCMC) simulations enable statistical inference on the estimated coefficients as well as the regression lines with little extra computational cost.

For the Bayesian quantile regression model we propose for risk segmentation, we will conduct a case study using the Medical Large Claims Experience Study (MLCES) data from the Society of Actuaries (SOA). Natural cubic splines [9,10] will be used for obtaining a smooth relationship between the fitted quantiles and the age of the claimant. For model fitting, we will try both the Bayesian quantile regression method [8] and non-Bayesian methods such as those from [15,23,24]. For the claimant age factor that we study, we observe that the high-risk groups, such as infants and the elderly, exhibit a much higher risk in terms of high quantiles (such as the 99% and 99.5% quantiles) than is revealed by the mean or the median. Particularly for claims data where various characteristics are available on the claim and the parties involved, our model may reveal helpful information on the possible extremal risk that may be overlooked in traditional claims predictive modeling. Our case study confirms that Bayesian quantile regression may be a useful tool for risk identification and segmentation in the claims department.

The rest of the paper is organized as follows. In Section 2, we will introduce relevant methodologies such as quantile regression, Bayesian quantile regression and natural cubic splines. In Section 3, we will conduct a case study using the MLCES data from SOA. Section 4 concludes the paper.

Methodologies

Quantile regression

The concept of quantile regression was first introduced by [14]. While linear regression focuses on conditional expectations, quantile regression is for modeling conditional quantiles given certain explanatory variables. Denote y1, y2,..., yn as n observations of the response variable under concern, and x1, x2,..., xn as the vectors of explanatory variables of length k. For 0 < p < 1, the linear regression model for the pth quantile is given by

$$Q_p(y_i \mid x_i) = x_i'\beta, \qquad (1)$$

where $Q_p(y_i \mid x_i)$ is the inverse cumulative distribution function of $y_i$ given $x_i$ evaluated at the probability $p$, and β is a vector of coefficients for the k explanatory variables in $x_i$. Here we will discuss the methods based on a linear relationship between the quantiles of the response and the explanatory variables, although the methods may be extended to non-linear relationships.

While the coefficients of ordinary linear regression are estimated by minimizing the sum of squared errors, $\sum_{i=1}^{n}(y_i - x_i'\beta)^2$, the estimates $\hat{\beta}(p)$ of the quantile regression coefficients, called the $p$th regression quantile, are obtained by minimizing

$$\sum_{i=1}^{n} \rho_p(y_i - x_i'\beta), \qquad (2)$$

where ρp(·) is the check loss function defined by

$$\rho_p(t) = \begin{cases} pt, & \text{if } t \ge 0, \\ -(1-p)t, & \text{if } t < 0, \end{cases} \qquad (3)$$

which places asymmetric weights on positive and negative residuals.
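To make the role of the check loss concrete, the following minimal sketch estimates a $p$th regression quantile by directly minimizing Equation (2) with a general-purpose optimizer. It assumes NumPy and SciPy are available; the function names `check_loss` and `fit_quantile_regression` are ours, not from the paper, and production implementations would use the specialized algorithms discussed below rather than a generic minimizer.

```python
import numpy as np
from scipy.optimize import minimize

def check_loss(t, p):
    """Check loss rho_p(t) of Equation (3): p*t for t >= 0, -(1-p)*t for t < 0."""
    return np.where(t >= 0, p * t, (p - 1) * t)

def fit_quantile_regression(X, y, p):
    """Estimate the p-th regression quantile by minimizing Equation (2).

    Illustrative only: a derivative-free optimizer handles the
    non-differentiable check loss on small problems.
    """
    def objective(beta):
        return np.sum(check_loss(y - X @ beta, p))
    return minimize(objective, np.zeros(X.shape[1]), method="Nelder-Mead").x

# Sanity check: with an intercept-only design, the p-th regression
# quantile reduces to the sample p-th quantile of y.
rng = np.random.default_rng(0)
y = rng.exponential(scale=1.0, size=500)
X = np.ones((500, 1))
beta_hat = fit_quantile_regression(X, y, p=0.5)
```

The intercept-only case provides a quick sanity check, since the fitted value should match the sample median of the simulated losses.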

In [23], the regression quantiles (i.e., coefficients) were estimated using a modified simplex algorithm proposed by [25]. [23] noted that the computational cost of the algorithm can increase dramatically when the sample size and the number of parameters increase. Hence, in [26] the authors proposed interior point methods with a new statistical preprocessing approach for l1-type problems. These new algorithms increased the computational speed by 10 to 100-fold. Interested readers may refer to the original papers for detailed information on the algorithms. Statistical inference on the regression quantiles is usually achieved by resampling methods such as bootstrapping. Resampling methods for quantile regression were discussed in papers such as [15,27]. Other methods for statistical inference in quantile regression include the inversion of rank tests proposed by [28] and the direct and studentization methods of [29]. In actuarial science, [5] proposed to use quantile regression for the purpose of nonlife rate-making, taking advantage of the robustness of the fitted quantiles in the presence of outliers. To our knowledge, however, little research has been conducted in the actuarial literature to make use of the capability of quantile regression in revealing comprehensive distributional characteristics, including both location and scale.

Quantile regression with penalty

In order to avoid over-fitting and to regularize parameter estimates, variable selection by penalized likelihood has gained much attention in the regression context. For quantile regression, [15] was the first paper to introduce penalty functions to shrink the estimates of random effects for longitudinal data. The penalized version of Equation (2) is given by

$$\sum_{i=1}^{n} \rho_p(y_i - x_i'\beta) + \lambda J(x_i'\beta), \qquad (4)$$

where λ is the regularization parameter (i.e., a tuning parameter), and J(·) is the penalty function.

In [15,17], the LASSO penalty [19] was used for regularization and variable selection. The LASSO quantile regression [15] is estimated by minimizing

$$\sum_{i=1}^{n} \rho_p(y_i - x_i'\beta) + \lambda \lVert \beta \rVert_1, \qquad (5)$$

where λ is nonnegative, and $\lVert \beta \rVert_1$ is the $l_1$ penalty, which shrinks the regression coefficients toward zero as λ increases.
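The shrinkage effect of the $l_1$ term can be illustrated by adding the penalty to the check-loss objective of the previous sketch. This is again an assumption-laden illustration (NumPy/SciPy; `fit_lasso_quantile` is an illustrative name, and a generic optimizer stands in for the specialized algorithms of [15,17]):

```python
import numpy as np
from scipy.optimize import minimize

def check_loss(t, p):
    """Check loss rho_p(t) from Equation (3)."""
    return np.where(t >= 0, p * t, (p - 1) * t)

def fit_lasso_quantile(X, y, p, lam):
    """Minimize the penalized objective of Equation (5):
    sum of check losses plus lam * ||beta||_1 (illustrative sketch)."""
    def objective(beta):
        return np.sum(check_loss(y - X @ beta, p)) + lam * np.sum(np.abs(beta))
    return minimize(objective, np.zeros(X.shape[1]), method="Powell").x

# Toy data: only the first of three predictors matters.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + rng.normal(size=200)
b_unpen = fit_lasso_quantile(X, y, p=0.5, lam=0.0)   # ordinary median regression
b_pen = fit_lasso_quantile(X, y, p=0.5, lam=500.0)   # heavy l1 shrinkage
```

With λ = 0 the fit recovers the median regression coefficients, while a large λ shrinks the coefficient vector toward zero, as the text describes.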

Another well-known penalized quantile regression model is the SCAD quantile regression proposed by [24]. According to Fan and Li [30], the SCAD penalty possesses the oracle properties that the LASSO does not have. The SCAD quantile regression coefficients are estimated by minimizing

$$\sum_{i=1}^{n} \rho_p(y_i - x_i'\beta) + \sum_{j=1}^{k} p_\lambda(\beta_j), \qquad (6)$$

where the SCAD penalty pλ(·) is defined based on its first derivative and is symmetric around zero.

For θ> 0, the first derivative of the SCAD penalty is given by

$$p_\lambda'(\theta) = \lambda\left\{ I(\theta \le \lambda) + \frac{(a\lambda - \theta)_+}{(a-1)\lambda} I(\theta > \lambda) \right\},$$

where a > 2 is a tuning parameter. The SCAD penalty has a form similar to the LASSO penalty around zero, but it applies a constant penalty to large coefficients, so that large coefficients are not over-shrunk and the penalized estimators achieve unbiasedness.
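The derivative above is straightforward to code. The sketch below (our own helper, assuming NumPy; a = 3.7 is the conventional default from Fan and Li) exhibits the three regimes: a LASSO-like constant slope for θ ≤ λ, a linearly decaying slope for λ < θ ≤ aλ, and zero slope beyond aλ:

```python
import numpy as np

def scad_derivative(theta, lam, a=3.7):
    """First derivative p_lambda'(theta) of the SCAD penalty for theta > 0.

    a > 2 is a tuning parameter; a = 3.7 is the conventional default.
    """
    theta = np.asarray(theta, dtype=float)
    # Slope (a*lam - theta)_+ / ((a - 1) * lam) applies beyond theta = lam.
    decay = np.maximum(a * lam - theta, 0.0) / ((a - 1.0) * lam)
    return lam * np.where(theta <= lam, 1.0, decay)
```

Because the derivative is zero beyond aλ, the penalty itself is constant there, which is why large coefficients escape shrinkage.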

In addition to the SCAD penalty, [24] proposed the adaptive LASSO penalty for quantile regression. The adaptive LASSO is a generalization of the LASSO penalty which allows adaptive weights (i.e., different weights) for different regression coefficients. According to [31], the adaptive LASSO also possesses the oracle properties. Details of adaptive LASSO quantile regression can be found in [24].

Bayesian quantile regression

In [8], the authors introduced Bayesian quantile regression using a likelihood function based on the asymmetric Laplace distribution. This is based on the property that minimizing Equation (2) is equivalent to maximizing the likelihood function of independently distributed asymmetric Laplace errors. The probability density function of an asymmetric Laplace distribution is given by

$$f_p(u) = p(1-p)\exp\{-\rho_p(u)\}, \qquad (7)$$

where 0 < p < 1 and $\rho_p(\cdot)$ is the check loss function defined in Equation (3). Except for p = 1/2, the density in Equation (7) is asymmetric.

After introducing a location parameter μ and a scale parameter σ into Equation (7), we may obtain a generalization of the density as

$$f_p(u; \mu, \sigma) = \frac{p(1-p)}{\sigma}\exp\left\{-\frac{\rho_p(u-\mu)}{\sigma}\right\}. \qquad (8)$$
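As a numerical sanity check on Equation (8) (a sketch assuming NumPy; `asym_laplace_pdf` is our name), one can verify that the density integrates to one and that the location parameter μ is the pth quantile of the distribution, i.e., P(U ≤ μ) = p:

```python
import numpy as np

def check_loss(t, p):
    """Check loss rho_p(t) from Equation (3)."""
    return np.where(t >= 0, p * t, (p - 1) * t)

def asym_laplace_pdf(u, p, mu=0.0, sigma=1.0):
    """Asymmetric Laplace density of Equation (8)."""
    return p * (1.0 - p) / sigma * np.exp(-check_loss(u - mu, p) / sigma)

# Dense grid for simple Riemann-sum checks of the density's properties.
u = np.linspace(-200.0, 200.0, 400001)
dx = u[1] - u[0]
pdf = asym_laplace_pdf(u, p=0.9)
total_mass = pdf.sum() * dx               # should be close to 1
mass_below_mu = pdf[u <= 0.0].sum() * dx  # should be close to p = 0.9
```

The second check is precisely the property that makes this likelihood suitable for quantile regression: fitting μ under the p-specific density targets the pth conditional quantile.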

Under the assumptions of the asymmetric Laplace distribution and a link function given by the inverse cumulative distribution function, one may estimate the coefficients of quantile regression by maximizing the likelihood, similar to parameter estimation in the case of a generalized linear model (GLM). Denote $\mu_i = E(y_i \mid x_i)$, $f(y_i; \mu_i)$ as the distribution function of the response variable $y_i$, and $g(\mu_i)$ as the GLM link function. Regardless of the original distribution of the data, inference can be made if we assume that, for any 0 < p < 1,

$$f(y_i; \mu_i) = f_p(y_i; \mu_i), \qquad g(\mu_i) = Q_p(y_i \mid x_i), \qquad (9)$$

where $Q_p(y_i \mid x_i)$ is the inverse cumulative distribution function defined earlier in Subsection 2.1. For the purpose of Bayesian analysis, we denote π(β) as the prior for the $p$th regression quantiles (i.e., coefficients), $y = (y_1, y_2, \ldots, y_n)$ as the $n$ observations of the response variable, and $L_p(y \mid \beta)$ as the conditional distribution of the response variable based on the asymmetric Laplace distribution.

That is,

$$L_p(y \mid \beta) = p^n (1-p)^n \exp\left\{ -\sum_{i=1}^{n} \rho_p(y_i - x_i'\beta) \right\}. \qquad (10)$$

Bayesian inference can be made based on the posterior distribution given by

$$f_p(\beta \mid y) \propto L_p(y \mid \beta)\,\pi(\beta).$$

Although a conjugate prior is not available for exact analysis, Markov chain Monte Carlo (MCMC) techniques may be used to obtain posterior samples of the regression coefficients for the purpose of statistical inference. [8] demonstrated that improper priors on β still yield a proper posterior distribution. Vague or non-informative priors may be chosen to reflect a lack of information, while informative priors may be specified when subject-area knowledge is available. In the case of an informative prior, the prior mean represents a guess of the regression coefficients, and the prior variance or precision indicates the uncertainty of the guess. In quantile regression, the use of MCMC enables statistical inference on the regression quantiles with little extra computational cost. Using the posterior samples from MCMC, one may construct credible intervals for the fitted quantiles, where non-Bayesian methods encounter difficulties and for which resampling methods can be computationally expensive, particularly for large insurance data.
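The MCMC approach can be sketched with a simple random-walk Metropolis sampler targeting the posterior above, using the asymmetric Laplace log-likelihood of Equation (10) and a flat prior on β. This is an illustrative implementation assuming NumPy; the samplers actually used in [8] and the Gibbs samplers of [18] are more sophisticated.

```python
import numpy as np

def check_loss(t, p):
    """Check loss rho_p(t) from Equation (3)."""
    return np.where(t >= 0, p * t, (p - 1) * t)

def log_posterior(beta, X, y, p):
    """Log posterior under the asymmetric Laplace likelihood of
    Equation (10) and a flat (improper) prior on beta."""
    return -np.sum(check_loss(y - X @ beta, p))

def metropolis_quantile(X, y, p, n_iter=5000, step=0.05, seed=7):
    """Random-walk Metropolis sampler for the p-th regression quantile."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(X.shape[1])
    current = log_posterior(beta, X, y, p)
    samples = np.empty((n_iter, X.shape[1]))
    for i in range(n_iter):
        proposal = beta + step * rng.normal(size=beta.shape)
        cand = log_posterior(proposal, X, y, p)
        if np.log(rng.uniform()) < cand - current:  # Metropolis accept/reject
            beta, current = proposal, cand
        samples[i] = beta
    return samples

# Intercept-only example: the posterior concentrates near the sample median.
rng = np.random.default_rng(42)
y = rng.exponential(size=500)
X = np.ones((500, 1))
samples = metropolis_quantile(X, y, p=0.5)
```

After a burn-in period, the retained draws can be summarized directly: the posterior mean estimates the regression quantile, and posterior quantiles of the draws give credible intervals at essentially no extra cost, which is the practical advantage noted above.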

In the past decade, developments in Bayesian quantile regression have focused on parameter regularization and variable selection. For example, [18] proposed Bayesian regularized quantile regression with lasso [19], group lasso [20], and elastic net penalties [21], developed Gibbs samplers for the three types of regularized quantile regression, and demonstrated through simulation studies that Bayesian regularized quantile regression performs better than the non-Bayesian methods in terms of accuracy and prediction. [22] improved the work of [18] by allowing different tuning parameters for different regression coefficients, with the performance of their method evaluated by simulations and case studies. Interested readers may refer to the original papers for details on the new methods.

Natural cubic splines

In the quantile regression model given in Equation (1), the relationship between the quantiles of the response variable and the explanatory variables is assumed to be linear. This is often not true in realistic situations. For example, in property and casualty loss modeling, younger people and the elderly usually exhibit a higher risk in terms of potential losses (see, e.g., [32]). Another example is the vehicle age variable in auto rate-making, which has a similar pattern at the younger and older vehicle ages (see, e.g., [33]). Natural ways to model a nonlinear relationship include polynomials and piecewise functions. For example, the property and casualty pricing software Emblem enables actuaries to model the relationship between the expected losses and continuous rating factors using polynomials. However, the number of parameters grows with the order of the polynomial, and the shapes polynomials can take are constrained by the specified order. Splines are piecewise polynomials with local polynomial representations. For regression purposes, fixed-knot splines are widely used for obtaining a nonlinear relationship. Splines are assumed to be continuous, with continuous first and second derivatives at the knots, in order to provide a smooth relationship. A spline is defined by the order of its polynomials and the number and positions of its knots. Cubic splines are the lowest-order splines for which the discontinuity at the knots is undetectable by the human eye (see, e.g., Chapter 5 of [9]). In order to avoid the erratic behavior of splines at the boundaries, which may cause problems in extrapolation, we may use natural cubic splines, which impose the additional constraint of linearity beyond the two boundary knots.
For example, in [10,11], natural cubic splines were used to model the relationship of disease prevalence and medical expenditure (or utilization) with sampling probabilities, in order to extrapolate the disease prevalence and medical expenditure (or utilization) to hidden sub-populations in weighted sampling.

Here we present a natural cubic spline with 4 knots as an example. Denoting $(x_1, x_2, x_3, x_4)$ as the 4 knots, the spline is defined by three cubic functions, one on each interval between consecutive knots:

$$S(x) = S_k(x) = a_k + b_k(x - x_k) + c_k(x - x_k)^2 + d_k(x - x_k)^3, \quad x_k \le x \le x_{k+1}, \quad k = 1, 2, 3,$$

where \(a_k\), \(b_k\), \(c_k\), and \(d_k\) are the coefficients of the local cubic functions. At the interior knots, the natural cubic spline is continuous with continuous first and second derivatives, and at the two boundary knots its second derivative vanishes:

\[
\begin{aligned}
S_k(x_{k+1}) &= S_{k+1}(x_{k+1}), \\
S'_k(x_{k+1}) &= S'_{k+1}(x_{k+1}), \\
S''_k(x_{k+1}) &= S''_{k+1}(x_{k+1}), \qquad k = 1, 2, \\
S''_1(x_1) &= S''_3(x_4) = 0.
\end{aligned}
\]

For splines, one may perform a linear basis expansion for the convenience and simplicity of model implementation. A natural cubic spline with \(K\) knots can be represented by \(K\) basis functions \(h_1(x), h_2(x), \ldots, h_K(x)\), which satisfy

\[
S(x) = \sum_{k=1}^{K} \alpha_k h_k(x),
\]

where \(\alpha_1, \alpha_2, \ldots, \alpha_K\) are the coefficients. Using the basis functions from the expansion, a linear model can be conveniently fitted to the basis functions; the result is a natural cubic spline that provides a smooth and flexible relationship.
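The basis expansion above can be implemented directly. The sketch below constructs the \(K\) basis functions of a natural cubic spline with \(K\) knots using the truncated-power construction described in Chapter 5 of [9]; the function name is our own, and in practice a library routine (e.g., an `ns`-type basis in R or patsy) would typically be used instead.

```python
import numpy as np


def natural_cubic_basis(x, knots):
    """K basis functions of a natural cubic spline with K knots.

    Truncated-power construction (see Chapter 5 in [9]):
        N_1(x) = 1,  N_2(x) = x,  N_{k+2}(x) = d_k(x) - d_{K-1}(x),
    where d_k(x) = [(x - xi_k)_+^3 - (x - xi_K)_+^3] / (xi_K - xi_k).
    The resulting spline is linear beyond the boundary knots.
    """
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    K = len(knots)

    def d(k):  # 0-based knot index
        num = (np.maximum(x - knots[k], 0.0) ** 3
               - np.maximum(x - knots[-1], 0.0) ** 3)
        return num / (knots[-1] - knots[k])

    cols = [np.ones_like(x), x]
    d_last = d(K - 2)  # d_{K-1} in the 1-based notation above
    for k in range(K - 2):
        cols.append(d(k) - d_last)
    return np.column_stack(cols)  # shape (len(x), K)
```

A (quantile or ordinary) regression can then be fitted with this matrix as the design matrix, yielding a smooth nonlinear effect of the continuous covariate.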

Case Study on MLCES Data

Data

For demonstration purposes, we conduct a case study using the Medical Large Claims Experience Study (MLCES) data from the Society of Actuaries (SOA). The 1999 file [34] that we use has 1,591,738 records, containing the total paid charges as well as explanatory variables such as the age, gender, and major disease diagnosis of the claimant. We chose the age of the claimant as a factor that may affect the distribution of the claims, because insurance claims usually exhibit a declining trend at the younger ages and an increasing trend at the older ages [32]. It is therefore interesting to see how the distribution (including the upper tail, which is of particular interest to the insurance industry) varies with the age of the claimant. In order to obtain homogeneous data with a reasonable sample size, we only include claims with a major diagnosis of respiratory system problems. The subset we use contains 165,786 records and one explanatory variable for illustration purposes, although in reality numerous variables may be available on a claim to serve as predictors in our quantile regression model. For example, for auto bodily injury claims, the claims department may have various information on the claimant, the insured, the injury conditions, and the legal firm involved before the claim is finally settled. This information may be used to predict the quantiles of the claim with a model fitted on historical data. Based on the estimated quantiles, the claims department may then make decisions on assigning adjusters or taking risk management measures if necessary.
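The subsetting step can be sketched as follows. The column names (`AGE`, `DIAG_CAT`, `TOT_PD_CHARGES`) are hypothetical placeholders, and the small in-memory frame merely stands in for the 1999 MLCES file, which would normally be read from disk.

```python
import pandas as pd

# Stand-in for the 1999 MLCES file; in practice this would be a
# pd.read_csv(...) of the SOA data. Column names are hypothetical.
claims = pd.DataFrame({
    "AGE": [0, 1, 34, 67, 80],
    "DIAG_CAT": ["RESPIRATORY", "CIRCULATORY", "RESPIRATORY",
                 "RESPIRATORY", "NEOPLASM"],
    "TOT_PD_CHARGES": [15200.0, 890.0, 1240.0, 53100.0, 7600.0],
})

# Keep only claims whose major diagnosis is a respiratory-system problem,
# mirroring the homogeneous subset used in the case study.
respiratory = claims[claims["DIAG_CAT"] == "RESPIRATORY"].copy()
```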

Exploratory analysis

Due to the heavy-tailed nature of the claim data, we transform the amounts to the logarithmic scale in order to obtain a good visualization of the distribution, particularly at the body of the claim distribution. The age of the claimant in the MLCES data varies from 0 to 105, with the sample size decreasing at the older ages. As our dataset provides an adequately large sample size, it is interesting to study how the distribution of claims varies with the age of the claimant. In Figure 1, we present the distribution of log10(total paid charges) in violin plots, which contain box plots as well as density curves for the different age groups. From an animated density plot we created by age, we observe that ages 0-1 have a much higher location and scale for the distribution of claim amounts than the toddler years from age 2 onward. We therefore divide the claimants into 5-year age groups, with the infants (ages 0 and 1) and the elderly over 76 in separate groups, as they exhibit different loss behaviors. After the grouping, the sample size (varying from 515 to 24,727) is large enough for each age group. We observe that the median and the first and third quartiles all show a declining trend at the younger ages and an increasing trend at the older ages. The spread of the distribution (e.g., the interquartile range) also varies with the age of the claimant, most noticeably at the older ages. From the density curves, the older age groups have a larger spread of density at the upper tail. Some age groups, such as the 71-75 and 76+ groups, have multiple modes in the distribution, suggesting heterogeneity due to possible differences in other factors such as the relationship to the subscriber and deductibles. All of the above observations suggest a higher financial risk for claimants at the younger and older ages, both in terms of claim severity and variability.
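Grouped quantile summaries of the kind underlying Figure 1 can be computed along the following lines. This is a sketch on synthetic lognormal data, since we cannot redistribute the MLCES file here; the column names and the exact group edges are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in: heavy-tailed (lognormal) charges over a range of ages.
n = 5000
ages = rng.integers(0, 90, size=n)
charges = rng.lognormal(mean=7.0, sigma=1.2, size=n)
df = pd.DataFrame({"age": ages, "log10_charge": np.log10(charges)})

# 5-year age groups, with the infants (ages 0-1) and the 76+ group kept
# separate, as in the text. Edges are illustrative.
edges = [-1, 1] + list(range(6, 76, 5)) + [76, 200]
df["age_group"] = pd.cut(df["age"], bins=edges)

# Quartiles and an upper-tail quantile of log10(charges) by age group.
summary = (df.groupby("age_group", observed=True)["log10_charge"]
             .quantile([0.25, 0.50, 0.75, 0.99])
             .unstack())
```

For the real data, the same table at levels such as 0.99 and 0.995 highlights the upper-tail behavior that the violin plots display graphically.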