
Special Article - Biostatistics Theory and Methods

Austin Biom and Biostat. 2015;2(4): 1026.

# Size and Power of Tests of Hypotheses on Survival Parameters from the Lindley Distribution with Covariates

Macaulay Okwuokenye¹* and Karl E Peace²

¹Biogen Inc, Cambridge, MA

²Jiann-Ping Hsu College of Public Health, Georgia Southern University, USA

*Corresponding author: Macaulay Okwuokenye, Biogen Inc, Cambridge, USA

Received: June 01, 2015; Accepted: June 11, 2015; Published: July 28, 2015

## Abstract

The Lindley model is considered as an alternative model facilitating analyses of time-to-event data with covariates. Covariate information is incorporated using Cox's proportional hazards model with the Lindley model as the time-dependent component. Simulation studies are performed to assess the size and power of tests of hypotheses on parameters arising from maximum likelihood estimators of parameters in the Lindley model. Results are contrasted with those arising from Cox's partial maximum likelihood estimator. The Lindley-Cox model is used to analyze a publicly available data set and contrasted with other models.

Keywords: Lindley distribution; Lindley-Cox model; Hazard function

## Introduction and Definitions

The Lindley distribution [1] was introduced in connection with fiducial distributions and Bayes' theorem, but it was not widely utilized for analyses of time-to-event data until the past decade, when Ghitany et al. [2] explored its mathematical properties and real data applications in time-to-event settings. However, to the best of our knowledge, the Lindley distribution has not been considered in connection with the incorporation of covariate information.

Covariate information and times-to-event are collected on subjects in time-to-event studies (e.g., time-to-death or survival time). Such data are often analyzed by choosing a suitable model for the times-to-event that allows covariates to be utilized in the statistical analyses. The analyses proceed by estimating parameters in the model and testing hypotheses about those parameters based on their estimates. The validity and reliability of inferences from tests of hypotheses about parameters depend on the size and power of the tests.

We motivate incorporation of covariate information using the Lindley distribution as the time-dependent component of Cox's proportional hazards (Cox's PH) model with the data set provided by Freireich et al. [3]. The data set contains survival times and White Blood Cell (WBC) information on a small number of patients. With such a small sample, the efficiency of the partial maximum likelihood estimates of Cox's proportional hazard model may not be optimal [4-6].

The relationship among the hazard rate function h(t), survival function S(t), death density function f(t), and death distribution function F(t) is important in describing time-to-event probabilities of a population. If the hazard rate function h(t) is given, then the survival function S(t) is

$S(t)=\mathrm{exp}\left(-{\int }_{0}^{t}h(u)du\right)$, (1)

the death density function is

$f(t)=h(t)S(t)$, (2)

and

$F(t)=1-S(t)$. (3)

Accordingly, if one knows any of the functions, one could determine the remaining three.

## The Cox proportional hazard model

Cox [7] considered a distribution-free approach that incorporates covariates, in which the main purpose of survival times is to keep track of the covariate information [4]. Assuming an arbitrary baseline hazard, Cox proposed the hazard function

$h(t;X,\theta ,\beta )={h}_{0}(t;\theta )\mathrm{exp}(X\text{'}\beta )$, (4)

where X=(x1,x2,…,xp) is a vector of p covariates, β=(β1, β2,…, βp) is the corresponding vector of unknown parameters, h0(t;θ) is the baseline hazard (a function of time only), and θ is the parameter vector of the baseline hazard function. Under Cox's proportional hazard model, no assumption is made about the specific form of the baseline hazard function, and interest is in assessing the association between the survival times and covariate information. Cox estimated the covariate parameter vector β by maximizing a likelihood that conditions on the risk sets at the instants of event occurrence.

Cox [7] noted that a major problem is the assessment of the asymptotic relative efficiency of tests on covariate parameters under various assumptions about h0(t;θ). Some authors addressed this problem for specific forms of h0(t;θ) [4,5,8,9]. For a single covariate (i.e., p=1), Kalbfleisch [8] suggested that the exponential and Weibull forms for h0(t;θ) do not provide significantly greater efficiency than Cox's procedure in most practical settings. Kalbfleisch's results suggested that covariates and their coefficients affect the efficiency properties of Cox's estimator. Efron [9] maintained that the full maximum likelihood estimator of β is asymptotically equivalent to Cox's partial likelihood estimator of β, by showing that Fisher's information for β (for p=1) using the entire data is asymptotically equivalent to that calculated from Cox's model.

## Specific hazard function as the time-dependent component of Cox's proportional hazard model

Many authors proposed distribution-based methods that incorporate covariates in modeling time-to-event data; see for example [5,10-12].

Specifying h(t;X,θ,β)=h0(t;θ)exp(X'β) and taking h0(t;θ) to be a specific parametric hazard function yields a parametric model with the death density function

$f(t;\theta ,X,\beta )=h(t;\theta ,X,\beta )S(t;\theta ,X,\beta )$. (5)

This formulation enables parameters in the time-dependent component and covariate parameters of the hazard rate function to be estimated and attendant inference made using (full) maximum likelihood methods.

Many authors considered models with specific forms of h0(t;θ), the time-dependent component of the hazard function, under various assumptions [4,5,12-15]. The exponential, Weibull, and Gompertz are examples of such models. Peace and Flora [4] and Peace [5] assessed the efficiency (size and power) of tests of hypotheses on covariate parameters by assuming the exponential, Weibull, Gompertz, and Rayleigh models (parametric with covariates) for the time-dependent component of Cox's model using full maximum likelihood theory, as compared to those from using Cox's model and the partial maximum likelihood method of estimation. They suggested that if interest is solely in β, choosing any of the considered parametric models as the first step in analyzing time-to-event data may not be worth the effort; they recommended Cox's method for analyses, except for small samples (n=25). They, however, recommended the exponential model (when it fits the data) over Cox's for assessing global null hypotheses. Their recommendations derive from the parametric models having, on average, larger power than Cox's for small samples, particularly for tests concerning one covariate parameter.

We consider the Lindley model as an alternative model facilitating the analyses of time-to-event data with covariates. Such a consideration reflects an extension of [4] and [5] in comparing tests of hypotheses on time-to-event parameters from Cox's model and method of estimation with those from fully parametric methods for the Lindley-Cox model using simulation studies. Covariate information is incorporated using Cox's proportional hazards model with the Lindley model as the time-dependent component (referred to as the Lindley-Cox model). This mixture model is relatively recent and has not seen application in the analyses of time-to-event data with covariates. Specifically, we (1) perform simulation studies to assess the size and power of tests on parameters arising from Maximum Likelihood Estimates (MLEs) of β in the Lindley-Cox model, and (2) contrast the results of size and power of tests on β arising from Lindley-Cox MLEs with those arising from Cox's partial maximum likelihood estimator. In addition, we analyze a publicly available data set using the Lindley-Cox model and contrast our findings with results from other models.

## The Lindley model as the time-dependent component of Cox's PH model

With the formulation in the preceding section, specifying the Lindley hazard and survival functions with covariates gives

$h({t}_{i};X,\theta ,\beta )=\frac{{\theta }^{2}(1+{t}_{i})}{\theta +1+\theta {t}_{i}}\mathrm{exp}(X\text{'}\beta )$ (6)

and

$S({t}_{i};X,\theta ,\beta )=\mathrm{exp}\left(-\mathrm{exp}(X\text{'}\beta )\left(\theta {t}_{i}-\mathrm{ln}(\theta {t}_{i}+\theta +1)+\mathrm{ln}(\theta +1)\right)\right)$, (7)

respectively, and the Lindley-Cox death density function:

${f}_{l}\left({t}_{i};\theta ,\beta \right)=\left[\frac{{\theta }^{2}\left(1+{t}_{i}\right)}{\theta +1+\theta {t}_{i}}\mathrm{exp}\left({X}_{i}^{\text{'}}\beta \right)\right]\mathrm{exp}\left(-\mathrm{exp}\left({X}_{i}^{\text{'}}\beta \right)\left(\theta {t}_{i}-\mathrm{ln}\left(\theta {t}_{i}+\theta +1\right)+\mathrm{ln}\left(\theta +1\right)\right)\right)$, (8)

where θ represents the scale parameter of the Lindley distribution.
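For concreteness, the Lindley-Cox hazard, baseline cumulative hazard, survival, and death density above can be sketched as follows. This is a minimal illustration; the function names and the use of a scalar linear predictor xb = X'β are our own choices, not the authors' code.

```python
import math

def hazard(t, theta, xb):
    # Lindley-Cox hazard: theta^2 (1+t)/(theta+1+theta*t) * exp(X'beta)
    return theta**2 * (1 + t) / (theta + 1 + theta * t) * math.exp(xb)

def cumhaz0(t, theta):
    # baseline (Lindley) cumulative hazard: theta*t - ln(theta*t+theta+1) + ln(theta+1)
    return theta * t - math.log(theta * t + theta + 1) + math.log(theta + 1)

def survival(t, theta, xb):
    # S(t) = exp(-exp(X'beta) * H0(t))
    return math.exp(-math.exp(xb) * cumhaz0(t, theta))

def density(t, theta, xb):
    # death density f(t) = h(t) * S(t)
    return hazard(t, theta, xb) * survival(t, theta, xb)
```

Note that cumhaz0(0, theta) is 0, so survival(0, theta, xb) is 1 for any covariate pattern, as a survival function requires.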

Denote the observed and censored event times by ti (i=1,2,…,d) and Tk (k=1,2,…,c=n-d), respectively. The log-likelihood for the Lindley-Cox model is

$\mathrm{ln}{L}_{l}\left(\theta ,\beta \right)=d\mathrm{ln}{\theta }^{2}+\sum _{i=1}^{d}\mathrm{ln}\left(1+{t}_{i}\right)-\sum _{i=1}^{d}\mathrm{ln}\left(\theta +1+\theta {t}_{i}\right)+\sum _{i=1}^{d}{X}_{i}^{\text{'}}\beta -\sum _{i=1}^{d}\mathrm{exp}\left({X}_{i}^{\text{'}}\beta \right)\left(\theta {t}_{i}-\mathrm{ln}\left(\theta {t}_{i}+\theta +1\right)+\mathrm{ln}\left(\theta +1\right)\right)-\sum _{k=1}^{c}\mathrm{exp}\left({X}_{k}^{\text{'}}\beta \right)\left(\theta {T}_{k}-\mathrm{ln}\left(\theta {T}_{k}+\theta +1\right)+\mathrm{ln}\left(\theta +1\right)\right)$. (9)

The MLEs of θ and β may be found iteratively. The asymptotic covariance matrix of the estimators is approximately the inverse of the observed information matrix,

$\hat{V}(\hat{\theta },\hat{\beta })\approx {\left[-\frac{{\partial }^{2}\mathrm{ln}{L}_{l}}{\partial \psi \partial \psi \text{'}}\right]}_{\psi =\hat{\psi }}^{-1}$, (10)

where ψ=(θ, β1, β2,…, βp) is the full parameter vector of dimension t=p+1.
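The log-likelihood above can be evaluated directly by summing the event and censored contributions. The sketch below is our own minimal implementation (not the authors' code), taking events as (t_i, X_i) pairs and censored observations as (T_k, X_k) pairs:

```python
import math

def loglik(theta, beta, events, censored):
    """Lindley-Cox log-likelihood sketch.
    events:   list of (t_i, x_i) for observed event times
    censored: list of (T_k, x_k) for right-censored times
    beta and each x are equal-length lists; X'beta is their dot product."""
    dot = lambda x, b: sum(xj * bj for xj, bj in zip(x, b))
    # baseline cumulative hazard of the Lindley distribution
    H0 = lambda t: theta * t - math.log(theta * t + theta + 1) + math.log(theta + 1)
    ll = 0.0
    for t, x in events:
        xb = dot(x, beta)
        # event contribution: ln f = ln h + ln S
        ll += (2 * math.log(theta) + math.log(1 + t)
               - math.log(theta + 1 + theta * t) + xb - math.exp(xb) * H0(t))
    for T, x in censored:
        # censored contribution: ln S
        ll += -math.exp(dot(x, beta)) * H0(T)
    return ll
```

In practice the MLEs would be found by maximizing this function iteratively (e.g., by Newton-Raphson, as the authors do in SAS).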

## Tests of hypotheses

The tests of hypotheses investigated are

H0: Lβ=β0 (11)

and

H0: βq=β0q, q=1,2,…,p, (12)

where βq and β0q are the qth components of β and β0, respectively, L represents a matrix of coefficients for the linear hypotheses, and β0 is a vector of constants.

These tests of hypotheses may be carried out using asymptotic likelihood inference: the Wald, score, and likelihood ratio test statistics, which are approximately low-order Taylor series expansions of each other [16]. The three test statistics are asymptotically equivalent but may differ in finite samples, in which case the likelihood ratio test is generally deemed most reliable. Since the likelihood ratio test and Wald test are more commonly used, these two tests are presented in this study.

Denote the Likelihood Ratio (LR) statistic by Λ. Let $\hat{V}(\hat{\beta })$ be the estimated covariance matrix of $\hat{\beta }$. The statistic, based on asymptotic properties of $\hat{\beta }$ (given a non-singular covariance matrix), for testing the H0 in Equation 11 is

$T=\left(L\hat{\beta }-{\beta }_{0}\right)\text{'}{\left[L\hat{V}\left(\hat{\beta }\right)L\text{'}\right]}^{-1}\left(L\hat{\beta }-{\beta }_{0}\right)$. (13)

Under H0, $-2\mathrm{ln}\Lambda$ and T are asymptotically distributed as chi-square with r degrees of freedom, written $-2\mathrm{ln}\Lambda \sim {\chi }_{(r)}^{2}$ and $T\sim {\chi }_{(r)}^{2}$, where r represents the rank of L in the case of T, and the number of parameters under the alternative hypothesis minus that under the null hypothesis in the case of the LR. For the hypothesis in Equation 12, the test statistic in Equation 13 reduces to the univariate case

${T}_{0q}={\left({\hat{\beta }}_{q}-{\beta }_{0q}\right)}^{2}/\hat{V}\left({\hat{\beta }}_{q}\right)$, (14)

where ${\hat{\beta }}_{q}$ represents the qth component of $\hat{\beta }$, and $\hat{V}({\hat{\beta }}_{q})$ is the qth diagonal element of $\hat{V}(\hat{\beta })$. Under H0, ${T}_{0q}\sim {\chi }_{(1)}^{2}$.
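The univariate Wald test above amounts to squaring a standardized estimate and comparing it with a chi-square critical value on 1 degree of freedom. A small sketch follows; the numeric inputs are illustrative values of our own, not results from the paper:

```python
def wald_univariate(beta_hat_q, beta0_q, var_hat_q):
    # T0q = (beta_hat_q - beta0_q)^2 / Var_hat(beta_hat_q)
    return (beta_hat_q - beta0_q) ** 2 / var_hat_q

# Illustrative values (ours): estimate 1.3, null value 0, estimated variance 0.25
T0 = wald_univariate(1.3, 0.0, 0.25)  # (1.3)^2 / 0.25 = 6.76
# chi-square(1) 95th percentile is about 3.841
reject = T0 > 3.841  # True: H0 is rejected at the 5% level
```

Rejection occurs because 6.76 exceeds the 3.841 critical value of ${\chi }_{(1)}^{2}$ at α=0.05.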

## Generating Lindley-Cox times-to-event with covariates

Suppose Ui (i=1,2,…,n) are random numbers; times-to-event ti may be obtained by solving the following equation (Equation 15) of the cumulative hazard H(ti) for ti [4,5,17-20]:

$H({t}_{i})={\int }_{0}^{{t}_{i}}{h}_{0}(u;\theta )\mathrm{exp}({X}_{i}^{\text{'}}\beta )du=-\mathrm{ln}{U}_{i}$, (15)

where h0 is the baseline hazard rate function, Ui is a uniform random variable, Xi=(x1,x2,…,xp) is a p-component row vector of covariates, and β=(β1, β2,…, βp) is the p-parameter vector corresponding to Xi. The ti's are obtained by solving for ti in the nonlinear equation that results from replacing h0 in Equation 15 with the Lindley hazard function, given by

$\mathrm{exp}({X}_{i}^{\text{'}}\beta )\left(\theta {t}_{i}-\mathrm{ln}(\theta {t}_{i}+\theta +1)+\mathrm{ln}(\theta +1)\right)=-\mathrm{ln}{U}_{i}$. (16)

Times-to-event are generated by specifying the covariate vector Xi, which is fixed by design; the parameters (θ and β) used in generating the event times are chosen to mirror the data set to be analyzed by the authors. The number of components of Xi, p, is chosen to be 6, and the components of Xi are selected at random using the following conditions: 14&lt;x1&lt;70, 0&lt;x2&lt;99, x3=1, 2, or 3, and x4=x5=x6=0 or 1. The dichotomous covariates x4, x5, and x6 are selected such that the proportions of 1's in x4, x5, and x6 are approximately 20%, 40%, and 60%, respectively. Parameter values are chosen to be (θ=0.4, β1=1.3, β2=0.08, β3=0.6, β4=0.8, β5=0.4, β6=1.2).
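The inversion described above can be sketched as follows: for each uniform draw U, solve the nonlinear equation in t by bisection, since the baseline cumulative hazard is increasing in t. The helper names, the bracketing interval, and the tolerance are our own choices, not the authors' implementation:

```python
import math
import random

def draw_time(theta, xb, u, hi=1e6, tol=1e-10):
    """Solve exp(xb) * H0(t) = -ln(u) for t, where
    H0(t) = theta*t - ln(theta*t + theta + 1) + ln(theta + 1).
    xb is the linear predictor X'beta; u is a Uniform(0,1) draw."""
    target = -math.log(u) / math.exp(xb)
    H0 = lambda t: theta * t - math.log(theta * t + theta + 1) + math.log(theta + 1)
    lo_t, hi_t = 0.0, hi  # H0(0) = 0 <= target, and H0 grows roughly like theta*t
    while hi_t - lo_t > tol:
        mid = 0.5 * (lo_t + hi_t)
        if H0(mid) < target:
            lo_t = mid
        else:
            hi_t = mid
    return 0.5 * (lo_t + hi_t)

random.seed(1)
t = draw_time(theta=0.4, xb=0.5, u=random.random())  # one simulated event time
```

Bisection is a deliberately simple choice here; any root-finder would do, as the left-hand side of the equation is continuous and strictly increasing in t.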

It is assumed that a sample of N independent, observable survival times (t1,t2,…,tN) constitutes the available information on a population; of those, d (d≤N) are the observed times of the event, and the remaining k=N-d are right censored. The censoring distribution is assumed to follow the Lindley distribution. As an assessment of the sensitivity of the model to the censoring distribution, another set of assessments was performed by simulating the censoring distribution according to a Lindley-Cox distribution with different values of the hazard function; the results were similar to those presented. The same data sets are used in assessing size and power; however, in assessing power, new hypotheses produced by inducing a 20% deviation in the parameter values are tested.

Size and power are relative to the null hypothesis, the null being the statement that the parameters are equal to the values that were used to simulate the data. Size is the probability of rejecting the null given that the null is true. Power is the probability of rejecting the null given that the null is false; adding a 20% deviation to the simulation parameters makes the null false, hence a question of power. This percentage deviation stems from the fact that many clinical studies aim to detect a minimum of a 20% difference between treatment and control groups.

In assessing size and power of tests of hypotheses, m=5,000 independent data sets are generated for each sample size (n=25, 50, 100, 250, 500, 1,000). The Newton-Raphson method, with a 10⁻⁸ convergence criterion, is used to obtain the maximum likelihood estimates prior to assessing the size and power of tests. Simulations and analyses are performed using SAS/STAT software version 9.4 for Windows (SAS Institute Inc., Cary, NC, USA). Graphics are generated using R software 2.14.

The proportion ${\hat{\alpha }}_{s}={T}_{s}/m$ (s=c,l) represents the estimated size of the test, or false positive rate, where Ts denotes the number of times the tests rejected H0 out of m replicates. (Tests from Cox's and Lindley-Cox models are indexed by c and l, respectively.) Tables 1 &amp; 2 show the size of tests of hypotheses in percent (i.e., $100\times {\hat{\alpha }}_{s}$) for the tests of hypotheses described in section 4.2. Table 1 presents the results for complete data, whereas Table 2 presents the results for 20% censored data.
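The bookkeeping behind the estimated size $\hat{\alpha }_{s}={T}_{s}/m$ can be illustrated with a small Monte Carlo loop. For brevity the sketch below uses a simple z-test on a normal mean with known variance as a stand-in for the Wald test on β; the structure (m replicates, count rejections of a true H0, divide by m) is what it shares with the authors' simulations, and all names and values are ours:

```python
import math
import random

random.seed(42)
m, n, alpha = 5000, 50, 0.05
z_crit = 1.959964  # two-sided 5% critical value of the standard normal

rejections = 0
for _ in range(m):
    # simulate one data set under a true H0: mean = 0, known sigma = 1
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    mean = sum(sample) / n
    z = mean * math.sqrt(n)  # standardized test statistic
    if abs(z) > z_crit:
        rejections += 1

alpha_hat = rejections / m  # estimated size; should be close to the nominal 0.05
```

With m=5,000 replicates, the Monte Carlo standard error of the estimated size at a nominal 5% level is about $\sqrt{0.05\times 0.95/5000}\approx 0.003$, so estimates a few tenths of a percent away from 5% are expected.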