Austin Biom and Biostat.2015;2(1): 1015.
Chris B. Guure*
Abstract
Survival Analysis Methods are commonly used to analyze clinical trial
data. In most clinical studies, the time until the occurrence of an event is the
main outcome of significance. Clinical trials are conducted to assess the worth
of new treatment regimens. The major events of interest in such trials
are death, development of an undesirable reaction, relapse
from remission, or the progression of a new disease entity. In order to model
time-to-event data or clinical trial data, a parametric distribution can be assumed.
We have in this study assumed that the data follow a log-logistic distribution.
To estimate the parameters of this lifetime distribution, the Bayesian estimation
approach is considered under the assumption of informative (gamma) priors
as well as the frequentist estimation method. The Bayes estimators cannot
be obtained in closed form; therefore, approximate Bayesian estimates are
computed using the idea of Lindley. The clinical trial data considered in this
study are randomly or non-informatively censored. Such data
occur when each subject has a censoring time that is statistically independent
of its failure time. A simulation study is carried out, and three different
sets of real data are analyzed in order to examine our methods. The
Bayesian methods are considered under squared error and linear exponential
loss functions.
Keywords: Bayesian Inference; Maximum likelihood; Squared Error and
LINEX Loss Functions
Introduction
The log-logistic survival model is a lifetime distributional
model which can be used as an alternative to the well-known and
widely used Weibull distribution in lifetime or clinical trial data analysis.
The shape parameter of the log-logistic distribution plays a role
similar to that of the Weibull distribution. It is sometimes important
to model survival or clinical trial data using a
distribution that has a non-monotone hazard rate. According to [1],
when the shape parameter p > 1, the hazard function is
unimodal, and when p ≤ 1, the hazard decreases monotonically. The
fact that the cumulative distribution function can be written in closed
form, unlike that of the lognormal distribution, makes it useful for analyzing
survival data. The log-logistic model has the distribution, density and
survival functions respectively as
where p is the shape parameter and θ the scale parameter.
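As a minimal sketch, the three functions (and the hazard) can be written in
Python under one common parameterization, F(t) = (t/θ)^p / [1 + (t/θ)^p].
This parameterization and the function names are assumptions made for
illustration and may differ in form from the one used in this paper.

```python
import numpy as np

# Assumed parameterization: theta > 0 is the scale, p > 0 the shape, t > 0.

def loglogistic_cdf(t, theta, p):
    """Distribution function F(t) = z / (1 + z), with z = (t / theta) ** p."""
    z = (np.asarray(t, dtype=float) / theta) ** p
    return z / (1.0 + z)

def loglogistic_pdf(t, theta, p):
    """Density f(t) = (p / t) * z / (1 + z) ** 2."""
    t = np.asarray(t, dtype=float)
    z = (t / theta) ** p
    return (p / t) * z / (1.0 + z) ** 2

def loglogistic_sf(t, theta, p):
    """Survival function S(t) = 1 / (1 + z)."""
    z = (np.asarray(t, dtype=float) / theta) ** p
    return 1.0 / (1.0 + z)

def loglogistic_hazard(t, theta, p):
    """Hazard h(t) = f(t) / S(t)."""
    return loglogistic_pdf(t, theta, p) / loglogistic_sf(t, theta, p)
```

Under this form the hazard with p = 2 rises to a single peak and then
falls, while with p = 1 it decreases for all t > 0.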
The log-logistic distribution is a continuous probability
distribution for non-negative random variables; hence, it can
be used in survival analysis as a parametric model for events whose
rate increases initially and decreases later, for instance, mortality of
cancer patients following diagnosis or treatment. See,
for instance, [2-5].
According to [6], the log-logistic distribution has been shown
to be a suitable model for analyzing survival or clinical data, and was
considered by Cox, Cox and Oakes, Bennet and others. [7] employed
the log-logistic distribution on lung cancer data and, in their study,
estimated the point at which the mortality ratio reached its maximum
level. They determined the parameters of the log-logistic model by
maximum likelihood estimation and bootstrap methods
and observed that the results were close. A study conducted by [8]
on the spread of HIV in San Francisco between 1978 and 1986
indicated that the log-logistic model was the most suitable among the
models considered for use with half-censored data.
Under random or non-informative censoring, a sample of
n elements is followed for a specified time T; the number of
elements experiencing the event is random,
but the total length of the study is fixed. Since the time is fixed, there
are certain practical advantages with regard to designing a follow-up
study. In a straightforward generalization of this scheme, which is known
as fixed time censoring, each element has a maximum inspection time
Ti, for i = 1, …, n, which may vary from one element
to another. S(Ti) represents the probability that unit i will be alive
at the end of its inspection time. Consider an experiment that
starts with 50 cancer patients, each of whom has either died or
survived by the specified time. Patients recorded as surviving may be
censored because of withdrawal, inadequate monitoring mechanisms, or deaths
that are not related to the purpose of the study.
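The censoring scheme can be illustrated with a short simulation. This is
only a sketch under stated assumptions: survival times are drawn from a
log-logistic distribution with F(t) = (t/θ)^p / [1 + (t/θ)^p] by inverting
the distribution function, and censoring times are drawn independently from
a log-logistic distribution with the same shape and its own scale. The
function name and the argument `censor_theta` are hypothetical.

```python
import numpy as np

def simulate_censored(n, theta, p, censor_theta, seed=0):
    """Return observed times t_i = min(X_i, T_i) and event indicators.

    X_i are survival times and T_i are independent censoring times.
    delta_i = 1 when the event is observed (X_i <= T_i), 0 when censored.
    """
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=n)
    v = rng.uniform(size=n)
    # Inverse-CDF sampling: F(x) = u  =>  x = theta * (u / (1 - u)) ** (1 / p)
    x = theta * (u / (1.0 - u)) ** (1.0 / p)
    c = censor_theta * (v / (1.0 - v)) ** (1.0 / p)
    t = np.minimum(x, c)
    delta = (x <= c).astype(int)
    return t, delta
```

For example, `simulate_censored(50, theta=2.0, p=2.0, censor_theta=3.0)`
mimics a study of 50 patients in which some follow-up times are censored.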
The maximum likelihood estimator (MLE) has been used frequently
to determine the parameters of most lifetime distributions,
such as the Weibull, lognormal, generalized exponential and others.
Some of this work can be found in [11], which studied Bayesian
estimation for the generalized exponential distribution. Other estimation
procedures related to the above have also been considered: [12] determined
the Bayes estimates of the reliability function and the hazard rate of
the Weibull failure time distribution by employing the squared error
loss function; [13] applied Bayesian methods to the parameter and reliability
estimates of the Weibull failure time distribution; and [14] studied the
approximate Bayesian estimates for the Weibull reliability function
and hazard rate from censored data by employing a new method
that has the potential of reducing the number of terms in Lindley’s
approximation procedure. Others include [15-20].
The main objective of this study is to compare Bayesian
estimators, obtained using Lindley’s approximation method with
two loss functions, for the unknown parameters of the log-logistic
distribution against the classical maximum likelihood estimator
across different sample sizes and parameter values in a simulation
study. Since both parameters of the distribution are non-negative, we
assume gamma prior distributions for both, which are
not necessarily the conjugate priors for the parameters.
Maximum Likelihood Estimation
Consider a set of n independent and identically distributed
random pairs (ti, δi), where ti = min(Xi, Ti) and δi = I(Xi ≤ Ti) indicates
whether the observation is censored or not, for i = 1, 2, …, n. In an
independent random censoring model, it is assumed that the survival
time Xi and the censoring time Ti are independent and from the same
distribution. The score vectors are
where the score is the vector of first partial derivatives of the
log-likelihood with respect to (θ, p). When maximum likelihood estimates
of unknown parameters cannot be obtained in closed form, an
iterative procedure (e.g., Newton-Raphson) usually has to be implemented.
One can evaluate the MLE $\hat{\alpha}$ of $\alpha = (\theta, p)$ from a
trial value, say $\alpha_0$, using a first-order Taylor series as
$$h(\hat{\alpha}) \approx h(\alpha_0) + H(\alpha_0)(\hat{\alpha} - \alpha_0) \qquad (1)$$
Setting the left-hand side of equation (1) to zero and solving for $\hat{\alpha}$, we have
$$\hat{\alpha} = \alpha_0 - H^{-1}(\alpha_0)\, h(\alpha_0) \qquad (2)$$
where $H(\alpha_0)$ is the Hessian matrix and $h(\alpha_0)$ the score vector.
Considering the two parameters of the log-logistic distribution,
the Hessian matrix for the parameter estimates can be obtained as
follows. The score vector of
(3)
(4)
From the above, the partial derivatives with respect to both θ and p are
(5)
Where are easy to obtain. Equations (3), (4) and
(5) can be substituted into equation (2), from which an iterative
procedure could be implemented to obtain the parameter estimates
under maximum likelihood.
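The iteration in equation (2) can be sketched in Python as follows. This is
an illustrative implementation, not the paper's own code: it assumes the
parameterization S(t) = 1 / [1 + (t/θ)^p], so that the censored
log-likelihood is the sum of δi log f(ti) + (1 − δi) log S(ti), and it adds
step-halving as a safeguard that the text does not describe. All function
names are hypothetical.

```python
import numpy as np

def loglik(alpha, t, delta):
    """Censored log-likelihood; alpha = (theta, p), delta = 1 for an event."""
    theta, p = alpha
    w = np.log(t / theta)
    return (np.sum(delta * (np.log(p) - np.log(t) + p * w))
            - np.sum((1 + delta) * np.logaddexp(0.0, p * w)))

def score_and_hessian(alpha, t, delta):
    """Analytic score vector h and Hessian matrix H of loglik."""
    theta, p = alpha
    w = np.log(t / theta)
    q = 1.0 / (1.0 + np.exp(-p * w))            # q_i = F(t_i)
    a = np.sum((1 + delta) * q - delta)
    b = (1 + delta) * q * (1 - q)
    h = np.array([p / theta * a,
                  np.sum(delta * (1.0 / p + w)) - np.sum((1 + delta) * q * w)])
    h_tt = -(p / theta**2) * a - (p / theta) ** 2 * np.sum(b)
    h_tp = a / theta + (p / theta) * np.sum(b * w)
    h_pp = -np.sum(delta) / p**2 - np.sum(b * w**2)
    return h, np.array([[h_tt, h_tp], [h_tp, h_pp]])

def newton_raphson(t, delta, start=None, tol=1e-8, max_iter=100):
    """Return the MLE (theta, p) and standard errors from the inverse of -H."""
    t = np.asarray(t, dtype=float)
    delta = np.asarray(delta, dtype=float)
    alpha = np.array([np.median(t), 1.0] if start is None else start, dtype=float)
    for _ in range(max_iter):
        h, H = score_and_hessian(alpha, t, delta)
        if np.linalg.norm(h) < tol:
            break
        step = -np.linalg.solve(H, h)           # Newton step, as in equation (2)
        if h @ step <= 0:                       # not an ascent direction:
            step = h                            # fall back to the gradient
        current, scale, accepted = loglik(alpha, t, delta), 1.0, False
        while scale > 1e-10:                    # step-halving safeguard
            cand = alpha + scale * step
            if np.all(cand > 0) and loglik(cand, t, delta) >= current - 1e-9:
                accepted = True
                break
            scale /= 2.0
        if not accepted:
            break
        alpha = cand
    _, H = score_and_hessian(alpha, t, delta)
    se = np.sqrt(np.diag(np.linalg.inv(-H)))
    return alpha, se
```

The default starting point (the sample median for θ and 1 for p) is a
convenience choice; any positive trial value can be passed through `start`.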
Bayesian Inference of the Unknown Parameters
In this section, we consider Bayesian inference of the unknown
parameters of the log logistic distribution. In order to employ the
Bayesian methods, a prior needs to be defined. A prior is simply one’s
knowledge or an expert’s opinion on the parameters being estimated.
We have little prior information for all the parameters being
estimated, and so we want our data information to dominate the prior
distribution by assuming reasonably non-informative priors for all
the parameters in this model. It is assumed that the two parameters
follow vague Gamma(a, b) and Gamma(c, d) prior distributions, respectively.
These prior models are chosen because both the scale and shape
parameters of the log-logistic distribution are non-negative.
$$\pi_1(\theta) \propto \theta^{a-1} \exp(-b\theta), \quad \theta > 0 \qquad (6)$$
$$\pi_2(p) \propto p^{c-1} \exp(-dp), \quad p > 0 \qquad (7)$$
The Bayesian posterior distribution based on which inferences
are drawn is
(8)
Squared-error Loss
The squared error loss is the loss incurred by adopting an action, say
$\hat{a}$, when the true value is, say, a.
In other words, it is the cost incurred by replacing the actual
value of the parameter with the parameter estimate. Let the Bayesian
estimator, say βse, be the posterior mean. If u(θ, p) is considered as the
function of interest, then:
(9)
Note: the functions of interest in our study are the distribution
parameters θ and p, evaluated under the chosen loss function. It is observed
that equation (9) cannot be computed explicitly even if we take
some specific priors on the parameters; as a result, [21] proposed
an approximation procedure to compute the ratio of two integrals
similar to equation (9). The approximation procedure is adopted in
this paper.
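Because a ratio of integrals of this kind has only two dimensions here, it
can also be evaluated by brute-force numerical integration, which is useful
as a cross-check. The sketch below is not Lindley's approximation: it
computes the posterior means of θ and p on a rectangular grid, assuming the
log-logistic form S(t) = 1 / [1 + (t/θ)^p] and the gamma priors with
hyper-parameters a, b, c, d. The function names and the grid limits are
hypothetical and must be chosen to cover the bulk of the posterior.

```python
import numpy as np

def log_posterior_grid(t, delta, theta_grid, p_grid, a, b, c, d):
    """Unnormalized log posterior on the grid, shape (len(theta), len(p))."""
    t = np.asarray(t, dtype=float)
    delta = np.asarray(delta, dtype=float)
    theta_grid = np.asarray(theta_grid, dtype=float)
    p_grid = np.asarray(p_grid, dtype=float)
    log_t = np.log(t)
    out = np.empty((theta_grid.size, p_grid.size))
    for i, theta in enumerate(theta_grid):
        # pw[j, k] = p_j * log(t_k / theta)
        pw = p_grid[:, None] * (log_t - np.log(theta))[None, :]
        ll = (np.sum(delta * (np.log(p_grid)[:, None] - log_t + pw), axis=1)
              - np.sum((1 + delta) * np.logaddexp(0.0, pw), axis=1))
        log_prior = ((a - 1) * np.log(theta) - b * theta
                     + (c - 1) * np.log(p_grid) - d * p_grid)
        out[i] = ll + log_prior
    return out

def posterior_means(t, delta, theta_grid, p_grid,
                    a=1e-4, b=1e-4, c=1e-4, d=1e-4):
    """Posterior means of theta and p (Bayes estimates under squared error)."""
    theta_grid = np.asarray(theta_grid, dtype=float)
    p_grid = np.asarray(p_grid, dtype=float)
    lp = log_posterior_grid(t, delta, theta_grid, p_grid, a, b, c, d)
    w = np.exp(lp - lp.max())      # subtract the maximum to avoid overflow
    w /= w.sum()                   # grid spacing cancels in the ratio
    theta_mean = float(np.sum(w.sum(axis=1) * theta_grid))
    p_mean = float(np.sum(w.sum(axis=0) * p_grid))
    return theta_mean, p_mean
```

On evenly spaced grids the normalized weights play the role of the
posterior density, so the weighted averages approximate the two integrals
in the ratio directly.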
Lindley Approximation
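The posterior Bayes estimator of an arbitrary function u(a) given
by [21] is
$$E[u(a) \mid t] = \frac{\int \omega(a) \exp\{l(a)\}\, da}{\int v(a) \exp\{l(a)\}\, da} \qquad (10)$$
where l(a) is the log-likelihood and ω(a), v(a) are arbitrary
functions of a. We assume that v(a) is the prior distribution for a
and ω(a) = u(a)·v(a), with u(a) being some function of interest. The
posterior expectation according to [12] is
$$E[u(a) \mid t] = \frac{\int u(a) \exp\{l(a) + \rho(a)\}\, da}{\int \exp\{l(a) + \rho(a)\}\, da} \qquad (11)$$
where ρ(a) = log{v(a)}.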
An asymptotic expansion of Lindley’s approach of equation (11)
according to [18] is
(12)
where l stands for the log-likelihood function.
Considering the Bayesian estimator via Lindley, the following
are obtained, with u1, u11 and u2, u22 representing the first and second
derivatives with respect to θ and p respectively, under the squared error
loss, for which the estimator is the posterior mean.
Let l20 and l30 represent the second and third derivatives of the
log-likelihood function with respect to the scale parameter θ, then
If we let l02 and l03 represent the second and third derivatives of
the log-likelihood function with respect to the shape parameter p, we
will have
Linear exponential loss function
This loss function measures the degree of overestimation and
underestimation of the parameters being examined. Let k represent
the shape parameter of the LINEX loss function. Refer to [13] for the
posterior expectation of the LINEX loss function. The Bayes estimator
of a function u=u[exp(-kθ),exp(-kp)] under LINEX is given as
With Lindley’s approach, u1,u11 and u2,u22 are the first and second
derivatives for θ and p respectively under the linear exponential loss
function, hence
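To make the role of the loss parameter k concrete, the sketch below codes
the LINEX loss in its usual form, exp(kΔ) − kΔ − 1 with Δ the estimate
minus the true value, together with the corresponding Bayes estimator
−(1/k) log E[exp(−kθ)]. It is a Monte Carlo illustration rather than
Lindley's approximation: the posterior expectation is replaced by an
average over posterior draws, which are assumed to be available. Both
function names are hypothetical.

```python
import numpy as np

def linex_loss(estimate, truth, k):
    """LINEX loss; for k > 0 overestimation is penalized more heavily."""
    d = estimate - truth
    return np.exp(k * d) - k * d - 1.0

def linex_bayes_estimate(samples, k):
    """Bayes estimate under LINEX loss: -(1/k) * log E[exp(-k * theta)].

    `samples` are draws from the posterior of the parameter. The mean of
    exp(-k * samples) is computed on the log scale so that large values of
    k * theta do not underflow.
    """
    s = np.asarray(samples, dtype=float)
    m = np.max(-k * s)
    log_mean = m + np.log(np.mean(np.exp(-k * s - m)))
    return -log_mean / k
```

By Jensen's inequality, this estimator lies below the posterior mean when
k > 0 and above it when k < 0.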
Real Data Analysis
Example 1
The data for this example are from survival of patients with
cervical cancer recruited to a randomised clinical trial that was
aimed at analysing the effect of an addition of a radio sensitizer to
radiotherapy (New therapy- “treatment B”) compared to using
radiotherapy alone (Control - “treatment A”). Treatment A and B
were given to 16 and 14 patients respectively. The data are in days
since the start of the study; the event of interest was death caused by
this cancer. We use the patients under treatment A to illustrate
the methods proposed in this paper. The data were obtained from [22],
and asterisked observations are censored.
Using the iterative procedure suggested in this paper, and comparing
the methods on their standard errors as well as their average
confidence/credible interval lengths, we obtain the MLEs of θ and p as
770.5429 and 1.90488, with corresponding standard errors of
48.15893 and 0.11906 respectively. Since we do not have any prior
information on the hyper-parameters, we assume a = b = c = d =
0.0001. The Bayes estimators under squared error loss for θ and p
have, respectively, the parameter estimates 770.5429 and 1.90206 and
the standard errors 48.15893 and 0.11888.
Computing the Bayes estimates of θ and p and their
standard errors via the linear exponential loss function with a loss
parameter of 0.7, we have 859.7094, 1.78586 and 53.73182, 0.11162.
With the loss parameter of -0.7, we have 909.4092, 1.82677 and
56.83807, 0.11417 respectively.
What has been observed here is that both maximum likelihood
and Bayes under the squared error loss function have the same scale
parameter estimates and standard errors, which are smaller than those
of Bayes under the linear exponential loss function. For the shape
parameter, Bayes under the LINEX loss function with the loss parameter
of 0.7 has the smallest standard error. This implies that overestimation
is more serious than underestimation.
Considering a 95% confidence interval under MLE, we have for θ
(679.1514, 864.9344) and for p (1.67153, 2.13823). The Bayesian
credible intervals via the squared error loss function for θ and p are
(679.1514, 864.9344) and (1.66906, 2.13506) respectively. The Bayes
credible intervals with respect to the linear exponential loss function
with a loss parameter of 0.7 for θ and p are (754.3950, 965.2380) and
(1.56709, 2.00463), and those for -0.7 are (798.0065, 1020.8120) and
(1.60299, 2.05055) respectively.
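The reported limits are consistent with intervals of the form estimate ±
1.96 × standard error. The paper does not state how the intervals were
computed, so the sketch below rests on that assumption, and the function
name is hypothetical.

```python
def wald_interval(estimate, std_error, z=1.96):
    """Normal-approximation interval: estimate -/+ z * std_error."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width

# Shape parameter under maximum likelihood (values reported in Example 1).
p_low, p_high = wald_interval(1.90488, 0.11906)

# Scale parameter under maximum likelihood.
theta_low, theta_high = wald_interval(770.5429, 48.15893)
```

Under this assumption the shape interval and the upper scale limit agree
with the values quoted above, while the lower scale limit evaluates to
about 676.15, which differs in the third digit from the printed 679.1514.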
From the above, the LINEX loss function with a positive loss
parameter gave narrower credible intervals than the squared
error loss function and maximum likelihood for the shape parameter.
For the scale parameter, the maximum likelihood confidence interval
and the Bayes credible interval with squared error loss were narrower
than those of Bayes using LINEX.
Example 2
In this example, we analyse another data set, of moderate size,
to obtain the parameter estimates and their standard errors
in order to compare the methods employed in this paper. The data
shown in Table 6, Example 2, were obtained from [22] and refer to
remission times, in weeks, for a group of 30 patients with leukaemia
who received similar treatment. Asterisks denote censoring times.