Adjusted Survival Tree Models for Genetic Association: Prognostic and Predictive Effects

Research Article

Austin Biom and Biostat. 2015;2(4): 1027.


Wei Xu¹,²*, Ryan Del Bel¹,², Isabelle Bairati³, Francois Meyer³ and Geoffrey Liu²,⁴

¹Department of Biostatistics, Princess Margaret Hospital, University Health Network, University of Toronto, Canada

²Ontario Cancer Institute, Princess Margaret Hospital, University Health Network, University of Toronto, Canada

³Laval University Cancer Research Center, Canada

⁴Division of Medical Oncology and Hematology, Princess Margaret Hospital, University Health Network, University of Toronto, Canada

*Corresponding author: Wei Xu, Department of Biostatistics, Princess Margaret Hospital, University Health Network, University of Toronto, Canada

Received: April 15, 2015; Accepted: July 23, 2015; Published: August 05, 2015

Abstract

A general method to create adjusted survival trees is developed. Prognostic survival trees have been used to automatically uncover complicated GxG and GxE interactions; however, scientists often want to uncover this structure while adjusting for confounding factors not of interest. Interaction survival trees automatically identify the best treatment choice for patients and are a promising model to enable personalized medicine, but simulations to assess their performance on the high dimensional data found in personalized medicine have not been conducted. We develop a general framework to adjust for confounding factors in prognostic and interaction survival trees. These factors are numerous in practice and can include age, gender, study site in a randomized multicenter clinical trial, and the principal components of ancestry difference to control for population stratification in genetic studies. Extensive simulations show the performance of our methods to be well controlled under the null and robust to large dimensional covariate spaces under the alternative. In a real data example, our adjusted interaction tree successfully identifies subgroups of head and neck cancer patients who respond positively to having antioxidant vitamins added to their treatment regimen. Applications, guidelines for use, and areas for future research are discussed.

Keywords: Pruning; Splitting; MLE; Tree models

Introduction

Tree-based methods were first introduced by Morgan and Sonquist [1] and greatly extended by Breiman, Friedman, Olshen, and Stone [2]. The flexibility of tree-based methods is appealing: they can automatically detect complicated interactions, naturally handle missing data via surrogate splits [2], select covariates in the presence of high dimensional data, and can be easily extended by ensemble methods to create random forests [3]. Early work on the development of survival trees began shortly after the original classification and regression tree methods and is summarized by LeBlanc and Crowley [4]. Recent research has extended survival trees to more complicated situations, such as multivariate data [5], extended ensemble methods to survival trees, as in random survival forests [6], and introduced new splitting rules that partition the covariate space based on interaction with a specified covariate [7].

Tree-based methods are particularly appealing for scientific studies as they automatically partition the covariate space in a way that mimics a human's natural decision making process. Indeed, survival trees have often been used to uncover complicated GxE and GxG interactions and classify the prognosis of patients [8]. However, in these genetic studies clinicians often want to uncover the effects of a set of genetic and environmental factors of scientific interest while adjusting for the effect of confounders which are not of direct interest. Although such adjusted trees have been developed for continuous and binary outcomes [9], they have not been developed for the time-to-event outcomes often found in cancer research.

Recently personalized medicine, the idea of giving the right treatment to the right person based on their genetic, clinical and demographic characteristics, has become of increasing interest in the clinical community [10-13]. Developing statistical methods to help enable personalized medicine in cancer research has many challenges. Such methods should be able to handle high dimensional genetic data, focus on prediction of the best treatment rather than prognosis, adjust for clinical confounders, work on survival data, and be easily interpretable for clinical decision making. Common approaches to identify subgroups of patients who respond differently to treatment include subgroup analysis, which is not statistically sound [14], and regression modeling, which cannot automatically identify complex subgroups [15]. Survival trees have been modified to partition the covariate space based on differences in response to treatment [7]; however, they cannot adjust for confounding, and simulations have not been done to assess their effectiveness on the large scale genetic data found in personalized medicine.

In this article we develop a general framework to create survival trees that partition the covariate space based on the effects of a set of genetic and environmental factors that are of scientific interest, while adjusting for possible confounders which are not. Such confounders are numerous in practice and can include age, gender, study site in a multicenter randomized clinical trial, and the principal components of ancestry difference to control for population stratification. We apply this framework to prognostic and interaction survival trees. Here, prognostic survival trees are tree-based methods which generate rules to classify the prognosis of patients by partitioning them based on combinations of covariates, while interaction survival trees generate rules to classify the best treatment choice for patients by partitioning them based on combinations of covariates which have strong interaction with treatment.

Next we introduce the basic structure of a recursive partitioning algorithm used to create trees. We then define our new splitting rules, pruning algorithm, and a method to choose the final tree. Extensive simulations are then presented to evaluate the performance of our methods under several scenarios, including the high dimensional covariate space commonly found in personalized medicine. An application of our methods to a randomized clinical trial is performed, showing its significant clinical relevance. We then close with a discussion of the practical use and implementation of our methods, as well as areas of further research.

Methods

Algorithm overview

Tree based models are created using a recursive partitioning algorithm. These algorithms usually consist of three parts: a splitting rule, a pruning algorithm, and a method for selecting the final tree. The splitting rule partitions the covariate space into many groups. It is applied recursively until there are very few observations in each group, or a pre-specified maximal number of groups is created [2]. This partition can be represented as a tree T with |T̃| terminal nodes, corresponding to the partition of the covariate space into |T̃| subsets. This large tree usually overfits the data and will perform poorly out-of-sample. Thus a subtree is chosen as the final tree. The space of all possible subtrees is large, and a pruning algorithm is used to efficiently search this space and find the optimal subtrees. The final subtree is then selected either using a test set or a resampling technique [16].
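The three-part structure above can be sketched as a generic recursive partitioner. The nested-dict node representation, the stopping rules, and the `find_best_split` callback interface below are illustrative assumptions, not the paper's implementation:

```python
def grow(rows, find_best_split, depth=0, min_size=20, max_depth=4):
    """Generic recursive partitioning skeleton (illustrative sketch).

    rows: the observations reaching this node.
    find_best_split: callable returning (improvement, split_fn) for the best
        admissible split of `rows`, or None if no split is possible.
    Returns a nested dict: internal nodes store the split improvement and two
    children; terminal nodes store their observations under the "leaf" key.
    """
    if depth >= max_depth or len(rows) < 2 * min_size:
        return {"leaf": rows}
    best = find_best_split(rows)
    if best is None:
        return {"leaf": rows}
    improvement, split_fn = best
    left = [r for r in rows if split_fn(r)]
    right = [r for r in rows if not split_fn(r)]
    if not left or not right:                    # degenerate split: stop here
        return {"leaf": rows}
    return {
        "improvement": improvement,
        "left": grow(left, find_best_split, depth + 1, min_size, max_depth),
        "right": grow(right, find_best_split, depth + 1, min_size, max_depth),
    }
```

Any splitting rule, including the adjusted rules defined later, can be plugged in through `find_best_split`; the skeleton itself only handles the recursion and stopping.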

Splitting

For simplicity, in this paper, we only consider the case of making binary splits on single covariates. A potential split s of a covariate c can then be characterized as follows. If c is binary, then s is the trivial partition of c. If c is continuous or ordinal, s can be any binary partition of c such that all elements in one partition are less than those in the other. If c is categorical, then s can be any binary partition of the levels of c.
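These three cases can be enumerated directly; a minimal sketch, where each candidate split is represented by the set of levels sent to one child (the function name and representation are our own):

```python
from itertools import combinations

def candidate_splits(values, kind):
    """Enumerate candidate binary splits of a single covariate.

    values: observed values of the covariate.
    kind: 'binary', 'ordinal' (also covers continuous), or 'categorical'.
    Each split is returned as the set of levels sent to the left child.
    """
    levels = sorted(set(values))
    if kind == "binary":
        return [{levels[0]}]                      # the single trivial partition
    if kind == "ordinal":                         # ordered: one split per cut point
        return [set(levels[: i + 1]) for i in range(len(levels) - 1)]
    if kind == "categorical":                     # unordered: any nonempty proper subset
        splits = []
        for r in range(1, len(levels)):
            splits.extend(set(c) for c in combinations(levels, r))
        # each partition {A, B} appears twice (once as A, once as B); keep one
        unique, seen = [], set()
        for s in splits:
            key = frozenset(s) if levels[0] in s else frozenset(set(levels) - s)
            if key not in seen:
                seen.add(key)
                unique.append(s)
        return unique
    raise ValueError(f"unknown covariate kind: {kind}")
```

Note the combinatorics: an ordinal covariate with L levels yields L−1 splits, while a categorical one yields 2^(L−1) − 1, which is why categorical covariates with many levels are expensive to split.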

To partition a node h, find the split s* such that some measure of improvement G(s,h) is maximized:

G(s*, h) = max_{s∈Sh} G(s, h)

where Sh is the set of all binary splits that can be made at node h. If there is more than one terminal node to partition, then find the best split s* for each h ∈ H and split the node with the maximal improvement. In the case of ties, randomly select one of the splits with maximal improvement.

Recall the standard survival data set-up, with data for observation i of the form (yi, xi, δi), where xi is the covariate vector for observation i and the survival time yi is censored if δi = 0 and an event if δi = 1. The Cox model assumes that

λ(t|X) = λ0(t) e^{β'X}

where X are covariates, λ(t|X) is the hazard at time t given X, and λ0 is some baseline hazard function. The maximum likelihood estimate (MLE) β̂ of the parameters β is found without specifying λ0 by maximizing the log-partial-likelihood

ℓ(β) = Σ_{i=1}^{K} δi (β'Xi − log Σ_{j∈R(ti)} exp(β'Xj))

with respect to β, where ti is the survival time for observation i and R(ti) is the set of observations at risk at time ti.
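The log-partial-likelihood can be evaluated directly from this definition. A minimal sketch (assuming no tied event times, and using a simple O(n²) risk-set loop rather than the sorted-scan a production implementation would use):

```python
import numpy as np

def log_partial_likelihood(beta, X, time, event):
    """Cox log-partial-likelihood:
    l(beta) = sum over events i of [beta'X_i - log sum_{j in R(t_i)} exp(beta'X_j)],
    where the risk set R(t_i) contains all observations with time >= t_i.
    """
    eta = X @ beta                               # linear predictors beta'X_j
    ll = 0.0
    for i in range(len(time)):
        if event[i] == 1:                        # censored observations contribute
            at_risk = time >= time[i]            # only through the risk sets
            ll += eta[i] - np.log(np.sum(np.exp(eta[at_risk])))
    return ll
```

As a sanity check, at β = 0 every exp(η) term equals 1, so the likelihood reduces to minus the sum of log risk-set sizes over the events.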

Consider two Cox models, m0: log(λ) = β'0x0 and m1: log(λ) = β'0x0 + β'1x1. The model m0 is said to be nested in m1, and the Likelihood Ratio Test (LRT) statistic corresponding to the hypothesis test H0: β1 = 0 is

2(ℓm1(β̂m1) − ℓm0(β̂m0)) → χ²_{rank(X1)} under H0 as n → ∞

Based on this LRT statistic, we define two new splitting rules which can be used to create adjusted prognostic trees and adjusted interaction trees respectively.

Definition 1. Ga(s,h) is the LRT statistic corresponding to H0: βs = 0 in the Cox model log(λ) = βcxc + βsxs.

Definition 2. Gai(s,h) is the LRT statistic corresponding to H0: βts = 0 in the Cox model log(λ) = βcxc + βtxt + βsxs + βts(xt×xs).

Also recall the non-adjusted interaction survival tree split Gi(s,h) defined in [7] as the LRT statistic corresponding to H0: βts = 0 in the Cox model log(λ) = βtxt + βsxs + βts(xt×xs). Here xc is a vector of confounding variables, xs is an indicator of the potential split s, which is a binary partition of some covariate c, xt is a treatment with ≥ 2 levels, and xt×xs is the interactive term of the treatment and the split. βs is the effect of the split, βt is the effect of the treatment, βc is the effect of the confounders, and βts is the effect of the interaction of the treatment and the split. Ga(s,h) is the split for the adjusted survival tree and Gai(s,h) is the split for the adjusted interaction survival tree. The best split can be interpreted as the one that creates the two child nodes with the most statistically significant adjusted difference in prognosis and in response to treatment, respectively.
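A sketch of computing Ga(s,h) for a single candidate split: fit the null model (confounders only) and the alternative (confounders plus the split indicator) and take twice the log-likelihood difference. We use a generic numerical optimizer on the negative log-partial-likelihood rather than a dedicated Cox solver; the function names and the binary-treatment restriction are our own simplifications:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def neg_log_pl(beta, X, time, event):
    """Negative Cox log-partial-likelihood (assumes no tied event times)."""
    eta = X @ beta
    nll = 0.0
    for i in range(len(time)):
        if event[i]:
            nll -= eta[i] - np.log(np.sum(np.exp(eta[time >= time[i]])))
    return nll

def adjusted_split_stat(x_c, x_s, time, event):
    """Sketch of Ga(s, h): LRT for H0: beta_s = 0 in
    log(lambda) = beta_c' x_c + beta_s x_s, with the confounders x_c
    retained in both the null and the alternative model."""
    X0 = x_c                                          # null: confounders only
    X1 = np.hstack([x_c, x_s.reshape(-1, 1)])         # alternative: + split
    f0 = minimize(neg_log_pl, np.zeros(X0.shape[1]), args=(X0, time, event))
    f1 = minimize(neg_log_pl, np.zeros(X1.shape[1]), args=(X1, time, event))
    stat = 2.0 * (f0.fun - f1.fun)                    # .fun is the minimized -logPL
    return stat, chi2.sf(max(stat, 0.0), df=1)
```

Gai(s,h) follows the same pattern with the treatment and the treatment-by-split product column added to both design matrices, testing only the interaction coefficient. In practice a dedicated survival library (e.g. a Cox fitter) would replace the hand-rolled likelihood.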

Pruning

The split-complexity Ga(T) can be defined as

Ga(T) = G(T) − a|S|

where S is the set of internal nodes of tree T, |S| is the cardinality of S, a ≥ 0 is the complexity parameter, and G(T), the goodness of split of tree T, is the sum of the split improvement statistics over the tree:

G(T) = Σ_{h∈S} G(h)
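These two quantities are simple recursions over the tree. A sketch on an assumed nested-dict tree representation (internal nodes carry their split improvement under "improvement"; leaves carry a "leaf" key):

```python
def goodness_of_split(tree):
    """Return (G(T), |S|): the sum of split improvements over the internal
    nodes of the tree, and the number of internal nodes."""
    if "leaf" in tree:
        return 0.0, 0
    g_l, n_l = goodness_of_split(tree["left"])
    g_r, n_r = goodness_of_split(tree["right"])
    return tree["improvement"] + g_l + g_r, 1 + n_l + n_r

def split_complexity(tree, a):
    """Ga(T) = G(T) - a * |S|."""
    g, n_internal = goodness_of_split(tree)
    return g - a * n_internal
```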

Consider all possible subtrees of a large tree T0. Although this space is large, it is easy to see that when a = 0 the entire tree T0 will have the largest split-complexity, and when a is sufficiently large the null tree with no splits Tm will. LeBlanc and Crowley [16] extend this argument and define a pruning algorithm based on split-complexity that efficiently finds the subtrees Tm < ... < Tk < ... < T0 and corresponding complexity parameters ∞ > am > ... > ak > ... > a1 > a0 = 0 such that Tk has the largest goodness-of-split of any subtree for all ak ≤ a < ak+1. They prove the theoretical properties of this algorithm directly, and as a special case of CART [2]. After building a large tree with one of our new splitting rules Ga or Gai, we efficiently find the optimal subtrees by using this algorithm directly, letting G be the new splitting rule used to build the original tree.
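The pruning sequence can be sketched as weakest-link pruning: repeatedly collapse the internal node whose branch has the smallest goodness-of-split per internal node, recording the complexity value at which each collapse becomes worthwhile. This is a simplified rendering of the LeBlanc-Crowley algorithm on the nested-dict trees used above, not their exact implementation:

```python
import copy

def branch_stats(t):
    """(G(branch), |S(branch)|) for the branch rooted at node t."""
    if "leaf" in t:
        return 0.0, 0
    g_l, n_l = branch_stats(t["left"])
    g_r, n_r = branch_stats(t["right"])
    return t["improvement"] + g_l + g_r, 1 + n_l + n_r

def weakest_link(t):
    """The internal node minimizing G(branch)/|S(branch)|: the first branch
    whose split-complexity turns negative as the penalty a grows."""
    if "leaf" in t:
        return None
    g, n = branch_stats(t)
    best = (g / n, t)
    for child in (t["left"], t["right"]):
        cand = weakest_link(child)
        if cand is not None and cand[0] < best[0]:
            best = cand
    return best

def prune_sequence(tree):
    """Nested sequence of pruned subtrees T0 > ... > Tm, paired with the
    complexity values a_k at which each prune occurs."""
    t = copy.deepcopy(tree)
    seq = [(0.0, copy.deepcopy(t))]
    while "leaf" not in t:
        a_k, node = weakest_link(t)
        node.clear()
        node["leaf"] = True              # collapse the weakest branch to a leaf
        seq.append((a_k, copy.deepcopy(t)))
    return seq
```

The recorded a_k values are nondecreasing, so for any penalty a one can look up the subtree that is optimal on the interval containing a.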

Selection of the final tree

After finding the optimally pruned subtrees with the above algorithm, we may still wish to choose a final tree. Since the splits used to make a tree are adaptively chosen as the maximum of several potentially correlated LRT statistics, the split-complexity Gac(T) is larger than would be expected if the splits were chosen a priori. If we have a large sample we can get an 'honest' estimate of Gac(T) by using the following method.

First split the data into a training set and a test set. Next build a large tree with the training set and find the optimal subtrees with the algorithm in the above section. Finally, force the test set down each of the subtrees. The final tree is the one that maximizes Gac(T), where Gac(T) is calculated using the test set. We recommend using αc = 4, which roughly corresponds to the 0.05 significance level of a split [4].

When the data cannot be split into training and test sets, we propose choosing the final tree with a 5-fold cross validation based method. First build a large tree with the full data and find the optimal subtrees using the pruning algorithm above. To perform the 5-fold cross validation, first partition the observations into 5 folds Lj, j = 1,...,5, and build 5 trees T(-j) on the samples L(-j). For each ak and T(-j), find the optimal subtree T(-j),k and force Lj down T(-j),k, obtaining trees Tj,k. For each Tj,k, calculate the goodness of split G(Tj,k) and take the mean over the folds to get G(T.,k). Find k* = argmax_k Gac(T.,k); if Gac(T.,k*) > 0 the final tree is Tk*, otherwise it is the null tree with no nodes.
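Once the cross-validated goodness values are in hand, the final selection step is a small computation. A sketch, assuming the fold-by-subtree goodness values have already been collected into a matrix (the function name and input layout are our own):

```python
import numpy as np

def select_final_tree(G_matrix, sizes, a_c=4.0):
    """Choose the final subtree index from 5-fold CV results.

    G_matrix[j, k]: goodness of split G(T_{j,k}) of fold j's held-out data
        forced down the k-th pruned subtree.
    sizes[k]: |S|, the number of internal nodes of subtree T_k.
    Returns k* if its penalized goodness Gac(T_{.,k*}) is positive,
    otherwise None, denoting the null tree with no nodes.
    """
    mean_G = G_matrix.mean(axis=0)                 # G(T_{.,k}), averaged over folds
    penalized = mean_G - a_c * np.asarray(sizes)   # Gac(T_{.,k}) = G - a_c * |S|
    k_star = int(np.argmax(penalized))
    return k_star if penalized[k_star] > 0 else None
```

The default a_c = 4 mirrors the recommendation above for the test-set method; a larger penalty pushes the selection toward smaller trees or the null tree.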

Simulation

For our two new splitting methods, Ga(s,h) and Gai(s,h), we simulated our recursive partitioning algorithm under the true tree structures corresponding to the 'null hypothesis' of no splits, one split, and splits at positions one and two. These tree structures can be seen in Figure 1. Failure times were simulated from the exponential distribution and censoring times from a Uniform(0, γ) distribution, with γ chosen to give approximately 20% censoring. Three sets of covariates, xc, xt and xs, were generated. xc was a set of four potential confounders, two of which were truly associated with the outcome. xt was a balanced binary treatment. By default, xs was 100 binary variables used to build the tree. All unassociated xs were assigned a random proportion, while associated xs had proportion 0.5. Under the simulations of the null tree and the tree with one split, n was 500, and under the simulation with two splits it was 1000. For the prognostic tree the effect size e^{βs} of the true split was set to 2, and for the interaction tree the effect size e^{βts} of the interactive term of the true split and treatment was set to 3.5. When simulating two associated nodes, the properties of the first node were fixed while the second node varied.

We also ran simulations of the non-adjusted splitting rule Gi(s,h) on data with no confounders and compared the results with Gai(s,h). Under each setting, large trees of size 10 were built, the optimal subtrees were found, and a final tree was selected, with 1000 replications.
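The prognostic-tree data-generating mechanism can be sketched as follows. This is an illustrative reconstruction of the design described above, omitting the confounders and treatment; the fixed censoring bound gamma = 3.0 is our own placeholder, whereas the paper tunes γ numerically to reach roughly 20% censoring:

```python
import numpy as np

def simulate_prognostic(n=500, p=100, effect=2.0, gamma=3.0, seed=1):
    """Sketch of the simulation design: exponential failure times whose
    hazard depends on one true binary split among p binary candidates,
    with Uniform(0, gamma) censoring.

    effect is the hazard ratio e^{beta_s} of the true split (2.0 above).
    Returns (xs, observed time, event indicator)."""
    rng = np.random.default_rng(seed)
    xs = np.empty((n, p))
    xs[:, 0] = rng.binomial(1, 0.5, size=n)          # true split: proportion 0.5
    props = rng.uniform(0.1, 0.9, size=p - 1)        # noise splits: random proportions
    xs[:, 1:] = rng.binomial(1, props, size=(n, p - 1))
    hazard = np.exp(np.log(effect) * xs[:, 0])       # rate = e^{beta_s * x}
    failure = rng.exponential(1.0 / hazard)          # exponential failure times
    censor = rng.uniform(0.0, gamma, size=n)         # Uniform(0, gamma) censoring
    time = np.minimum(failure, censor)
    event = (failure <= censor).astype(int)          # 1 = event, 0 = censored
    return xs, time, event
```

The interaction-tree setting would add a balanced binary treatment and put the effect on the treatment-by-split product instead of the split itself.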