Some Genetic Regression Models for Multiple Quantitative End Points Data

Research Article

Austin Biom and Biostat. 2016; 3(1): 1029.

Some Genetic Regression Models for Multiple Quantitative End Points Data

Ao Yuan* and Jaeil Ahn

Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University, USA

*Corresponding author: Ao Yuan, Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University, Washington D.C. 20057, USA

Received: May 28, 2015; Accepted: January 28, 2016; Published: February 08, 2016

Abstract

Multiple endpoints data are common in practice. There are various statistical methods for the analysis of this type of data, however, genetic models for familial observations with multiple endpoints data are relatively few, and the existing methods are basically variations of the Elston-Stewart algorithm. Here we consider several joint statistical models for such data with quantitative measurements with a new algorithm, which is computationally more efficient than the existing method. The proposed method is detailed in some commonly used parametric, semi parametric and nonparametric settings for this type of data. For un-genotyped data, the commonly used models are the mixture and variance components models. We elaborate how these genetic models can be extended for multiple endpoints data with the proposed method

Keywords: Censoring; Endpoints data; Familial structure; Genotype; Missing observation

Introduction

Endpoints data are observed responses from patients of some pre-specified clinical events of interests, such as death, loss of vision, occurrences of certain diseases, or other symptomatic events. In medical research, study participants are often followed for a long time, during which some participants may drop out early, so that random censorship may be present in the data. Such data have missing observations, which may be inhomogeneous across the patients. For example, in one patient we have observations on the lung cancer and kidney disease, and on another we have observations on lung cancer, diabetes and asthma. Analyses of such data largely fall into two categories: hypothesis testing (usually non-model based) and model inference. Here we concentrate on the model inference of such data.

For censored data modeling there are extensive literatures [1-8], just mention a few. For multiple endpoints data analyses, there are various statistical methods [9-13], for example. Wei and Glidden [14] provided an overview for some of the methods in this field.

Family genetic data differ from the ordinary data in that they are collected in familial units, often with varying structures and sizes, and with/without genotyping. These features make the models distinct. The key in the modeling is the familial dependence structure and the implementation of the genetic mechanism, existing methods are basically variations of the Elston-Stewart algorithm, which is a multi-level mixture model, and the computation is often challenging. Genetic models for multiple endpoints data are relatively limited. Here we consider some statistical joint models for such data with quantitative measurements with a new algorithm, which is a one-level mixture model, thus enhance the computation considerably. The parametric method is used when one has some confidence about the model specification. The semi parametric method can be used when there is not enough information about the full parametric model specification. The nonparametric method is used for the robustness of least model assumptions. We elaborate our methods for the parametric, semi parametric and nonparametric cases. The methods we describe below are valid for arbitrary pedigrees; however, in this article we focus on the simpler case of nuclear family for illustration.

In genetic analysis, the data contains genotypes, partially genotypes, or no genotypes. However, even if the data are genotyped, it is still of interest to know whether there are some other unknown gene(s) behind the response functioning. There are reports that with added unknown gene locus, the likelihood Akaike information reduced (e.g., [15], p.1091, [16], which makes sense, as correct parameter(s) added to the model will reduce its AIC), or the segregation analysis guided to some other gene(s) which deserve(s) further investigation. So even for the genotyped data, a segregation model is still of importance. It is also the general model including the genotyped data case. In the following we derive some commonly used regressive models for this case, including the parametric model, semiparametric proportional hazards model, nonparametric least squares model, variance components model and the competing risks model. Also, hypothesis testing on parameters of interest can be conducted using the likelihood ratio statistics based on the parametric models. Our aim here is to present several new parametric, semi parametric and nonparametric models for this familial data, and thus we focus derivations of the basic forms of these models. Implementations of these models and applications to real data will follow in our future work.

Methods

Suppose there are d responses observed with some clinical events of interests, along with r covariates, for each member in a family. We concentrate on nuclear family structure for simplicity. In practice we only observe a subset of the responses and covariates for each patient in a family. Let yi=(yif,yim,yis)(i=1,…,n) be the vector of responses for the i-th nuclear family, with corresponding covariates xi=(xif,xim,xis) (i=1,…,n). Here yif a dif(<d) dimensional vector of responses of the father in the i-th family, which belongs to a dif dimensional subspace of the d-dimensional space, with a response non-missing indicator vector Iif and covariate non-missing indicator vector Jif. Similarly, yim denotes a vector of responses of the mother in the i-th family and yis=(yi1,…,yibi) is for offspring with each of the yij s has the same data structure as that for yif. For example, there are three responses to be observed, we have the first and third on the father, then Iif=(1,0,1), and dif=|Iif|=2 is its dimension or cardinality. If there are total of five covariates in the design, and we may only have the first, second, fourth covariates for the father, then Jif=(1,1,0,1,0), and rif=|Jif|=3 is its dimension. We assume random censorship. Let δif=(δif1,…, δifdif) be the censoring indicator of yif, i.e. δifj=1 if yifj is uncensored, and δifj=0 otherwise. Similar notations are used for the mother. For the off springs, yis = (yi1,...,yki) denote the response vector, with yij be dij dimensional observation for the jth sib, with response configuration Iij and covariate configuration jif, (j=1,…,ki). Let di=(dif,dim,dis), Ii=(Iif,Iim,Iis), Ji=(Jif,Jim,Jis), δi=(δifimis) and δi=(Cif,Cim,Cis). The complete data information consists of Zi=(yi,xi,Ii,Jii), (i=1,…,n). Let χ(⋅) be the indicator function, i.e. χ(gir=s)=1 if the genotype of the rth individual is s and zero otherwise. Let πs be the population proportion of the S-th genotype, t(s|i,k) be the transmission probability of a off string’s genotype s given the parents’ genotype (i,k), and θ be the collection of all the parameters, including the as and βs in the mean and the parameters in the within and between individual covariance matrices and the genotype frequencies πks and transmission probabilities t(s|i,k)s. With unobserved genotypes, the computation is a serious challenge because of the mixture nature of the model. Let f(yr|θ), F(yr|θ), and S(yr|θ)=1-F(yr|θ) be the density function, distribution function and survival function of Yr, respectively. They may be described by its genotype gr through the mean function specification. To simplify model specification, we assume random mating so that father and mother can be viewed as independent in most cases. The case of non-random mating or within parent’s dependence can be treated similarly with more involved notations.

Note that there is within family dependence but independence among different families. We assume the genotypes of each patient are unobserved; the case of observed genotypes is automatically covered and simpler. Now we describe the methods in some common settings below.

Parametric model

Let the genotypes at the locus of interest be coded as 1,…,k. We first consider the case of no missing record and censoring. The regressive model assumes yir=μ(gir)+∈ir, (i,=1,…,n; r=f,m,1,…,n),

where μ(gir)=μ0+αχ(gir)+βxir is the mean phenotypic value, α=(α1,…,αk)', χ(gir)=(χ(gir=1),…,χ(gir=k), μ0 is the intercept vector,. The residual error term ir is a d dimensional random vector where they are independent across i but dependent across r.

In the case of multivariate familial quantitative response data, under the commonly used Elston-Stewart [15] algorithm or its variants, the likelihood of the observation yi for the i-th nuclear family is

L i ( y i |θ)= k ( π k f( y if |θ, g if =k) j ( π j f( y im |θ, g im =j)K(θ,k,j) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaamitamaaBaaaleaacaWGPbaabeaakiaaiIcacaWG5bWaaSbaaSqaaiaadMgaaeqaaOGaaGiFaiabeI7aXjaaiMcacaaI9aWaaabuaeqaleaacaWGRbaabeqdcqGHris5aOGaaGikaiabec8aWnaaBaaaleaacaWGRbaabeaakiaadAgacaaIOaGaamyEamaaBaaaleaacaWGPbGaamOzaaqabaGccaaI8bGaeqiUdeNaaGilaiaadEgadaWgaaWcbaGaamyAaiaadAgaaeqaaOGaaGypaiaadUgacaaIPaWaaabuaeqaleaacaWGQbaabeqdcqGHris5aOGaaGikaiabec8aWnaaBaaaleaacaWGQbaabeaakiaadAgacaaIOaGaamyEamaaBaaaleaacaWGPbGaamyBaaqabaGccaaI8bGaeqiUdeNaaGilaiaadEgadaWgaaWcbaGaamyAaiaad2gaaeqaaOGaaGypaiaadQgac[email protected][email protected]

× l=1 b i r t(r|k,j)f( y il |θ, g if =k, g im =j, g il =r))),(1) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaey41aq7aaebCaeqaleaacaWGSbGaaGypaiaaigdaaeaacaWGIbWaaSbaaeaacaWGPbaabeaaa0Gaey4dIunakmaaqafabeWcbaGaamOCaaqab0GaeyyeIuoakiaadshacaaIOaGaamOCaiaaiYhacaWGRbGaaGilaiaadQgacaaIPaGaamOzaiaaiIcacaWG5bWaaSbaaSqaaiaadMgacaWGSbaabeaakiaaiYhacqaH4oqCcaaISaGaam4zamaaBaaaleaacaWGPbGaamOzaaqabaGccaaI9aGaam4AaiaaiYcacaWGNbWaaSbaaSqaaiaadMgacaWGTbaabeaakiaai2dacaWGQbGaaGilaiaadEgadaWgaaWcbaGaamyAaiaadYgaaeqaaOG[email protected][email protected]

where typically the density is assumed multivariate normal with covariate matrix Σ, f(⋅|θ,k) is the density for residual ∈i with genotype gi in the mean vector specification, f(yl|θ,i,j,r) is the density for residual r with genotype gr and with adjusted mean and variance given by

μ(gil=r)-ΩΣ-1[(yif-μ(gf=k))+(ym-μ(gm=j))] and Σ-ΩΣ-1Ω

where Σ=Cov(Yl,Yl) and Ω= Cov(Yl,Ym); K(θ,I,j) is a quantity that depends on the parents’ genotypes and the mean [18]. It is well known that when the number of genotypes is relatively large, this model is computationally inefficient [19]. Proposed a computational more efficient model. In light of [19], let Gf=Gm=(π1,…,πI), then the joint likelihood for the i-th family is written as

L i ( y i |θ)=( k π k f( y if |θ, g if =k))( j π j f( y im |θ, g im =j)) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaamitamaaBaaaleaacaWGPbaabeaakiaaiIcacaWG5bWaaSbaaSqaaiaadMgaaeqaaOGaaGiFaiabeI7aXjaaiMcacaaI9aGaaGikamaaqafabeWcbaGaam4Aaaqab0GaeyyeIuoakiabec8aWnaaBaaaleaacaWGRbaabeaakiaadAgacaaIOaGaamyEamaaBaaaleaacaWGPbGaamOzaaqabaGccaaI8bGaeqiUdeNaaGilaiaadEgadaWgaaWcbaGaamyAaiaadAgaaeqaaOGaaGypaiaadUgacaaIPaGaaGykaiaaiIcadaaeqbqabSqaaiaadQgaaeqaniabggHiLdGccqaHapaCdaWgaaWcbaGaamOAaaqabaGccaWGMbGaaGikaiaadMhadaWgaaWcbaGaamyAaiaad2gaaeqaaOGaaGiFaiabeI7aXjaaiYcacaWG[email protected][email protected]

K(θ) l=1 b i r T(r)f( y il |G,θ, g il =r),(2) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaam4saiaaiIcacqaH4oqCcaaIPaWaaebCaeqaleaacaWGSbGaaGypaiaaigdaaeaacaWGIbWaaSbaaeaacaWGPbaabeaaa0Gaey4dIunakmaaqafabeWcbaGaamOCaaqab0GaeyyeIuoakiaadsfacaaIOaGaamOCaiaaiMcacaWGMbGaaGikaiaadMhadaWgaaWcbaGaamyAaiaadYgaaeqaaOGaaGiFaiaadEeacaaISaGaeqiUdeNaaGilaiaadEgadaWgaaWcbaGaamyAa[email protected][email protected]

where K(θ)= i=1 k j=1 k K(θ,i,j) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaam4saiaaiIcacqaH4oqCcaaIPaGaaGypamaaqadabeWcbaGaamyAaiaai2dacaaIXaaabaGaam4AaaqdcqGHris5aOWaaabmaeqaleaacaWGQbGaaGypaiaaigdaaeaacaWGRbaaniabggH[email protected][email protected]

T(r)=T(r| G f , G m )= i=1 k s=1 k t(r|i,s)P( g f =i)P( g m =s)= i=1 k s=1 k t(r|i,s) π i π s , [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaamivaiaaiIcacaWGYbGaaGykaiaai2dacaWGubGaaGikaiaadkhacaaI8bGaam4ramaaBaaaleaacaWGMbaabeaakiaaiYcacaWGhbWaaSbaaSqaaiaad2gaaeqaaOGaaGykaiaai2dadaaeWbqabSqaaiaadMgacaaI9aGaaGymaaqaaiaadUgaa0GaeyyeIuoakmaaqahabeWcbaGaam4Caiaai2dacaaIXaaabaGaam4AaaqdcqGHris5aOGaamiDaiaaiIcacaWGYbGaaGiFaiaadMgacaaISaGaam4CaiaaiMcacaWGqbGaaGikaiaadEgadaWgaaWcbaGaamOzaaqabaGccaaI9aGaamyAaiaaiMcacaWGqbGaaGikaiaadEgadaWgaaWcbaGaamyBaaqabaGccaaI9aGaam4CaiaaiMcacaaI9aWaaabCaeqaleaacaWGPbGaaGypaiaaigdaaeaacaWGRbaaniabggHiLdGcdaaeWbqabSqaaiaadohacaaI9aGaaGymaaqaaiaadUgaa0GaeyyeIuoakiaadshacaaIOaGaamOCaiaaiYhacaWGPbGaaGilaiaadohacaaIPaGaeqiWda3aa[email protected][email protected]

In the mean specification f(yl|G,θ,r) is the density of residual ∈l with genotype gl=r with adjusted mean and variance given by

μ( g il =r) Ω p Σ p 1 ( y p μ p ) and Σ Ω p Σ p 1 Ω p , [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaeqiVd0MaaGikaiaadEgadaWgaaWcbaGaamyAaiaadYgaaeqaaOGaaGypaiaadkhacaaIPaGaeyOeI0IafuyQdCLbauaadaWgaaWcbaGaamiCaaqabaGccqqHJoWudaqhaaWcbaGaamiCaaqaaiabgkHiTiaaigdaaaGccaaIOaGaamyEamaaBaaaleaacaWGWbaabeaakiabgkHiTiabeY7aTnaaBaaaleaacaWGWbaabeaakiaaiMcacaaIGaGaaGiiaiaaiccacaaIGaGaaGjcVlaadggacaWGUbGaamizaiaayIW7caaIGaGaaGiiaiaaiccacaaIGaGaeu4OdmLaeyOeI0IafuyQdCLbauaadaWgaaWcbaGaamiCaaqabaGccqqHJoWudaqhaaWcbaGaamiCaaq[email protected][email protected]

where yp=(yf,ym),μp=(μfm), μ f = i=1 k π i μ f ( g f =i) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaeqiVd02aaSbaaSqaaiaadAgaaeqaaOGaaGypamaaqadabeWcbaGaamyAaiaai2dacaaIXaaabaGaam4AaaqdcqGHris5aOGaeqiWda3aaSbaaSqaaiaadMgaaeqaaOGaeqiVd02aaSbaaSqaaiaadAgaaeqa[email protected][email protected] similarly for ; μm

Ω p =( Ω Ω ) Σ p =( Σ Ω Ω Σ ). [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaeuyQdC1aaSbaaSqaaiaadchaaeqaaOGaaGypamaabmaabaqbaeqabiqaaaqaaiabfM6axbqaaiabfM6axbaaaiaawIcacaGLPaaacaaIGaGaaGiiaiaaiccacaaIGaGaeu4Odm1aaSbaaSqaaiaadchaaeqaaOGaaGypamaabmaabaqbaeqabiGaaaqaaiabfo6atbq[email protected][email protected]

In comparison, model (1) has three layers of mixing (summation) corresponding to bik3 function evaluations that grow exponentially with the number of genotypes. On the other hand, model (2) has only one layer of mixing in three factors each, or (bi+2)k function evaluations that are linearly proportional the number of genotypes. The reduction of computation will be more significant for multiple loci case.

Here we extend this model in the case of censoring and partial observation. In this case, the mean is modeled as

μ( g ir )= I ir ( μ 0 +αχ( g ir )+β) J ir x ir ,0.6cm    (3) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbmqaaeGaciGaaiaabeqaamaabaabaaGcbaGaeqiVd0MaaGikaiaadEgadaWgaaWcbaGaamyAaiaadkhaaeqaaOGaaGykaiaai2dacaWGjbWaaSbaaSqaaiaadMgacaWGYbaabeaakiablMPiLjaaiIcacqaH8oqBdaWgaaWcbaGaaGimaaqabaGccqGHRaWkcqaHXoqycqaHhpWycaaIOaGaam4zamaaBaaaleaacaWGPbGaamOCaaqabaGccaaIPaGaey4kaSIaeqOSdiMaaGykaiaadQeadaWgaaWcbaGaamyAaiaadkhaaeqaaOGaeSyMIuMaamiEamaaBaaaleaacaWGPbGaamOCaaqabaGccaaISaGaeyOeI0IaaGimaiaai6cacaaI2aG[email protected][email protected]

where the operation I ir μ 0 [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbmqaaeGaciGaaiaabeqaamaabaabaaGcbaGaamysamaaBaaalea[email protected][email protected] means the projection of μ0 onto the subspace corresponding to the nonzero elements of Iir, similarly for I ir αχ( g ir ) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbmqaaeGaciGaaiaabeqaamaabaabaaGcbaGaamysamaaBaaaleaacaWGPbGaamOCaaqabaGccqWIzkszcqaHXoq[email protected][email protected] and J ir x ir [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbmqaaeGaciGaaiaabeqaamaabaabaaGcbaGaamOsamaaBaaaleaacaWG[email protected][email protected] . The corresponding error is now I ir ε ir [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbmqaaeGaciGaaiaabeqaamaabaabaaGcbaGaamysamaaBaaaleaacaWGP[email protected][email protected] .

Recall that in the case of 1-dimensional observation without genetic implementation, the likelihood for an observation yi with a censoring indicator di is

L i ( y i |θ)=f ( y i |θ) δ i S ( y i |θ) 1 δ i . [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaamitamaaBaaaleaacaWGPbaabeaakiaaiIcacaWG5bWaaSbaaSqaaiaadMgaaeqaaOGaaGiFaiabeI7aXjaaiMcacaaI9aGaamOzaiaaiIcacaWG5bWaaSbaaSqaaiaadMgaaeqaaOGaaGiFaiabeI7aXjaaiMcadaahaaWcbeqaaiabes7aKnaaBaaabaGaamyAaaqabaaaaOGaam4uaiaaiIcacaWG5bWaaSbaaSqaaiaadMgaaeqaaOGaaGiFaiabeI7aXjaaiMcadaahaaWc[email protected][email protected]

To extend this to our situation, for any dimension indicator Ii, and any d-variable function v(⋅), let I i v() [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbmqaaeGaciGaaiaabeqaamaabaabaaGcbaGaamysa[email protected][email protected] be the marginal version of v(⋅) with respect to the non-zero entry of Ii and δ i I i v()= δ i ( I i v()). [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbmqaaeGaciGaaiaabeqaamaabaabaaGcbaGaeqiTdq2aaSbaaSqaaiaadMgaaeqaaOGaamysamaaBaaaleaacaWGPbaabeaakiablMPiLjaadAhacaaIOaGaeyyXICTaaGykaiaai2dacqaH0oazdaWgaaWcbaGaamyAaaqabaGccqWIzkszcaaIOaGaamysamaaBaaaleaacaW[email protected][email protected] Let 1-di be the indicator with the same length of di but with 0 and 1 reversed. The full likelihood is

L(z|θ)= i=1 n ( j=1 k π j δ if I if f( y if |θ,j))( j=1 k π j (1 δ if ) I if S( y if |θ,j)) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbmqaaeGaciGaaiaabeqaamaabaabaaGcbaGaamitaiaaiIcacaWG6bGaaGiFaiabeI7aXjaaiMcacaaI9aWaaebCaeqaleaacaWGPbGaaGypaiaaigdaaeaacaWGUbaaniabg+GivdGccaaIOaWaaabCaeqaleaacaWGQbGaaGypaiaaigdaaeaacaWGRbaaniabggHiLdGccqaHapaCdaWgaaWcbaGaamOAaaqabaGccqaH0oazdaWgaaWcbaGaamyAaiaadAgaaeqaaOGaamysamaaBaaaleaacaWGPbGaamOzaaqabaGccqWIzkszcaWGMbGaaGikaiaadMhadaWgaaWcbaGaamyAaiaadAgaaeqaaOGaaGiFaiabeI7aXjaaiYcacaWGQbGaaGykaiaaiMcacaaIOaWaaabCaeqaleaacaWGQbGaaGypaiaaigdaaeaacaWGRbaaniabggHiLdGccqaHapaCdaWgaaWcbaGaamOAaaqabaGccaaIOaGaaGymaiabgkHiTiabes7aKnaaBaaaleaacaWGPbGaamOzaaqabaGccaaIPaGaamysamaaBaaaleaacaWGPbGaamOzaaqabaGccqWIzkszcaWGtbGaaGikaiaadMhadaWgaaWcbaGa[email protected][email protected]

×( j=1 k π j δ im I im f( y im |θ,j))( j=1 k π j (1 δ im ) I im S( y im |θ,j)) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbmqaaeGaciGaaiaabeqaamaabaabaaGcbaGaey41aqRaaGikamaaqahabeWcbaGaamOAaiaai2dacaaIXaaabaGaam4AaaqdcqGHris5aOGaeqiWda3aaSbaaSqaaiaadQgaaeqaaOGaeqiTdq2aaSbaaSqaaiaadMgacaWGTbaabeaakiaadMeadaWgaaWcbaGaamyAaiaad2gaaeqaaOGaeSyMIuMaamOzaiaaiIcacaWG5bWaaSbaaSqaaiaadMgacaWGTbaabeaakiaaiYhacqaH4oqCcaaISaGaamOAaiaaiMcacaaIPaGaaGikamaaqahabeWcbaGaamOAaiaai2dacaaIXaaabaGaam4AaaqdcqGHris5aOGaeqiWda3aaSbaaSqaaiaadQgaaeqaaOGaaGikaiaaigdacqGHsislcqaH0oazdaWgaaWcbaGaamyAaiaad2gaaeqaaOGaaGykaiaadMeadaWgaaWcbaGaamyAaiaad2gaaeqaaOGaeSyMIuMaam4uaiaaiIcacaWG5bWaaSbaaSqaaia[email protected][email protected]

× j=1 b i ( r=1 k T(r) δ ij I ij f( y j |G,θ,r))( r=1 k T(r)(1 δ ij ) I ij S( y j |G,θ,r)).(4) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbmqaaeGaciGaaiaabeqaamaabaabaaGcbaGaey41aq7aaebCaeqaleaacaWGQbGaaGypaiaaigdaaeaacaWGIbWaaSbaaeaacaWGPbaabeaaa0Gaey4dIunakiaaiIcadaaeWbqabSqaaiaadkhacaaI9aGaaGymaaqaaiaadUgaa0GaeyyeIuoakiaadsfacaaIOaGaamOCaiaaiMcacqaH0oazdaWgaaWcbaGaamyAaiaadQgaaeqaaOGaamysamaaBaaaleaacaWGPbGaamOAaaqabaGccqWIzkszcaWGMbGaaGikaiaadMhadaWgaaWcbaGaamOAaaqabaGccaaI8bGaam4raiaaiYcacqaH4oqCcaaISaGaamOCaiaaiMcacaaIPaGaaGikamaaqahabeWcbaGaamOCaiaai2dacaaIXaaabaGaam4AaaqdcqGHris5aOGaamivaiaaiIcacaWGYbGaaGykaiaaiIcacaaIXaGaeyOeI0IaeqiTdq2aaSbaaSqaaiaadMgacaWGQbaabeaakiaaiMcacaWGjbWaaSbaaSqaaiaadMgacaWGQbaabeaakiablMPiLjaadofacaaIOaGaamyEamaaBaaaleaacaWGQbaabeaakiaaiYhacaWGhbGaaGilaia[email protected][email protected]

Here extra caution should be taken since the observation vector from each individual may vary in dimensions and sub-spaces. For a d-dimensional vector v, let δ ij I ij v [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbmqaaeGaciGaaiaabeqaamaabaabaaGcbaGaeqiTdq2aaSbaaSqaaiaadMgacaWGQ[email protected][email protected] be its margin with respect to ; and for a d-dimensional matrix A, δ ij I ij A I il δ il [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbmqaaeGaciGaaiaabeqaamaabaabaaGcbaGaeqiTdq2aaSbaaSqaaiaadMgacaWGQbaabeaakiaadMeadaWgaaWcbaGaamyAaiaadQgaaeqaaOGaeSyMIuMaamyqaiablMPiLjaadMeadaWgaaWc[email protected][email protected] denote the submatrix by the rows corresponding to the non-zero entry of dijIij and columns corresponding to the non-zero entry of dilIil. In particular, δ ij I ij A I il δ il [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbmqaaeGaciGaaiaabeqaamaabaabaaGcbaGaeqiTdq2aaSbaaSqaaiaadMgacaWGQbaabeaakiaadMeadaWgaaWcbaGaamyAaiaadQgaaeqaaOGaeSyMIuMaamyqaiablMPiLjaadMeadaWgaaWc[email protected][email protected] has adjusted mean given by

δ ij I ij μ( g ij =r)( δ ij I ij Ω p I ip δ ip )( δ ip I ip Σ p 1 I ip δ ip )( δ ip I ip ( y p μ p )), [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbmqaaeGaciGaaiaabeqaamaabaabaaGcbaGaeqiTdq2aaSbaaSqaaiaadMgacaWGQbaabeaakiaadMeadaWgaaWcbaGaamyAaiaadQgaaeqaaOGaeSyMIuMaeqiVd0MaaGikaiaadEgadaWgaaWcbaGaamyAaiaadQgaaeqaaOGaaGypaiaadkhacaaIPaGaeyOeI0IaaGikaiabes7aKnaaBaaaleaacaWGPbGaamOAaaqabaGccaWGjbWaaSbaaSqaaiaadMgacaWGQbaabeaakiablMPiLjqbfM6axzaafaWaaSbaaSqaaiaadchaaeqaaOGaeSyMIuMaamysamaaBaaaleaacaWGPbGaamiCaaqabaGccqaH0oazdaWgaaWcbaGaamyAaiaadchaaeqaaOGaaGykaiaaiIcacqaH0oazdaWgaaWcbaGaamyAaiaadchaaeqaaOGaamysamaaBaaaleaacaWGPbGaamiCaaqabaGccqWIzkszcqqHJoWudaqhaaWcbaGaamiCaaqaaiabgkHiTiaaigdaaaGccqWIzkszcaWGjbWaaSbaaSqaaiaadMgacaWGWbaabeaakiabes7aKnaaBaaaleaacaWGPbGaamiCaaqabaGccaaIPaGaaGikaiabes7aKnaaBaaaleaacaWGPbGaamiCaaqabaGccaWGjbWaaSbaaSqaaiaadMgacaWGWbaabeaakiablMPiLjaaiIcacaWG5bWaaSbaaSqaaiaadchaaeqaaOG[email protected][email protected]

and adjusted variance matrix given by

δ ij I ij Σ I ij δ ij ( δ ij I ij Ω p I ip δ ip )( δ ip I ip Σ p 1 I ip δ ip )( δ ip I ip Ω p I ij δ ij ), [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbmqaaeGaciGaaiaabeqaamaabaabaaGcbaGaeqiTdq2aaSbaaSqaaiaadMgacaWGQbaabeaakiaadMeadaWgaaWcbaGaamyAaiaadQgaaeqaaOGaeSyMIuMaeu4OdmLaeSyMIuMaamysamaaBaaaleaacaWGPbGaamOAaaqabaGccqaH0oazdaWgaaWcbaGaamyAaiaadQgaaeqaaOGaeyOeI0IaaGikaiabes7aKnaaBaaaleaacaWGPbGaamOAaaqabaGccaWGjbWaaSbaaSqaaiaadMgacaWGQbaabeaakiablMPiLjqbfM6axzaafaWaaSbaaSqaaiaadchaaeqaaOGaeSyMIuMaamysamaaBaaaleaacaWGPbGaamiCaaqabaGccqaH0oazdaWgaaWcbaGaamyAaiaadchaaeqaaOGaaGykaiaaiIcacqaH0oazdaWgaaWcbaGaamyAaiaadchaaeqaaOGaamysamaaBaaaleaacaWGPbGaamiCaaqabaGccqWIzkszcqqHJoWudaqhaaWcbaGaamiCaaqaaiabgkHiTiaaigdaaaGccqWIzkszcaWGjbWaaSbaaSqaaiaadMgacaWGWbaabeaakiabes7aKnaaBaaaleaacaWGPbGaamiCaaqabaGccaaIPaGaaGikaiabes7aKnaaBaaaleaacaWGPbGaamiCaaqabaGccaWGjbWaaSbaaSqaaiaadMgacaWGWbaabeaakiablMPiLjabfM6axnaaBaaaleaacaWGWbaabeaakiablMPiLjaadMeadaWgaaWcbaGaamyAaiaadQg[email protected][email protected]

where Iip=(Iif,Iim). The corresponding adjustment in (1 δ ij ) I ij S( y j |G,θ,r) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbmqaaeGaciGaaiaabeqaamaabaabaaGcbaGaaGikaiaaigdacqGHsislcqaH0oazdaWgaaWcbaGaamyAaiaadQgaaeqaaOGaaGykaiaadMeadaWgaaWcbaGaamyAaiaadQgaaeqaaOGaeSyMIuMaam4uaiaaiIcacaWG5bWaaSbaaSqaaia[email protected][email protected] is made. The parameter θ is estimated by its MLE θ ^ [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0x[email protected][email protected] under (3), along with the restriction j=1 k π j =1 [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaWaaabmaeqaleaacaWGQbGaaGypaiaaigdaaeaacaWGR[email protected][email protected]

Semiparametric model

For censored data, a commonly used semi parametric regression model is Cox’s proportional hazards model [20,21]. In the univariate case, let y(3)<y(4)<…<y(n) be the ordered observations of y1,…,yn (assume no ties for simplicity), x(i) and d(i) be the associated quantities, for y(i), of the xi’s and di’s. Let R(i) be the i-th risk set, the set of all individuals who are still under study at the ‘time’ just prior to y(i), U be the set of all uncensored individuals, and

λ(y|x,θ)= f(y|x,θ) 1F(y|x,θ) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaeq4UdWMaaGikaiaadMhacaaI8bGaamiEaiaaiYcacqaH4oqCcaaIPaGaaGypamaalaaabaGaamOzaiaaiIcacaWG5bGaaGiFaiaadIhacaaISaGaeqiUdeNaaGykaaqaaiaaigdacqGHs[email protected][email protected]

be the hazard function. The proportional hazards model has a form of λ(y/x,θ)=h(β'x)λ0(y)-0.2cm for some known positive function h(⋅), and unspecified baseline hazard rate λ0(⋅), which implies that the distribution belongs to the Lehmann family [4] 1-F=(1-F0) for some F0(⋅) and γ>0. Under these assumptions, the conditional likelihood (partial likelihood [20,21]; marginal rank likelihood, [4]) is

L c (y|θ)= iU h( β x (i) ) j R (i) h( β x (j) ) , [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaamitamaaBaaaleaacaWGJbaabeaakiaaiIcacaWG5bGaaGiFaiabeI7aXjaaiMcacaaI9aWaaebuaeqaleaacaWGPbGaeyicI4Saamyvaaqab0Gaey4dIunakmaalaaabaGaamiAaiaaiIcacuaHYoGygaqbaiaadIhadaWgaaWcbaGaaGikaiaadMgacaaIPaaabeaakiaaiMcaaeaadaaeqbqabSqaaiaadQgacqGHiiIZcaWGsbWaaSbaaeaacaaIOaGaamyAaiaaiMcaaeqaaaqab0GaeyyeIuoakiaadIgacaaIOaGafqOSdiMbauaa[email protected][email protected]

where the estimate of θ is the MLE θ ^ [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0x[email protected][email protected] under Lc(y|θ). The optimality property of θ ^ [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0x[email protected][email protected] is studied extensively. In the case of multivariate observations, various extensions of this method have focused on each marginal distribution and Markov chain Monte Carlo on the margins [22]. Proposed a multivariate extension of the proportional hazards model, or frailty model, which is equivalent to an exponential specification of the joint survival function [23]. Proposed a class of multivariate failure time distributions, including a multivariate version of Cox’s proportional hazards model, in which the within family dependence is modeled by a common latent variable with a known parametric distribution given that all the family members are independent. Then the joint distribution is obtained by taking expectation of the conditional one. All these frailty models assume that there is a shared common dependent latent variable. This assumption basically requires that the distribution be interchangeable among the involved individuals. This is reasonable for some familial data but not generally true. Other existing multivariate proportional hazards models [24-26] are similar in nature. Here we model the within family dependence in a manifest way to be desirable for our genetic analysis. We adopt a successive conditional version of the proportional hazards model where we assume a special semi parametric form of the survival function in order to evaluate the conditioning in closed form easily. More specifically, in our multivariate proportional hazards model, we assume h(⋅) and λ0(⋅) are functions of d-variates each. Let yi,(3),…,yi,(ni) be the ordered observations on the i-th variable (i=1,…,d), define xi,(j),di,(j),Ri,(j), and Ui accordingly. Note there are structures in h(⋅) through the dependent effects among the covariates. Recall β'x is a d-vector, let

h( β x)= e 1 2 ( β x ) Ω β x , [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaamiAaiaaiIcacuaHYoGygaqbaiaadIhacaaIPaGaaGypaiaadwgadaahaaWcbeqaaiabgkHiTmaalaaabaGaaGymaaqaaiaaikdaaaGaaGikaiqbek7aIzaa[email protected][email protected]

where Ω is the within individual covariance matrix. Then h(⋅) behaves as a d-variate normal density, and its marginal and conditional versions are well defined and in closed forms, although it is not a proper density function. We need the successive ‘conditioning’ form of h(⋅) to apply the proportional hazards method. Specifically, let wij be the j-th diagonal element of Ω where Ωj be the upperleft j-dimensional sub-matrix of it, aj be the first j elements in the j-th column; [β'x]j be the first j components of β'x,hj+1|j(⋅), be the conditional version of covariates [β'x]j+1 given [β'x]j. Then hj+1|j(β'x) is a univariate normal kernel with mean a j Ω j 1 ( β x) j [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaeyOeI0IabmyyayaafaWaaSbaaSqaaiaadQgaaeqaaOGaeuyQdC1aa0baaSqaaiaadQgaaeaacqGHsislcaaIXaa[email protected][email protected] and variance ω jj a j Ω j 1 a [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaeqyYdC3aaSbaaSqaaiaadQgacaWGQbaabeaakiabgkHiTiqadggagaqbamaaBaaaleaacaWGQbaa[email protected][email protected] , and

h( β x)= h 1|0 ([ β x ] 1 ) j=1 d1 h j+1|j ([ β x ] j+1 ).(5) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaamiAaiaaiIcacuaHYoGygaqbaiaadIhacaaIPaGaaGypaiaadIgadaWgaaWcbaGaaGymaiaaiYhacaaIWaaabeaakiaaiIcacaaIBbGafqOSdiMbauaacaWG4bGaaGyxamaaBaaaleaacaaIXaaabeaakiaaiMcadaqeWbqabSqaaiaadQgacaaI9aGaaGymaaqaaiaadsgacqGHsislcaaIXaaaniabg+GivdGccaWGObWaaSbaaSqaaiaadQgacqGHRaWkcaaIXaGaaGiFaiaadQgaaeqaaOGaaGikaiaaiUfacuaHYoGygaqbaiaadIhacaaIDbWaaSbaaSqaaia[email protected][email protected]

Thus, without mixing over gene, for singleton multivariate observations, the joint conditional likelihood is

L c (z|θ)= i U 1 h([ β x (i) ] 1 ) l R 1,(i) h([ β x (l) ] 1 ) j=1 d1 i U j h j+1|j ([ β x (i) ] j ) l R j,(i) h j+1|j ([ β x (l) ] j ) . [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaamitamaaBaaaleaacaWGJbaabeaakiaaiIcacaWG6bGaaGiFaiabeI7aXjaaiMcacaaI9aWaaebuaeqaleaacaWGPbGaeyicI4SaamyvamaaBaaabaGaaGymaaqabaaabeqdcqGHpis1aOWaaSaaaeaacaWGObGaaGikaiaaiUfacuaHYoGygaqbaiaadIhadaWgaaWcbaGaaGikaiaadMgacaaIPaaabeaakiaai2fadaWgaaWcbaGaaGymaaqabaGccaaIPaaabaWaaabuaeqaleaacaWGSbGaeyicI4SaamOuamaaBaaabaGaaGymaiaaiYcacaaIOaGaamyAaiaaiMcaaeqaaaqab0GaeyyeIuoakiaadIgacaaIOaGaaG4waiqbek7aIzaafaGaamiEamaaBaaaleaacaaIOaGaamiBaiaaiMcaaeqaaOGaaGyxamaaBaaaleaacaaIXaaabeaakiaaiMcaaaWaaebCaeqaleaacaWGQbGaaGypaiaaigdaaeaacaWGKbGaeyOeI0IaaGymaaqdcqGHpis1aOWaaebuaeqaleaacaWGPbGaeyicI4SaamyvamaaBaaabaGaamOAaaqabaaabeqdcqGHpis1aOWaaSaaaeaacaWGObWaaSbaaSqaaiaadQgacqGHRaWkcaaIXaGaaGiFaiaadQgaaeqaaOGaaGikaiaaiUfacuaHYoGygaqbaiaadIhadaWgaaWcbaGaaGikaiaadMgacaaIPaaabeaakiaai2fadaWgaaWcbaGaamOAaaqabaGccaaIPaaabaWaaabuaeqaleaacaWGSbGaeyicI4SaamOuamaaBaaabaGaamOAaiaaiYcacaaIOaGaamyAaiaaiMcaaeqaaaqab0GaeyyeIuoakiaadIgadaWgaaWcbaGaamOAaiabgUcaRiaaigdacaaI8bGaamOAaaqabaGccaaIOaGaaG4waiqbek7aIzaafaGaamiEamaaBaaaleaacaaIOaGaamiBa[email protected][email protected]

Now for the case of nuclear family, inspired by (4), we assume h(⋅) has the form

h( μ i )= j=1 k π j h I if (μ( g j , I if )) j=1 k π j h I im (μ( g j , I im )) l=1 b i j=1 k T(j) h I ij (μ( g j , I ij )),(6) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaamiAaiaaiIcacqaH8oqBdaWgaaWcbaGaamyAaaqabaGccaaIPaGaaGypamaaqahabeWcbaGaamOAaiaai2dacaaIXaaabaGaam4AaaqdcqGHris5aOGaeqiWda3aaSbaaSqaaiaadQgaaeqaaOGaamiAamaaBaaaleaacaWGjbWaaSbaaeaacaWGPbGaamOzaaqabaaabeaakiaaiIcacqaH8oqBcaaIOaGaam4zamaaBaaaleaacaWGQbaabeaakiaaiYcacaWGjbWaaSbaaSqaaiaadMgacaWGMbaabeaakiaaiMcacaaIPaWaaabCaeqaleaacaWGQbGaaGypaiaaigdaaeaacaWGRbaaniabggHiLdGccqaHapaCdaWgaaWcbaGaamOAaaqabaGccaWGObWaaSbaaSqaaiaadMeadaWgaaqaaiaadMgacaWGTbaabeaaaeqaaOGaaGikaiabeY7aTjaaiIcacaWGNbWaaSbaaSqaaiaadQgaaeqaaOGaaGilaiaadMeadaWgaaWcbaGaamyAaiaad2gaaeqaaOGaaGykaiaaiMcadaqeWbqabSqaaiaadYgacaaI9aGaaGymaaqaaiaadkgadaWgaaqaaiaadMgaaeqaaaqdcqGHpis1aOWaaabCaeqaleaacaWGQbGaaGypaiaaigdaaeaacaWGRbaaniabggHiLdGccaWGubGaaGikaiaadQgacaaIPaGaamiAamaaBaaaleaacaWGjbWaaSbaaeaacaWGPbGaamOAaaqabaaabeaakiaaiIcacqaH8oqBcaaIOaGaam4zamaaBaaaleaacaWGQbaabeaakiaaiYcacaWGjbWaaSbaaSqaa[email protected][email protected]

Treat h(⋅) as a ‘density’. Recall μi=(μifimi1,…,μibi). The conditioning i]j+1|[μi]j can be applied component-wise, i.e.

i]j+1|[μi]j=([μif]j+1|[μif]j, [μim]j+1|[μim]j, [μi1]j+1|[μi1]j,…, [μibi]j+1|[μibi]j)

Now we have

h j+1|j ([ μ i ] j+1 )=( j=1 k π j h I if ,1|0 ([μ( g j , I if )] 1 ) j=1 | I if |1 h I if ,j+1|j ([μ( g j , I if )] j )) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaamiAamaaBaaaleaacaWGQbGaey4kaSIaaGymaiaaiYhacaWGQbaabeaakiaaiIcacaaIBbGaeqiVd02aaSbaaSqaaiaadMgaaeqaaOGaaGyxamaaBaaaleaacaWGQbGaey4kaSIaaGymaaqabaGccaaIPaGaaGypaiaaiIcadaaeWbqabSqaaiaadQgacaaI9aGaaGymaaqaaiaadUgaa0GaeyyeIuoakiabec8aWnaaBaaaleaacaWGQbaabeaakiaadIgadaWgaaWcbaGaamysamaaBaaabaGaamyAaiaadAgaaeqaaiaaiYcacaaIXaGaaGiFaiaaicdaaeqaaOGaaGikaiaaiUfacqaH8oqBcaaIOaGaam4zamaaBaaaleaacaWGQbaabeaakiaaiYcacaWGjbWaaSbaaSqaaiaadMgacaWGMbaabeaakiaaiMcacaaIDbWaaSbaaSqaaiaaigdaaeqaaOGaaGykamaarahabeWcbaGaamOAaiaai2dacaaIXaaabaGaaGiFaiaadMeadaWgaaqaaiaadMgacaWGMbaabeaacaaI8bGaeyOeI0IaaGymaaqdcqGHpis1aOGaamiAamaaBaaaleaacaWGjbWaaSbaaeaacaWGPbGaamOzaaqabaGaaGilaiaadQgacqGHRaWkcaaIXaGaaGiFaiaadQgaaeqaaOGaaGikaiaaiUfacqaH8oqBcaaIOaGaam4zamaaBaaaleaacaWGQbaabeaakiaaiYcacaWGjbWaaSbaaSqaaiaadMgacaWG[email protected][email protected]

×( j=1 k π j h I im ,1|0 ([μ( g j , I im )] 1 ) j=1 | I im |1 h I if ,j+1|j ([μ( g j , I im )] j )) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaey41aqRaaGikamaaqahabeWcbaGaamOAaiaai2dacaaIXaaabaGaam4AaaqdcqGHris5aOGaeqiWda3aaSbaaSqaaiaadQgaaeqaaOGaamiAamaaBaaaleaacaWGjbWaaSbaaeaacaWGPbGaamyBaaqabaGaaGilaiaaigdacaaI8bGaaGimaaqabaGccaaIOaGaaG4waiabeY7aTjaaiIcacaWGNbWaaSbaaSqaaiaadQgaaeqaaOGaaGilaiaadMeadaWgaaWcbaGaamyAaiaad2gaaeqaaOGaaGykaiaai2fadaWgaaWcbaGaaGymaaqabaGccaaIPaWaaebCaeqaleaacaWGQbGaaGypaiaaigdaaeaacaaI8bGaamysamaaBaaabaGaamyAaiaad2gaaeqaaiaaiYhacqGHsislcaaIXaaaniabg+GivdGccaWGObWaaSbaaSqaaiaadMeadaWgaaqaaiaadMgacaWGMbaabeaacaaISaGaamOAaiabgUcaRiaaigdacaaI8bGaamOAaaqabaGccaaIOaGaaG4waiabeY7aTjaaiIcacaWGNbWaaSbaaSqaaiaadQgaaeqaaOGaaGilaiaadMeadaWgaaWcbaGaamyAaiaad[email protected][email protected]

× l=1 b i ( j=1 k T(j) h I il ,1|0 ([μ( g j , I il )] 1 ) j=1 | I il |1 h I il ,j+1|j ([μ( g j , I il )] j )).(7) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaey41aq7aaebCaeqaleaacaWGSbGaaGypaiaaigdaaeaacaWGIbWaaSbaaeaacaWGPbaabeaaa0Gaey4dIunakiaaiIcadaaeWbqabSqaaiaadQgacaaI9aGaaGymaaqaaiaadUgaa0GaeyyeIuoakiaadsfacaaIOaGaamOAaiaaiMcacaWGObWaaSbaaSqaaiaadMeadaWgaaqaaiaadMgacaWGSbaabeaacaaISaGaaGymaiaaiYhacaaIWaaabeaakiaaiIcacaaIBbGaeqiVd0MaaGikaiaadEgadaWgaaWcbaGaamOAaaqabaGccaaISaGaamysamaaBaaaleaacaWGPbGaamiBaaqabaGccaaIPaGaaGyxamaaBaaaleaacaaIXaaabeaakiaaiMcadaqeWbqabSqaaiaadQgacaaI9aGaaGymaaqaaiaaiYhacaWGjbWaaSbaaeaacaWGPbGaamiBaaqabaGaaGiFaiabgkHiTiaaigdaa0Gaey4dIunakiaadIgadaWgaaWcbaGaamysamaaBaaabaGaamyAaiaadYgaaeqaaiaaiYcacaWGQbGaey4kaSIaaGymaiaaiYhacaWGQbaabeaakiaaiIcacaaIBbGaeqiVd0MaaGikaiaadEgadaWgaaWcbaGaamOAaaqabaGccaaISaGaamysamaaBaaaleaacaWGPbGaamiBaaqabaGccaaIPaGaaGyxamaaB[email protected][email protected]

In (7), [μ(gj,Iir)]1, means the first component of μ(gj,Iir) in Iir, and |Iir| denote its cardinality (r=f,m,1,…,bi). Now, the conditional likelihood is

L c (z|θ)= i U 1 h([ μ (i) ] 1 ) l R 1,(i) h([ μ (i) ] 1 ) j=1 d1 i U j h j+1|j ([ μ (i) ] j ) l R j,(i) h j+1|j ([ μ (i) ] j ) ,(8) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaamitamaaBaaaleaacaWGJbaabeaakiaaiIcacaWG6bGaaGiFaiabeI7aXjaaiMcacaaI9aWaaebuaeqaleaacaWGPbGaeyicI4SaamyvamaaBaaabaGaaGymaaqabaaabeqdcqGHpis1aOWaaSaaaeaacaWGObGaaGikaiaaiUfacqaH8oqBdaWgaaWcbaGaaGikaiaadMgacaaIPaaabeaakiaai2fadaWgaaWcbaGaaGymaaqabaGccaaIPaaabaWaaabuaeqaleaacaWGSbGaeyicI4SaamOuamaaBaaabaGaaGymaiaaiYcacaaIOaGaamyAaiaaiMcaaeqaaaqab0GaeyyeIuoakiaadIgacaaIOaGaaG4waiabeY7aTnaaBaaaleaacaaIOaGaamyAaiaaiMcaaeqaaOGaaGyxamaaBaaaleaacaaIXaaabeaakiaaiMcaaaWaaebCaeqaleaacaWGQbGaaGypaiaaigdaaeaacaWGKbGaeyOeI0IaaGymaaqdcqGHpis1aOWaaebuaeqaleaacaWGPbGaeyicI4SaamyvamaaBaaabaGaamOAaaqabaaabeqdcqGHpis1aOWaaSaaaeaacaWGObWaaSbaaSqaaiaadQgacqGHRaWkcaaIXaGaaGiFaiaadQgaaeqaaOGaaGikaiaaiUfacqaH8oqBdaWgaaWcbaGaaGikaiaadMgacaaIPaaabeaakiaai2fadaWgaaWcbaGaamOAaaqabaGccaaIPaaabaWaaabuaeqaleaacaWGSbGaeyicI4SaamOuamaaBaaabaGaamOAaiaaiYcacaaIOaGaamyAaiaaiMcaaeqaaaqab0GaeyyeIuoakiaadIgadaWgaaWcbaGaamOAaiabgUcaRiaaigdacaaI8bGaamOAaaqabaGccaaIOaGaaG4waiabeY7aTnaaBaaaleaacaaIOaGaamyAaiaaiMcaaeqaaOGaaGyxa[email protected][email protected]

where hj+1|j([μ(i)]j is given by (7). The MLE θ ^ [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0x[email protected][email protected] of θ is obtained under (8).

Nonparametric model

For univariate censored data, [27,28] considered a class of estimators, including the weighted least squares estimators, for censored data. Here the weights are determined by the ordered statistics of the observations and the associated censoring indicators, and are derived from the empirical survival function, i.e., the Kaplan- Meier product limit estimator [29-32]. Formulated the multivariate Kaplan-Meier estimator. Using the product integral, the mathematical expressions are quite involved. So instead of choosing the weights according to the multivariate Kaplan-Meier estimator, we use the nonparametric locally weighted least squares method, also called locally linear regression smoothers [33,34]. Let Y and X be the d and J-dimensional random vectors corresponding to the full observation and the covariates for an individual. Let μ(x)=E(Y|X=x) denote the regression function. In the univariate observation case, the locally linear estimator μ ^ (x) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGa[email protected][email protected] of μ(x) is first to find a ^ [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0[email protected][email protected] and b ^ [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0[email protected][email protected] to minimize

i=1 n ( y i ab(x x i )) 2 K( x x i h n ), [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaWaaabCaeqaleaacaWGPbGaaGypaiaaigdaaeaacaWGUbaaniabggHiLdGccaaIOaGaamyEamaaBaaaleaacaWGPbaabeaakiabgkHiTiaadggacqGHsislcaWGIbGaaGikaiaadIhacqGHsislcaWG4bWaaSbaaSqaaiaadMgaaeqaaOGaaGykaiaaiMcadaahaaWcbeqaaiaaikdaaaGccaWGlbGaaGikamaalaaabaGaamiEaiabgkHiTiaadIhadaWgaaWcbaGaamy[email protected][email protected]

Where K(⋅) is a kernel function, hn is the bandwidth, and μ ^ (x)= a ^ [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaa[email protected][email protected] . In our case, keep the notations in section 1. We choose the kernel to be the J-dimensional standard normal density f(⋅), and (⋅) to be its distribution function. To simplify the expression of the likelihood, let

y ˜ ifr = y if I if μ(r), x ˜ if = J if (x x if ), x if =(1 J if )(x x if ), [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbmqaaeGaciGaaiaabeqaamaabaabaaGcbaGabmyEayaaiaWaaSbaaSqaaiaadMgacaWGMbGaamOCaaqabaGccaaI9aGaamyEamaaBaaaleaacaWGPbGaamOzaaqabaGccqGHsislcaWGjbWaaSbaaSqaaiaadMgacaWGMbaabeaakiablMPiLjabeY7aTjaaiIcacaWGYbGaaGykaiaaiYcacaaMf8UabmiEayaaiaWaaSbaaSqaaiaadMgacaWGMbaabeaakiaai2dacaWGkbWaaSbaaSqaaiaadMgacaWGMbaabeaakiablMPiLjaaiIcacaWG4bGaeyOeI0IaamiEamaaBaaaleaacaWGPbGaamOzaaqabaGccaaIPaGaaGilaiaaywW7ceWG4bGbaqbadaWgaaWcbaGaamyAaiaadAgaaeqaaOGaaGypaiaaiIcacaaIXaGaeyOeI0IaamOsamaaBaaaleaacaWGPbGaamOzaaqabaGccaaIPaGaeSyMIuMaaGikaiaadIh[email protected][email protected]

and similarly for imr y ˜ imr [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaama[email protected][email protected] , y ˜ ijr [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaama[email protected][email protected] , x ˜ im [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaa[email protected][email protected] , x im [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaa[email protected][email protected] , x ˜ ij [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaa[email protected][email protected] , and x ˜ ij [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaa[email protected][email protected] . To estimate μ ^ (x) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGa[email protected][email protected] , inspired from the univariate locally linear estimator and (4), we first find ( μ ^ 0 (x), α ^ , β ^ , π ^ ) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaaGikaiqbeY7aTzaajaWaaSbaaSqaaiaaicdaaeqaaOGaaGikaiaadIhacaaIPaGaaGila[email protected][email protected] to minimize

i=1 n ( r=1 k π r 2 y ˜ ' ifr ( I if Ω I if ) y ˜ ifr J if φ( x ˜ if h n )(1(1 J if )Φ( x if h n ) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbmqaaeGaciGaaiaabeqaamaabaabaaGcbaWaaabCaeqaleaacaWGPbGaaGypaiaaigdaaeaacaWGUbaaniabggHiLdGccaaIOaWaaabCaeqaleaacaWGYbGaaGypaiaaigdaaeaacaWGRbaaniabggHiLdGccqaHapaCdaqhaaWcbaGaamOCaaqaaiaaikdaaaGcceWG5bGbaGaacaaINaWaaSbaaSqaaiaadMgacaWGMbGaamOCaaqabaGccaaIOaGaamysamaaBaaaleaacaWGPbGaamOzaaqabaGccqWIzkszcqqHPoWvcqWIzkszcaWGjbWaaSbaaSqaaiaadMgacaWGMbaabeaakiaaiMcaceWG5bGbaGaadaWgaaWcbaGaamyAaiaadAgacaWGYbaabeaakiaadQeadaWgaaWcbaGaamyAaiaadAgaaeqaaOGaeSyMIuMaeqOXdOMaaGikamaalaaabaGabmiEayaaiaWaaSbaaSqaaiaadMgacaWGMbaabeaaaOqaaiaadIgadaWgaaWcbaGaamOBaaqabaaaaOGaaGykaiaaiIcacaaIXaGaeyOeI0IaaGikaiaaigdacqGHsislcaWGkbWaaSbaaSqaaiaadMgacaWGMbaabeaakiaaiMcacqWIzkszcqqHMoGrcaaIOaWaaSaaaeaaceWG4bGbaqbadaWgaaWcbaGaamy[email protected][email protected]

+ r=1 k π r 2 y ˜ ' imr ( I im Ω I im ) y ˜ imr J im φ( x ˜ im h n )(1(1 J im )Φ( x im h n ) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbmqaaeGaciGaaiaabeqaamaabaabaaGcbaGaey4kaSYaaabCaeqaleaacaWGYbGaaGypaiaaigdaaeaacaWGRbaaniabggHiLdGccqaHapaCdaqhaaWcbaGaamOCaaqaaiaaikdaaaGcceWG5bGbaGaacaaINaWaaSbaaSqaaiaadMgacaWGTbGaamOCaaqabaGccaaIOaGaamysamaaBaaaleaacaWGPbGaamyBaaqabaGccqWIzkszcqqHPoWvcqWIzkszcaWGjbWaaSbaaSqaaiaadMgacaWGTbaabeaakiaaiMcaceWG5bGbaGaadaWgaaWcbaGaamyAaiaad2gacaWGYbaabeaakiaadQeadaWgaaWcbaGaamyAaiaad2gaaeqaaOGaeSyMIuMaeqOXdOMaaGikamaalaaabaGabmiEayaaiaWaaSbaaSqaaiaadMgacaWGTbaabeaaaOqaaiaadIgadaWgaaWcbaGaamOBaaqabaaaaOGaaGykaiaaiIcacaaIXaGaeyOeI0IaaGikaiaaigdacqGHsislcaWGkbWaaSbaaSqaaiaadMgacaWGTbaabeaakiaaiMcacqWIzkszcqqHMoGrcaaIOaWaaSaaaeaaceWG4bGbaqbadaWgaaWcbaGaamy[email protected][email protected]

+ j=1 b i r=1 k T 2 (r) y ˜ ' ijr ( I ij Ω I ij ) y ijr J ij φ( x ˜ ij h n | x ˜ p )(1(1 J ij )Φ( x ij h n | x p ))  (9) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbmqaaeGaciGaaiaabeqaamaabaabaaGcbaGaey4kaSYaaabCaeqaleaacaWGQbGaaGypaiaaigdaaeaacaWGIbWaaSbaaeaacaWGPbaabeaaa0GaeyyeIuoakmaaqahabeWcbaGaamOCaiaai2dacaaIXaaabaGaam4AaaqdcqGHris5aOGaamivamaaCaaaleqabaGaaGOmaaaakiaaiIcacaWGYbGaaGykaiqadMhagaacaiaaiEcadaWgaaWcbaGaamyAaiaadQgacaWGYbaabeaakiaaiIcacaWGjbWaaSbaaSqaaiaadMgacaWGQbaabeaakiablMPiLjabfM6axjablMPiLjaadMeadaWgaaWcbaGaamyAaiaadQgaaeqaaOGaaGykaiaadMhadaWgaaWcbaGaamyAaiaadQgacaWGYbaabeaakiaadQeadaWgaaWcbaGaamyAaiaadQgaaeqaaOGaeSyMIuMaeqOXdOMaaGikamaalaaabaGabmiEayaaiaWaaSbaaSqaaiaadMgacaWGQbaabeaaaOqaaiaadIgadaWgaaWcbaGaamOBaaqabaaaaOGaaGiFaiqadIhagaacamaaBaaaleaacaWGWbaabeaakiaaiMcacaaIOaGaaGymaiabgkHiTiaaiIcacaaIXaGaeyOeI0IaamOsamaaBaaaleaacaWGPbGaamOAaaqabaGccaaIPaGaeSyMIuMaeuOPdyKaaGikamaalaaabaGabmiEayaauaWaaSbaaSqaaiaadMgacaWGQbaabeaaaOqaaiaadIgadaWgaaWcbaGaamOBaaqabaaaaOGaaGiFaiqadIhagaafamaaBaaaleaa[email protected][email protected]

Where Ω is the within individual variance matrix, ( φ(| x ˜ p ) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaeqOXdOMaaGi[email protected][email protected] and Φ(| x p ) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaeuOPdyKaaGi[email protected][email protected] are the adjusted quantities as those in (4). And I ir Ω I ir [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbmqaaeGaciGaaiaabeqaamaabaabaaGcbaGaamysamaaBaaaleaacaWGPbGaamOCaaqabaGc[email protected][email protected] I is the sub-matrix of Ω with rows and columns corresponding to the non-zero elements of Iim.

We estimate

We estimate Ω by Ω ^ =( ω ^ rs ) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGafuyQdCLbaKaacaaI[email protected][email protected] with

ω ^ rs = 1 n rs 1 l=1 n rs ( z rl z ¯ l )( z sl z ¯ s ), i,j=1,...,d, [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGafqyYdCNbaKaadaWgaaWcbaGaamOCaiaadohaaeqaaOGaaGypamaalaaabaGaaGymaaqaaiaad6gadaWgaaWcbaGaamOCaiaadohaaeqaaOGaeyOeI0IaaGymaaaadaaeWbqabSqaaiaadYgacaaI9aGaaGymaaqaaiaad6gadaWgaaqaaiaadkhacaWGZbaabeaaa0GaeyyeIuoakiaaiIcacaWG6bWaaSbaaSqaaiaadkhacaWGSbaabeaakiabgkHiTiqadQhagaqeamaaBaaaleaacaWGSbaabeaakiaaiMcacaaIOaGaamOEamaaBaaaleaacaWGZbGaamiBaaqabaGccqGHsislceWG6bGbaebadaWgaaWcbaGaam4CaaqabaGccaaIPaGaaGilaiaaiccacaaIGaGaaGiiaiaaiccacaWGPbGaaGilaiaadQga[email protected][email protected]

where nrs is the total number of individuals with non-missing (r,s)- th components, zrls are the rearrangement of the r-th component of y it j=1 k π k I it μ(j) [email protected]@[email protected]@+=feaaguart1ev2aqatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0Firpepue9fr=xfr=xfrpeWZqaaeaabiGaaiaacaqaaeaadaqaaqaaaOqaaiaadMhadaWgaaWcbaGaamyAaiaadshaaeqaaOGaeyOeI0YaaabCaeaacqaHapaCdaWgaaWcbaGaam4AaaqabaaabaGaamOAaiabg2da9iaaigdaaeaacaWGRbaaniabggHiLdGccaWGjbWaaSbaa[email protected][email protected] for which the (r,s)-th components are non-missing.

Let μ ^ (r,x), π ^ ) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGafqiVd0MbaKaa[email protected][email protected] be the minimize of (9), where the full μ ^ (r,x) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaa[email protected][email protected] depends on the genotype r and the point value x. It has the intercept term μ ^ 0 (x) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGc[email protected][email protected] (recall (3)), and μ ^ (x) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGa[email protected][email protected] is approximated by setting μ ^ (x)= μ ^ 0 (x) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGafqiVd0MbaKaacaaIOaGaamiEaiaaiMcacaaI[email protected][email protected] Direct computation of μ ^ (r,x), π ^ ) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGafqiVd0MbaKaa[email protected][email protected] in (7) is not easy, instead we use an iterative procedure as in the following steps.

Select starting values π(0) for π. With this π(0), compute Ω(0), and T(0)(r)s. Let η=(μ0,α,β) be the full representation of the regression parameters, Xir(r=f,m,1,…,ni) be the corresponding design matrix for the r-th individual in the i-th family. In iterations 1-m do the following

(i) Fix π(i) , Ω(i) and T(i)(r) s minimize (7) with respect to η to get

η (i) =( i=1 n r=1 k l ϖ lr (i)2 X il X il J il φ( x ˜ il h n )(1(1 J il )Φ( x il h n ))) 1 [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbmqaaeGaciGaaiaabeqaamaabaabaaGcbaGaeq4TdG2aaWbaaSqabeaacaaIOaGaamyAaiaaiMcaaaGccaaI9aGaaGikamaaqahabeWcbaGaamyAaiaai2dacaaIXaaabaGaamOBaaqdcqGHris5aOWaaabCaeqaleaacaWGYbGaaGypaiaaigdaaeaacaWGRbaaniabggHiLdGcdaaeqbqabSqaaiaadYgaaeqaniabggHiLdGccqaHwpGDdaqhaaWcbaGaamiBaiaadkhaaeaacaaIOaGaamyAaiaaiMcacaaIYaaaaOGaamiwamaaBaaaleaacaWGPbGaamiBaaqabaGcceWGybGbauaadaWgaaWcbaGaamyAaiaadYgaaeqaaOGaamOsamaaBaaaleaacaWGPbGaamiBaaqabaGccqWIzkszcqaHgpGAcaaIOaWaaSaaaeaaceWG4bGbaGaadaWgaaWcbaGaamyAaiaadYgaaeqaaaGcbaGaamiAamaaBaaaleaacaWGUbaabeaaaaGccaaIPaGaaGikaiaaigdacqGHsislcaaIOaGaaGymaiabgkHiTiaadQeadaWgaaWcbaGaamyAaiaadYgaaeqaaOGaaGykaiablMPiLjabfA6agjaaiIcadaWcaaqaaiqadIhagaafamaaBaaaleaacaWGPbGaamiBaaqabaaakeaacaWGObWaaSbaaSqaaiaad6ga[email protected][email protected]

× i=1 n r=1 k l ϖ lr (i)2 y il X il J il φ( x ˜ il h n )(1(1 J il )Φ( x il h n )),(10) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbmqaaeGaciGaaiaabeqaamaabaabaaGcbaGaey41aq7aaabCaeqaleaacaWGPbGaaGypaiaaigdaaeaacaWGUbaaniabggHiLdGcdaaeWbqabSqaaiaadkhacaaI9aGaaGymaaqaaiaadUgaa0GaeyyeIuoakmaaqafabeWcbaGaamiBaaqab0GaeyyeIuoakiabeA9a2naaDaaaleaacaWGSbGaamOCaaqaaiaaiIcacaWGPbGaaGykaiaaikdaaaGccaWG5bWaaSbaaSqaaiaadMgacaWGSbaabeaakiqadIfagaqbamaaBaaaleaacaWGPbGaamiBaaqabaGccaWGkbWaaSbaaSqaaiaadMgacaWGSbaabeaakiablMPiLjabeA8aQjaaiIcadaWcaaqaaiqadIhagaacamaaBaaaleaacaWGPbGaamiBaaqabaaakeaacaWGObWaaSbaaSqaaiaad6gaaeqaaaaakiaaiMcacaaIOaGaaGymaiabgkHiTiaaiIcacaaIXaGaeyOeI0IaamOsamaaBaaaleaacaWGPbGaamiBaaqabaGccaaIPaGaeSyMIuMaeuOPdyKaaGikamaalaaabaGabmiEayaauaWaaSbaaSqaaiaadMgacaWGSbaabeaaaOqaaiaadIgadaWgaaWcbaGaa[email protected][email protected]

where ϖ lr (i) = π r (i) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaeqO1dy3aa0baaSqaaiaadYgacaWGYbaabaGaaGikaiaadMgacaaIPaaaa[email protected][email protected] for l=f,m, and ϖ lr (i) = T (i) (r) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaeqO1dy3aa0baaSqaaiaadYgacaWGYbaabaGaaGikaiaadMgacaaIPaaaaOGaaGypaiaad[email protected][email protected] for l=1,…,ni. (ii) Fix (i), minimize (7) with respect to π, with the constraint j=1 k π j =1 [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaWaaabmaeqaleaacaWGQbGaaGypaiaaigdaaeaacaWGR[email protected][email protected] to get

π r (i+1) = i=1 n l=f,m y ˜ ' ilr ( I il Ω I il ) y ˜ ilr J il φ( x ˜ il h n )(1(1 J il )Φ( x il h n )) i=1 n l=f,m r=1 k y ˜ ' ilr ( I il Ω I il ) y ˜ ilr J il φ( x ˜ il h n )(1(1 J il )Φ( x il h n )) ,   (11) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbmqaaeGaciGaaiaabeqaamaabaabaaGcbaGaeqiWda3aa0baaSqaaiaadkhaaeaacaaIOaGaamyAaiabgUcaRiaaigdacaaIPaaaaOGaaGypamaalaaabaWaaabCaeqaleaacaWGPbGaaGypaiaaigdaaeaacaWGUbaaniabggHiLdGcdaaeqbqabSqaaiaadYgacaaI9aGaamOzaiaaiYcacaWGTbaabeqdcqGHris5aOGabmyEayaaiaGaaG4jamaaBaaaleaacaWGPbGaamiBaiaadkhaaeqaaOGaaGikaiaadMeadaWgaaWcbaGaamyAaiaadYgaaeqaaOGaeSyMIuMaeuyQdCLaeSyMIuMaamysamaaBaaaleaacaWGPbGaamiBaaqabaGccaaIPaGabmyEayaaiaWaaSbaaSqaaiaadMgacaWGSbGaamOCaaqabaGccaWGkbWaaSbaaSqaaiaadMgacaWGSbaabeaakiablMPiLjabeA8aQjaaiIcadaWcaaqaaiqadIhagaacamaaBaaaleaacaWGPbGaamiBaaqabaaakeaacaWGObWaaSbaaSqaaiaad6gaaeqaaaaakiaaiMcacaaIOaGaaGymaiabgkHiTiaaiIcacaaIXaGaeyOeI0IaamOsamaaBaaaleaacaWGPbGaamiBaaqabaGccaaIPaGaeSyMIuMaeuOPdyKaaGikamaalaaabaGabmiEayaauaWaaSbaaSqaaiaadMgacaWGSbaabeaaaOqaaiaadIgadaWgaaWcbaGaamOBaaqabaaaaOGaaGykaiaaiMcaaeaadaaeWbqabSqaaiaadMgacaaI9aGaaGymaaqaaiaad6gaa0GaeyyeIuoakmaaqafabeWcbaGaamiBaiaai2dacaWGMbGaaGilaiaad2gaaeqaniabggHiLdGcdaaeWbqabSqaaiaadkhacaaI9aGaaGymaaqaaiaadUgaa0GaeyyeIuoakiqadMhagaacaiaaiEcadaWgaaWcbaGaamyAaiaadYgacaWGYbaabeaakiaaiIcacaWGjbWaaSbaaSqaaiaadMgacaWGSbaabeaakiablMPiLjabfM6axjablMPiLjaadMeadaWgaaWcbaGaamyAaiaadYgaaeqaaOGaaGykaiqadMhagaacamaaBaaaleaacaWGPbGaamiBaiaadkhaaeqaaOGaamOsamaaBaaaleaacaWGPbGaamiBaaqabaGccqWIzkszcqaHgpGAcaaIOaWaaSaaaeaaceWG4bGbaGaadaWgaaWcbaGaamyAaiaadYgaaeqaaaGcbaGaamiAamaaBaaaleaacaWGUbaabeaaaaGccaaIPaGaaGikaiaaigdacqGHsislcaaIOaGaaGymaiabgkHiTiaadQeadaWgaaWcbaGaamyAaiaadYgaaeqaaOGaaGykaiablMPiLjabfA6agjaaiIcadaWcaaqaaiqadIhagaafamaaBaaaleaacaWGPbGaamiBaaqabaaakeaacaWGObWaaSbaaSqaaiaad6gaaeqaaaaakiaaiMcacaa[email protected][email protected]

(r=1,…,k) (11)

and update Ω(i+1), and T(i+1(r) with (i+1).For some pre-specified ∈>0, when the relative errors

|( μ (m) (r,x), π (m) )( μ (m1) (r,x), π (m1) )| |( μ (m1) (r,x), π (m1) )| ε [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaWaaSaaaeaacaaI8bGaaGikaiabeY7aTnaaCaaaleqabaGaaGikaiaad2gacaaIPaaaaOGaaGikaiaadkhacaaISaGaamiEaiaaiMcacaaISaGaeqiWda3aaWbaaSqabeaacaaIOaGaamyBaiaaiMcaaaGccaaIPaGaeyOeI0IaaGikaiabeY7aTnaaCaaaleqabaGaaGikaiaad2gacqGHsislcaaIXaGaaGykaaaakiaaiIcacaWGYbGaaGilaiaadIhacaaIPaGaaGilaiabec8aWnaaCaaaleqabaGaaGikaiaad2gacqGHsislcaaIXaGaaGykaaaakiaaiMcacaaI8baabaGaaGiFaiaaiIcacqaH8oqBdaahaaWcbeqaaiaaiIcacaWGTbGaeyOeI0IaaGymaiaaiMcaaaGccaaIOaGaamOCaiaaiYcacaWG4bGaaGykaiaaiYcacqaHapaCdaahaaWcbeqaaiaaiIcacaWG[email protected][email protected]

we stop the process at the last step m, and take

( μ ^ (r,x), π ^ )=( μ (m) (r,x), π (m) ) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaaGikaiqbeY7aTzaajaGaaGikaiaadkhacaaISaGaamiEaiaaiMcacaaISaGafqiWdaNbaKaacaaIPaGaaGypaiaaiIcacqaH8oqBdaahaaWcbeqaaiaaiIcacaWGTbGaaGykaaaakiaaiIcacaWGYbGaaGilaiaadIhacaaIP[email protected][email protected]

For arbitrary kernel and reasonably chosen band width hn, various asymptotic results are established in case of standard nonmixture data. We conjecture that similar results will hold under some regularity conditions.

Lastly, the bandwidth determines the smoothness of the estimate. Interesting research that addresses the crucial problem of bandwidth selection can be found in [35]. There are considerable literatures for automatic methods that attempt to minimize a lack-of-fit criterion such as an integrated squared error. But most of the methods provide an optimal hn determined by some unknown quantities. For simplicity, let k=|Jij| be the dimension of the observed covariate of the j-th (j=f,m,1,…,ni) individual in the i-th family, for the corresponding kernel, we choose hn=Cn-1/(k+1), for some constant C>0, and C can be selected through numerical trial.

Variance components model

As an alternative to the mixture models considered above, the Variance Components (VC) model [36,37] has received much attention recently due to very efficient in computation as well as relatively robustness to model misspecification [38-46].

Let yi be the trait vector of the i-th individual in the family, in case without censoring and missing records, the commonly used VC model describing the trait value is

yi=μ+gi+Gi+ηxi+ei

Where μ is the overall mean, gi is the unobserved random vector of major gene effects at the trait locus with alleles A and B, Gi is the unobserved polygenic effects vector, the ηj’s are effects associated with the covariates xij’s, and ei is the residual random error vector. The usual assumption is that gi,Gi and ei are uncorrelated and E(gi)=E(Gi)=E(ei)=0. When missing records are present, the model is modified as

y i = I i (μ+ g i + G i +η) J i x i + I i e i    ( 12 ) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbmqaaeGaciGaaiaabeqaamaabaabaaGcbaGaamyEamaaBaaaleaacaWGPbaabeaakiaai2dacaWGjbWaaSbaaSqaaiaadMgaaeqaaOGaeSyMIuMaaGikaiabeY7aTjabgUcaRiaadEgadaWgaaWcbaGaamyAaaqabaGccqGHRaWkcaWGhbWaaSbaaSqaaiaadMgaaeqaaOGaey4kaSIaeq4TdGMaaGykaiaadQeadaWgaaWcbaGaamyAaaqabaGccqWIzkszcaWG4bWaaSbaaSqaaiaadMgaaeqaaOGaey4kaSIaamysamaaBaaaleaacaWGPbaabeaakiablMPiLjaadwgadaWgaaWcbaGaamyAaaqabaG[email protected][email protected]

In this model, the parameters of interests are specified in the family variance matrix, thus computation can be carried out efficiently without the multiple mixing. Let ykk and Ωk be the observation, its mean and variance matrix of the k-th family. We can define Ik, dk and Jk accordingly. The commonly used model for quantitative traits is the multivariate normal distribution, thus the total likelihood is

L(z|θ)= k=1 K δ k I k φ( y k μ k | Ω k )(1 δ k ) I k Φ y k μ k | Ω k ). [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbmqaaeGaciGaaiaabeqaamaabaabaaGcbaGaamitaiaaiIcacaWG6bGaaGiFaiabeI7aXjaaiMcacaaI9aWaaabCaeqaleaacaWGRbGaaGypaiaaigdaaeaacaWGlbaaniabggHiLdGccqaH0oazdaWgaaWcbaGaam4AaaqabaGccaWGjbWaaSbaaSqaaiaadUgaaeqaaOGaeSyMIuMaeqOXdOMaaGikaiaadMhadaWgaaWcbaGaam4AaaqabaGccqGHsislcqaH8oqBdaWgaaWcbaGaam4AaaqabaGccaaI8bGaeuyQdC1aaSbaaSqaaiaadUgaaeqaaOGaaGykaiaaiIcacaaIXaGaeyOeI0IaeqiTdq2aaSbaaSqaaiaadUgaaeqaaOGaaGykaiaadMeadaWgaaWcbaGaam4AaaqabaGccqWIzkszcqqHMoGrcaWG5bWaaSbaaSqaaiaadUgaaeqaaOGaeyOeI0IaeqiVd02aaSbaaSqaaiaadUg[email protected][email protected]

Here f is the distribution function of the normal distribution with mean 0 and variance Ω.

The key lies in the specification of the variance matrices Ωks, which we illustrate in the following settings.

In the simplest case of Hardy-Weinberg equilibrium among locus alleles without linkage to marker, and without censoring and missing records, the covariance matrix between individuals i and j of a given family can be found, for example, in [38]. Modified to our case, it is

Cov( Y ki , Y kj )={ I i ( σ a 2 + σ d 2 + σ G 2 + σ e 2 ) I j if i=j 2 Φ ij I i ( σ a 2 + Δ 7ij σ d 2 +2 Φ ij σ G 2 ) I j , if ij     ( 13 ) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbmqaaeGaciGaaiaabeqaamaabaabaaGcbaGaam4qaiaad+gacaWG2bGaaGikaiaadMfadaWgaaWcbaGaam4AaiaadMgaaeqaaOGaaGilaiaadMfadaWgaaWcbaGaam4AaiaadQgaaeqaaOGaaGykaiaai2dadaGabaqaauaabaqaciaaaeaacaWGjbWaaSbaaSqaaiaadMgaaeqaaOGaeSyMIuMaaGikaiabeo8aZnaaDaaaleaacaWGHbaabaGaaGOmaaaakiabgUcaRiabeo8aZnaaDaaaleaacaWGKbaabaGaaGOmaaaakiabgUcaRiabeo8aZnaaDaaaleaacaWGhbaabaGaaGOmaaaakiabgUcaRiabeo8aZnaaDaaaleaacaWGLbaabaGaaGOmaaaakiaaiMcacqWIzkszcaWGjbWaaSbaaSqaaiaadQgaaeqaaaGcbaGaaGjcVlaadMgacaWGMbGaaGjcVlaaiccacaaIGaGaamyAaiaai2dacaWGQbaabaGaaGOmaiabfA6agnaaBaaaleaacaWGPbGaamOAaaqabaGccaWGjbWaaSbaaSqaaiaadMgaaeqaaOGaeSyMIuMaaGikaiabeo8aZnaaDaaaleaacaWGHbaabaGaaGOmaaaakiabgUcaRiabfs5aenaaBaaaleaacaaI3aGaamyAaiaadQgaaeqaaOGaeq4Wdm3aa0baaSqaaiaadsgaaeaacaaIYaaaaOGaey4kaSIaaGOmaiabfA6agnaaBaaaleaacaWGPbGaamOAaaqabaGccqaHdpWCdaqhaaWcbaGaam4raaqaaiaaikdaaaGccaaIPaGaeSyMIuMaamysamaaBaaaleaacaWGQbaabeaakiaaiYcaaeaacaaMi8UaamyAaiaadAgacaaMi8UaaGiiaiaaiccacaWGPbGaeyiyIKRaamOAaaaaaiaawUhaaiaabcc[email protected][email protected]

where σ a 2 [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaa[email protected][email protected] is the additive genetic variance matrix due to the locus, σ d 2 [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaa[email protected][email protected] is the dominant genetic variance matrix, Φ ij = Δ 7ij /2+ Δ 8ij /4 [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaeuOPdy0aaSbaaSqaaiaadMgacaWGQbaabeaakiaai2dacqqHuoardaWgaaWcbaGaaG4naiaadMgacaWGQbaabeaakiaai+cacaaIYaGaey4ka[email protected][email protected] is the kinship coefficient between individuals i and j [47], and Δ7ij8ij9ij, etc. are the condensed kinship coefficient of Jacquard [48], between individuals i and j.

In the more general Hardy-Weinberg disequilibrium case, let f be the within population inbreeding coefficient f at the trait locus [49- 51]. Introduced CV model in this case, which modified in our case is

Cov( Y i , Y j |f)={ I i ((1+ f 2 ) σ a 2 +(1f) σ d 2 +f σ 0 2 + σ G 2 + σ e 2 ) I j , if i=j I i ( Δ 7ij γ 7 (f)+ Δ 8ij γ 8 (f)+2 Φ ij σ G 2 ) I j , if ij    (14) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbmqaaeGaciGaaiaabeqaamaabaabaaGcbaGaam4qaiaad+gacaWG2bGaaGikaiaadMfadaWgaaWcbaGaamyAaaqabaGccaaISaGaamywamaaBaaaleaacaWGQbaabeaakiaaiYhacaWGMbGaaGykaiaai2dadaGabaqaauaabaqaciaaaeaacaWGjbWaaSbaaSqaaiaadMgaaeqaaOGaeSyMIuMaaGikaiaaiIcacaaIXaGaey4kaSYaaSaaaeaacaWGMbaabaGaaGOmaaaacaaIPaGaeq4Wdm3aa0baaSqaaiaadggaaeaacaaIYaaaaOGaey4kaSIaaGikaiaaigdacqGHsislcaWGMbGaaGykaiabeo8aZnaaDaaaleaacaWGKbaabaGaaGOmaaaakiabgUcaRiaadAgacqaHdpWCdaqhaaWcbaGaaGimaaqaaiaaikdaaaGccqGHRaWkcqaHdpWCdaqhaaWcbaGaam4raaqaaiaaikdaaaGccqGHRaWkcqaHdpWCdaqhaaWcbaGaamyzaaqaaiaaikdaaaGccaaIPaGaeSyMIuMaamysamaaBaaaleaacaWGQbaabeaakiaaiYcaaeaacaaMi8UaamyAaiaadAgacaaMi8UaaGiiaiaaiccacaWGPbGaaGypaiaadQgaaeaacaWGjbWaaSbaaSqaaiaadMgaaeqaaOGaeSyMIuMaaGikaiabfs5aenaaBaaaleaacaaI3aGaamyAaiaadQgaaeqaaOGaeq4SdC2aaSbaaSqaaiaaiEdaaeqaaOGaaGikaiaadAgacaaIPaGaey4kaSIaeuiLdq0aaSbaaSqaaiaaiIdacaWGPbGaamOAaaqabaGccqaHZoWzdaWgaaWcbaGaaGioaaqabaGccaaIOaGaamOzaiaaiMcacqGHRaWkcaaIYaGaeuOPdy0aaSbaaSqaaiaadMgacaWGQbaabeaakiabeo8aZnaaDaaaleaacaWGhbaabaGaaGOmaaaakiaaiMcacqWIzkszcaWGjbWaaSbaaSqaaiaadQgaaeqaaOGaaGilaaqaaiaayIW7caWGPbGaamOzaiaayIW7caaIGaGaaGiiaiaadMgacqGHGjsUcaWGQ[email protected][email protected]

where γl(f)s are matrices determined by 2 σ a 2 [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaa[email protected][email protected] , σ d 2 [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaa[email protected][email protected] and f etc., see there for details.

In the case of linkage to marker with both Hardy-Weinberg and linkage equilibrium, the covariance in our case can be specified based on that of, for example [40], in the same way as above. In the case of linkage to marker with either one or both Hardy-Weinberg and linkage disequilibrium, the covariance in our case can be specified based on that of [51], in the same way.

Competing risks

Now suppose that the response yi is the failure time and only the failure for one of the d diseases is observed for each individual. For the i-th family, the data have the form (yi,di,ji,xi), where yi=(yif,yim,yi1,… ,yibi), similarly for di,ji, and xi, where ji is the observed disease type indicator. For example if the observed disease for the father is type 2, then jif=2. Given the data (yi,di,ji,xi)s, we like to investigate the objective of interests for each of the d disease. This problem is that of the competing risks. Note here the response for each individual is one-dimensional, and hence the corresponding quantities have simple notations. We are interested in the genetic regression analysis for the competing risks. We use a variant of the proportional hazards model. The mean of the j-th type, r-th member of the i-th family is specified as

μ j ( g ir )= I ir ( μ 0j + α j χ( g ir )+ β i ) J ir x ir , (r=f,m,1,..., b i ). [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbmqaaeGaciGaaiaabeqaamaabaabaaGcbaGaeqiVd02aaSbaaSqaaiaadQgaaeqaaOGaaGikaiaadEgadaWgaaWcbaGaamyAaiaadkhaaeqaaOGaaGykaiaai2dacaWGjbWaaSbaaSqaaiaadMgacaWGYbaabeaakiablMPiLjaaiIcacqaH8oqBdaWgaaWcbaGaaGimaiaadQgaaeqaaOGaey4kaSIaeqySde2aaSbaaSqaaiaadQgaaeqaaOGaeq4XdmMaaGikaiaadEgadaWgaaWcbaGaamyAaiaadkhaaeqaaOGaaGykaiabgUcaRiabek7aInaaBaaaleaacaWGPbaabeaakiaaiMcacaWGkbWaaSbaaSqaaiaadMgacaWGYbaabeaakiablMPiLjaadIhadaWgaaWcbaGaamyAaiaadkhaaeqaaOGaaGilaiaaiccacaaIGaGaaGiiaiaaiccacaaIOaGaamOCaiaai2dacaWGMbGaaGilaiaad2gacaaISaGaaGymaiaaiYcacaaIUaGaaGO[email protected][email protected]

For a reasonably chosen function h(⋅), we specify

h( μ i )=( l=1 k π l h( μ j f ( g l )))( l=1 k π l h( μ j m ( g l ))) l=1 b i ( s=1 k T(s)h( μ j l ( g s ))). [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaamiAaiaaiIcacqaH8oqBdaWgaaWcbaGaamyAaaqabaGccaaIPaGaaGypaiaaiIcadaaeWbqabSqaaiaadYgacaaI9aGaaGymaaqaaiaadUgaa0GaeyyeIuoakiabec8aWnaaBaaaleaacaWGSbaabeaakiaadIgacaaIOaGaeqiVd02aaSbaaSqaaiaadQgadaWgaaqaaiaadAgaaeqaaaqabaGccaaIOaGaam4zamaaBaaaleaacaWGSbaabeaakiaaiMcacaaIPaGaaGykaiaaiIcadaaeWbqabSqaaiaadYgacaaI9aGaaGymaaqaaiaadUgaa0GaeyyeIuoakiabec8aWnaaBaaaleaacaWGSbaabeaakiaadIgacaaIOaGaeqiVd02aaSbaaSqaaiaadQgadaWgaaqaaiaad2gaaeqaaaqabaGccaaIOaGaam4zamaaBaaaleaacaWGSbaabeaakiaaiMcacaaIPaGaaGykamaarahabeWcbaGaamiBaiaai2dacaaIXaaabaGaamOyamaaBaaabaGaamyAaaqabaaaniabg+GivdGccaaIOaWaaabCaeqaleaacaWGZbGaaGypaiaaigdaaeaacaWGRbaaniabggHiLdGccaWGubGaaGikaiaadohacaaIPaGaamiAaiaaiIcacqaH8oqBdaWgaaWcbaGaamOAamaaBaaabaGaamiBaaqabaaabeaakiaai[email protected][email protected]

More convenient below is to use the notations

h r ( μ i )= l=1 k π l h( μ j r ( g l )), (r=f,m) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaamiAamaaBaaaleaacaWGYbaabeaakiaaiIcacqaH8oqBdaWgaaWcbaGaamyAaaqabaGccaaIPaGaaGypamaaqahabeWcbaGaamiBaiaai2dacaaIXaaabaGaam4AaaqdcqGHris5aOGaeqiWda3aaSbaaSqaaiaadYgaaeqaaOGaamiAaiaaiIcacqaH8oqBdaWgaaWcbaGaamOAamaaBaaabaGaamOCaaqabaaabeaakiaaiIcacaWGNbWaaSbaaSqaaiaadYgaaeqaaOGaaGykaiaaiMcacaaISaGaaGiiaiaaicca[email protected][email protected]

and

h r ( μ i )= s=1 k T(s)h( μ j r ( g s )| y p ), (r=1,..., b i ). [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaamiAamaaBaaaleaacaWGYbaabeaakiaaiIcacqaH8oqBdaWgaaWcbaGaamyAaaqabaGccaaIPaGaaGypamaaqahabeWcbaGaam4Caiaai2dacaaIXaaabaGaam4AaaqdcqGHris5aOGaamivaiaaiIcacaWGZbGaaGykaiaadIgacaaIOaGaeqiVd02aaSbaaSqaaiaadQgadaWgaaqaaiaadkhaaeqaaaqabaGccaaIOaGaam4zamaaBaaaleaacaWGZbaabeaakiaaiMcacaaI8bGaamyEamaaBaaaleaacaWGWbaabeaakiaaiMcacaaISaGaaGiiaiaaiccacaaIGaGaaGiiaiaaiIcacaWGYbGaaGypaiaaigdacaaISaGaaGOlaiaai[email protected][email protected]

Let yj1<…<yjkj be the kj failures of type j(j=1,…,d), R(yji),be the risk set at yji, the partial likelihood is

L(y|θ)= j=1 d i=1 k j h r i ( μ j ) lR( y ji ) h r l ( μ j ) .(15) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaamitaiaaiIcacaWG5bGaaGiFaiabeI7aXjaaiMcacaaI9aWaaebCaeqaleaacaWGQbGaaGypaiaaigdaaeaacaWGKbaaniabg+GivdGcdaqeWbqabSqaaiaadMgacaaI9aGaaGymaaqaaiaadUgadaWgaaqaaiaadQgaaeqaaaqdcqGHpis1aOWaaSaaaeaacaWGObWaaSbaaSqaaiaadkhadaWgaaqaaiaadMgaaeqaaaqabaGccaaIOaGaeqiVd02aaSbaaSqaaiaadQgaaeqaaOGaaGykaaqaamaaqafabeWcbaGaamiBaiabgIGiolaadkfacaaIOaGaamyEamaaBaaabaGaamOAaiaadMgaaeqaaiaaiMcaaeqaniabggHiLdGccaWGObWaaSbaaSqaaiaadkhadaWgaaqaaiaadYgaaeqaaaqabaGccaaIOaGaeqiVd02aaSbaa[email protected][email protected]

Asymptotic heuristic

For IID data, various asymptotic results can be obtained. The results from the score function, the likelihood ratio statistic, and the MLE are equivalent. These results can be used to establish confidence intervals or hypothesis testing, etc. for θ. Here we are more interested in using the MLE. For general dependence model, usually the treatment is non-standard. But for our model, since the log-likelihood is in the form of several additive pieces, standard method can be used to derive the asymptotic distribution of the MLE. Let zi=(yi,di)(i=1,…,n). For the IID data, it is well known that under mild regularity conditions, the MLE θ ^ n [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciG[email protected][email protected] nis strongly consistent and asymptotically distributed normal with mean at the true parameter value θ0, and variance matrix given by the inverse of the Fisher information. Here the observations are unbalanced, the asymptotic variance is the Fisher information times a weight matrix. To derive it, we need some notations, and mainly concentrate on model (4).

Let Nip be the total number of parents with the i-th measurement non-missing (i=1,…,d), Niss be those for the siblings, N be the total number of individuals in the study, γNir=Nir/N(r=p,s). Assume limN→8γNir=γir>0 exists (r=p,s;i=1,…,d). Let Yp and Yj be general random vectors associated with a parent and sib respectively, and Δp and Δj be the corresponding random vectors.

For model (4), let

H(θ| Y p , Y j )= Δ p log( l=1 k π l f( Y p |θ,l))+(1 Δ p )log( l=1 k π l S Y p |θ,l)) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbmqaaeGaciGaaiaabeqaamaabaabaaGcbaGaamisaiaaiIcacqaH4oqCcaaI8bGaamywamaaBaaaleaacaWGWbaabeaakiaaiYcacaWGzbWaaSbaaSqaaiaadQgaaeqaaOGaaGykaiaai2dacqqHuoardaWgaaWcbaGaamiCaaqabaGccqWIzkszcaGGSbGaai4BaiaacEgacaaIOaWaaabCaeqaleaacaWGSbGaaGypaiaaigdaaeaacaWGRbaaniabggHiLdGccqaHapaCdaWgaaWcbaGaamiBaaqabaGccaWGMbGaaGikaiaadMfadaWgaaWcbaGaamiCaaqabaGccaaI8bGaeqiUdeNaaGilaiaadYgacaaIPaGaaGykaiabgUcaRiaaiIcacaaIXaGaeyOeI0IaeuiLdq0aaSbaaSqaaiaadchaaeqaaOGaaGykaiablMPiLjaacYgacaGGVbGaai4zaiaaiIcadaaeWbqabSqaaiaadYgacaaI9aGaaGymaaqaaiaadUgaa0GaeyyeIuoakiabec8aWnaaBaaaleaacaWGSbaabeaakiaadofacaWGzbWaaSba[email protected][email protected]

+ Δ j log( l=1 k T(l)f( Y j |G,θ,l))+(1 Δ j )log( l=1 k T(l)S( Y j |G,θ,l))   (16) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbmqaaeGaciGaaiaabeqaamaabaabaaGcbaGaey4kaSIaeuiLdq0aaSbaaSqaaiaadQgaaeqaaOGaeSyMIuMaaiiBaiaac+gacaGGNbGaaGikamaaqahabeWcbaGaamiBaiaai2dacaaIXaaabaGaam4AaaqdcqGHris5aOGaamivaiaaiIcacaWGSbGaaGykaiaadAgacaaIOaGaamywamaaBaaaleaacaWGQbaabeaakiaaiYhacaWGhbGaaGilaiabeI7aXjaaiYcacaWGSbGaaGykaiaaiMcacqGHRaWkcaaIOaGaaGymaiabgkHiTiabfs5aenaaBaaaleaacaWGQbaabeaakiaaiMcacqWIzkszcaGGSbGaai4BaiaacEgacaaIOaWaaabCaeqaleaacaWGSbGaaGypaiaaigdaaeaacaWGRbaaniabggHiLdGccaWGubGaaGikaiaadYgacaaIPaGaam4uaiaaiIcacaWGzbWaaSbaaSqaaiaadQgaaeqaaOGaaGiFaiaadEeacaaISaGaeqiUdeNaaGilaiaadYga[email protected][email protected]

here we use the notation Δp v(⋅) to represent the marginal version of v(⋅) corresponding to the non-zero components of Δp. The Fisher information matrix is

I(θ)=E( 2 H(θ| Y p , Y j ) θ θ ).(17) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbaqaaeGaciGaaiaabeqaamaabaabaaGcbaGaamysaiaaiIcacqaH4oqCcaaIPaGaaGypaiabgkHiTiaadweacaaIOaWaaSaaaeaacqGHciITdaahaaWcbeqaaiaaikdaaaGccaWGibGaaGikaiabeI7aXjaaiYhacaWGzbWaaSbaaSqaaiaadchaaeqaaOGaaGilaiaadMfadaWgaaWcbaGaamOAaaqabaGccaaIPaaabaGaeyOaIyRafqiUdeNbaua[email protected][email protected]

The above expectation is more involved than it looks, since that involves summations of all possible combinations of non-zero elements of it with respect to Δr, and also the unknown distribution of it. Instead, an empirical version of it has a known form

I N (θ)= 1 N 2 logL(y|θ) θ θ | θ= θ ^ n ,(18) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbmqaaeGaciGaaiaabeqaamaabaabaaGcbaGaamysamaaBaaaleaacaWGobaabeaakiaaiIcacqaH4oqCcaaIPaGaaGypaiabgkHiTmaalaaabaGaaGymaaqaaiaad6eaaaWaaSaaaeaacqGHciITdaahaaWcbeqaaiaaikdaaaGccaGGSbGaai4BaiaacEgacaWGmbGaaGikaiaadMhacaaI8bGaeqiUdeNaaGykaaqaaiabgkGi2kqbeI7aXzaafaGaeyOaIyRaeqiUdehaaiaaiYhadaWgaaWcbaGaeqiUdeNaaGypaiqbeI7aXzaajaWa[email protected][email protected]

where L(y\θ) is given by (4). At the true data generating parameter θ0, IN is strongly consistent for I(⋅). To obtain the weight matrix, we need to specify the parameter order in θ. We arrange the first k-1 entry to be π1,…,πk-1, next we arrange all the regression parameters for the first response variable,..., all the regression parameters for the last response variable, then all the independent parameters in the variance matrix S and covariance matrix &Omega; in the similar order. It is clear that, in the weight matrix W, for (i,j) corresponding to the first k-1 components in θ, the weight should be γp1p+…+γdp; for (i,j) corresponding to the r-th and the l-th regression parameters, the weight is (ϒrp+ϒrs)(ϒlp+ϒls) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0Firpepue9fr=xfr=xfrpeWZqaaeaabiGaaiaacaqaaeaadaqaaqaaaOqaamaakaaabaGaaiikaiabfk9aHkaadkhacaWGWbGaey4kaSIaeuO0deQaamOCaiaadohacaGGPaGaaiikaiabf[email protected][email protected] ; for (i,j) corresponding to the (a,b)-th and the (u,v)-th variance or covariance, the weight is [(γapas) (γbpbs) (γupus) (γvpvs)]1/4. Let θ0 be the true unknown data generating parameter, d→ stands for convergence in distribution. Then, we have

N ( θ ^ N θ 0 ) d N(0, I 1 ( θ 0 )W)    (19) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbmqaaeGaciGaaiaabeqaamaabaabaaGcbaWaaOaaaeaacaWGobaaleqaaOGaaGikaiqbeI7aXzaajaWaaSbaaSqaaiaad6eaaeqaaOGaeyOeI0IaeqiUde3aaSbaaSqaaiaaicdaaeqaaOGaaGykamaaxacabaGaeyOKH4kaleqabaGaamizaaaakiaad6eacaaIOaGaaGimaiaaiYcacaWGjbWaaWbaaSqabeaacqGHsislcaaIXaaaaOGaaGikaiabeI7aXnaaBaaaleaacaaIWaaabeaakiaaiMcacqGHxkcXcaWGxbG[email protected][email protected]

where A⊗B stands for the Kronecker product of matrices A and B. Since I(⋅) involves unknown quantities, equivalently

N ( θ ^ N θ 0 ):N(0, I N 1 ( θ ^ n ) W N )    (20) [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbmqaaeGaciGaaiaabeqaamaabaabaaGcbaWaaOaaaeaacaWGobaaleqaaOGaaGikaiqbeI7aXzaajaWaaSbaaSqaaiaad6eaaeqaaOGaeyOeI0IaeqiUde3aaSbaaSqaaiaaicdaaeqaaOGaaGykaerbbjxAHXgaiuaacaWF6aGaamOtaiaaiIcacaaIWaGaaGilaiaadMeadaqhaaWcbaGaamOtaaqaaiabgkHiTiaaigdaaaGccaaIOaGafqiUdeNbaKaadaWgaaWcbaGaamOBaaqabaGccaaIPaGaey4LIqSaam4vamaaBaaaleaacaWGobaabeaak[email protected][email protected]

where WN is W with the γs replaced by the γNs.

The above ideal applies to the other models in this paper, but the results will be more involved, and we only discuss them briefly.

For the proportional hazards model, even for the IID data case, the conditional likelihood looks much different from the full likelihood. Interestingly, the MLE from this model (under the assumption of correct model specification and some regularity conditions) has the same asymptotic distribution as that from the full likelihood [21,52]. For the proportional hazards model, it is noted [4] that the survival function can be written as

S(y\x,β)=-S0(y)h(β'x),

and f(y\x,β)=-dS(y\x,β)/dy. So to get the full log-likelihood (14), we need the estimate of S0(y) for the d-dimensional case [29,30]. Proposed the multi-dimensional generalization of the Kaplan-Meyer nonparametric estimator of S(y), similar technique can be used here for the construction of S0(y). Then (18) continues to hold in this case. Due to technical involvement, we will not pursue the details here.

For the least squares estimator, since the weight involves the kernel and hn, the treatment is different from those above, and in the case of full observation generally the asymptotic result is of the form

n h n d ( θ ^ n θ 0 h n 2 C o p ( h n 2 )) d N(0,Ω), [email protected]@[email protected]@+=feaaguart1ev2aaatCvAUfeBSjuyZL2yd9gzLbvyNv2CaerbuLwBLnhiov2DGi1BTfMBaeXatLxBI9gBaerbd9wDYLwzYbItLDharqqtubsr4rNCHbGeaGqiVu0Je9sqqrpepC0xbbL8F4rqqrFfpeea0xe9Lq=Jc9vqaqpepm0xbba9pwe9Q8fs0=yqaqpepae9pg0FirpepeKkFr0xfr=xfr=xb9adbmqaaeGaciGaaiaabeqaamaabaabaaGcbaWaaOaaaeaacaWGUbGaamiAamaaDaaaleaacaWGUbaabaGaamizaaaaaeqaaOGaaGikaiqbeI7aXzaajaWaaSbaaSqaaiaad6gaaeqaaOGaeyOeI0IaeqiUde3aaSbaaSqaaiaaicdaaeqaaOGaeyOeI0IaamiAamaaDaaaleaacaWGUbaabaGaaGOmaaaakiaadoeacqGHsislcaWGVbWaaSbaaSqaaiaadchaaeqaaOGaaGikaiaadIgadaqhaaWcbaGaamOBaaqaaiaaikdaaaGccaaIPaGaaGykamaaxacabaGaeyOKH4kaleqabaGaamizaaaakiaad6eacaaIOaGaaGim[email protected]@

for some constant C and matrix Ω determined by the kernel and the true (unknown) data and censoring distributions [53,54]. In our case of partial observation, the above result holds with Ω replaced by Ω⊗W.

For the competing risks model, the structure is similar to that of the proportional hazards model. Here the response is one dimensional so that the survival function can be estimated by the Kaplan-Meyer estimator and the weight matrix W is the identity.

Discussion

We have considered several statistical methods, parametric, semi parametric and nonparametric models, for the genetic regression analysis of familial multiple endpoints data, with possible missing records. Here we only considered the case of nuclear families and the parameters are independent of time. The cases of arbitrary pedigrees and/or the time dependent parameter can be treated similarly. The variance components method can also be applied to the proportional hazards model and in the analysis of competing risks. There are some marginal models for the multiple endpoints data, which work well in practice. But we think the joint model is more appropriate when the within responses structure is important in the analysis. Another commonly used method to deal with the missing data is the EM algorithm [55], which can be implemented into the models considered here. But for the multiple endpoints data, the proportion of missing part is usually large; the EM algorithm may not be efficient. When the missing pattern is non-ignorable, more complicated approaches need to be considered to reduce potential biases. Hypothesis testing for parameters of interests can be conducted using the likelihood ratio statistics based on the parametric models. We only derived the basic forms of these models; more features can be implemented to them in particular applications.

References

  1. Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association. 1958; 53: 457-481.
  2. Breslow N. Covariance analysis of censored survival data. Biometrics. 1974; 30: 89-99.
  3. Fleming TR, Lin DY. Survival analysis in clinical trials: past developments and future directions. Biometrics. 2000; 56: 971-983.
  4. Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data, Wiley. 1980.
  5. Andersen PK, Gill RD. Cox’s regression model for counting processes: A large sample study. Annals of Statistics. 1982; 10: 1100-1120.
  6. Arjas E. Survival models and martingale dynamics (with discussion). Scandinavian Journal of Statistics. 1989; 16: 177-225.
  7. Flemming TR, Harrington DP. Counting Processes and Survival Analysis. Wiley, New York. 1991.
  8. Andersen PK, Borgan O, Gill RD, Keiding N. Statistical Models Based on Counting Processes. Springer-Verlag. 1991
  9. Printice R, Williams BJ, Peterson AV. On the regression analysis of multiple failure time data. Biometrika. 1981; 68: 373-379.
  10. Wei LJ, Lin DY, Weissfeld L. Regression analysis of multivariate incomplete failure time data by modeling marginal distributions. Journal of the American Statistical Association. 1989; 84: 1065-1073.
  11. Freedman LS, Graubard BI, Schatzkin A. Statistical validation of intermediate endpoints for chronic diseases. Stat Med. 1992; 11: 167-178.
  12. Fleming TR, DeMets DL. Surrogate end points in clinical trials: are we being misled? Ann Intern Med. 1996; 125: 605-613.
  13. Zhang J, Quan H, Ng J, Stepanavage ME. Some statistical methods for multiple endpoints in clinical trials. Controlled Clinical Trials. 1997; 18: 204-221.
  14. Wei LJ, Glidden DV. An overview of statistical methods for multiple failure time data in clinical trials. Stat Med. 1997; 16: 833-839.
  15. Remington DL. Effects of genetic and environmental factors on trait network predictions from quantitative trait locus data. Genetics. 2009; 181: 1087-1099.
  16. Knüppel S, Rohde K, Meidtner K, Drogan D, Holzhütter HG, Boeing H, et al. Evaluation of 41 candidate gene variants for obesity in the EPIC-Potsdam cohort by multi-locus stepwise regression. PLoS One. 2013; 8: e68941.
  17. Elston RC, Stewart J. A general model for the genetic analysis of pedigree data. Hum Hered. 1971; 21: 523-542.
  18. Bonney GE. Compound regressive models for family data. Hum Hered. 1992; 42: 28-41.
  19. Yuan A, Bonney GE. Two new recursive likelihood calculation methods for genetic analysis. Hum Hered. 2002; 54: 82-98.
  20. Cox DR. Regression models and life-tables. Journal of the Royal Statistical Society. 1972; 34: 187-202.
  21. Cox DR. Partial likelihood. Biometrika. 1975; 62: 269-276.
  22. Clayton DG. A Monte Carlo method for Bayesian inference in frailty models. Biometrics. 1991; 47: 467-485.
  23. Hougaard P. A class of multivariate failure time distributions. Biometrika. 1986; 73: 671-678.
  24. Holt JD, Prentice RL. Survival analysis in twin studies and matched pairs experiments. Biometrika. 1974; 61: 17-30.
  25. Oakes D. A model for association in bivariate survival data. Journal of the Royal Statistical Society. 1982; 44: 414-422.
  26. Wild CJ. Failure time models with matched data. Biometrika. 1983; 70: 633-641.
  27. Stute W. Consistent estimation under random censorship when covariable is present. Journal of Multivariate Analysis. 1993; 45: 89-103.
  28. Stute W. Distributional convergence under random censorship when covariables are present.Scandinavian Journal of Statistics. 1996; 23: 462-471.
  29. Dabrowska DM. Kaplan Meier estimate on the plane. Annals of Statistics. 1988; 16: 1475-1489.
  30. Dabrowska DM. Kaplan Meier estimate on the plane: weak convergence, LIL and the bootstrap. Journal of Multivariate Analysis. 1988; 29: 308-325.
  31. Gill R. Multivariate survival analysis. Theory of Probability and Its Applications. 1992a; 37: 18-31.
  32. Gill R. Multivariate survival analysis: Part 2. Theory of Probability and Its Applications. 1992b; 37: 284-301.
  33. Fan J. Locally linear regression smoothers and their minimax efficiencies. Annals of Statistics. 1993; 21: 196-216.
  34. Ruppert D, Wand MP. Multivariate locally weighted least squares regression. Annals of Statistics. 1994; 22: 1346-1370.
  35. Fan J, Gijbels I. Variable bandwidth and local linear regression smoothers. Annals of Statistics. 1992; 20: 2008-2036.
  36. Fisher RA. The correlation between relatives on the supposition of Mendel inheritance. Trans Roy Soc. Edinb. 1918; 52: 399-433.
  37. Harris DL. Genotypic Covariances Between Inbred Relatives. Genetics. 1964; 50: 1319-1348.
  38. Lange K, Boehnke M. Extensions to pedigree analysis IV. Covariance components models for multivariate traits. Am J Med Genet. 1983; 14: 513-524.
  39. Goldgar DE. Multipoint analysis of human quantitative genetic variation. Am J Hum Genet. 1990; 47: 957-967.
  40. Amos CI. Robust variance-components approach for assessing genetic linkage in pedigrees. Am J Hum Genet. 1994; 54: 535-543.
  41. Almasy L, Blangero J. Multipoint quantitative-trait linkage analysis in general pedigrees. Am J Hum Genet. 1998; 62: 1198-1211.
  42. Fulker DW, Cherny SS, Sham PC, Hewitt JK. Combined linkage and association sib-pair analysis for quantitative traits. Am J Hum Genet. 1999; 64: 259-267.
  43. Amos CI, Gu X, Chen J, Davis BR. Least squares estimation of variance components for linkage. Genet Epidemiol. 2000; 19: S1-7.
  44. Blangero J, Williams JT, Almasy L. Robust LOD scores for variance component-based linkage analysis. Genet Epidemiol. 2000; 19: S8-14.
  45. Sham PC, Purcell S. Equivalence between Haseman-Elston and variance-components linkage analyses for sib pairs. Am J Hum Genet. 2001; 68: 1527-1532.
  46. de Andrade M, Guéguen R, Visvikis S, Sass C, Siest G, Amos CI. Extension of variance components approach to incorporate temporal trends and longitudinal pedigree data analysis. Genet Epidemiol. 2002; 22: 221-232.
  47. Lange K. Statistics for Biology and Health. Springer. 1997.
  48. Jacquard A. The genetic structure of populations. 1974.
  49. Wright S. The genetical structure of populations. Ann Eugen. 1951; 15: 323-354.
  50. Cockerham CC. Variance of gene frequencies. Evolution. 1969; 23: 72-84.
  51. Yuan A, Chen G, Yang Q, Rotimi C, Bonney G. Variance components model with disequilibria. Eur J Hum Genet. 2006; 14: 941-952.
  52. Tsiatis A. A large sample study of the estimate for the integrated hazard function in Cox’s regression model for survival data. Technical Report No. 562. 1978; 172-180.
  53. Fan J, Hu I, Truong Y. Robust nonparametric function estimation. Scandinavian Journal of Statistics. 1994; 21: 433-446.
  54. Cai Z. Weighted local linear approach to censored nonparametric regression. Akritas MG, Politis DM. In: Recent Advances and Trends in Nonparametric Statistics. 2003; 217-231.
  55. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. 1977; 39: 1-38.

Download PDF

Citation: Yuan A and Ahn J. Some Genetic Regression Models for Multiple Quantitative End Points Data. Austin Biom and Biostat. 2016; 3(1): 1029. ISSN: 2378-9840

Home
Journal Scope
Online First
Current Issue
Editorial Board
Instruction for Authors
Submit Your Article
Contact Us