A Likelihood Model for Linkage Analysis of Genetic Traits

Research Article

Austin Biom and Biostat. 2014;1(1): 7.

Ao Yuan1*, Xiaogang Zhong1 and George E Bonney2

1Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University, USA

2National Human Genome Center, Howard University, USA

*Corresponding author: Ao Yuan, Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University, Washington, DC 20057, USA.

Received: August 18, 2014; Accepted: September 09, 2014; Published: September 22, 2014

Abstract

Linkage analysis is one of the major approaches in genetic studies of human diseases, used for mapping putative genes or studying relationships between loci. Many existing methods use identity by descent data or require a particular familial structure, which may not be fully available in practice. Here we propose a likelihood model for linkage analysis with pedigrees, along with segregation and regressive analysis. Without requiring identity by descent data, this model can be used for both quantitative and qualitative traits to study trait-trait linkage with or without observed genotypes, or trait-marker linkage with observed marker genotypes, and it includes sib pair analysis as a special case. The model is applied to a real data example for illustration.

Keywords: Gamete; Gene loci; Linkage; Recombination fraction

Introduction

Advances in biotechnology have led to the identification of more and more disease genes without knowledge of the biochemical nature of the diseases. Linkage analysis is one of the most commonly used approaches for mapping human disease genes. It is often the first step in identifying their chromosomal locations, and may be followed by diagnosis and ultimately therapeutic treatment for these diseases. There are numerous parametric, nonparametric and semi-parametric methods for linkage/association analysis [1-6]. Furthermore, Kruglyak et al. [7] proposed a unified multipoint approach, Horvath et al. [8] considered a family-based approach to this problem, and Sung et al. [9] suggested a multipoint analysis using a Markov chain Monte Carlo algorithm. Many of these methods use Identity by Descent (IBD) data, or require a particular familial structure such as affected relative pairs or extreme discordant sib pairs. In practice, however, IBD data often cannot be uniquely determined or are not fully available, and particular familial structures are difficult to collect, while marker genotyping data are commonly available. Many of these models are not designed for the study of trait-trait genetic relationships, and some of them use only part of the information in the data, for example the squared trait value difference. Although robust, the nonparametric model-free methods may suffer a potential loss of efficiency, since they do not use knowledge of the trait-generating mechanism. In addition, complex traits are often affected by covariates such as sex, age, race and environmental factors. Here we consider a simple likelihood model for linkage analysis for pedigrees, along with segregation and covariate analysis based on the likelihood principle. This model can be used to study trait-trait linkage with or without observed genotypes, or trait-marker linkage with observed marker genotypes, and it includes sib pair analysis as a special case.

To illustrate the model, we analyze a set of nuclear family data to reveal the genetic connection between two traits that are known to have a close phenotypic relationship. Some possible extensions for future work are discussed.

Methods

We describe the method for quantitative traits and nuclear families; the cases of qualitative or combined traits are similar, and a general pedigree can be analyzed by breaking it into nuclear families. Let $y_f$, $y_m$ and $y_o$ be the $d$-dimensional observations of the father, mother and offspring respectively, where

$$y_f = (y_{f,1}, \ldots, y_{f,d})^T, \quad y_m = (y_{m,1}, \ldots, y_{m,d})^T,$$

$$y_o = (y_1, \ldots, y_n)^T, \quad y_j = (y_{j,1}, \ldots, y_{j,d})^T \quad (j = 1, \ldots, n),$$

and $n$ is the number of sibs in the nuclear family. Denote $y = (y_f, y_m, y_o)^T$ and its underlying random variable by $Y = (Y_f, Y_m, Y_o)^T$. Let $L_1$ and $L_2$ be the two loci under consideration for linkage analysis. We assume there are two alleles at each locus, $a_1|b_1$ for $L_1$ and $a_2|b_2$ for $L_2$.

We code the genotype at each locus as 0, 1 and 2 for $b|b$, $a|b$ ($b|a$) and $a|a$ respectively. Let $r$ be the recombination fraction, that is, the probability that a gamete is recombinant. Let $g_{fi}$, $g_{mi}$ and $g_{ji}$ be the genotypes of the father, the mother and the $j$-th sib at locus $i$ ($i = 1, 2$), and let $p_{1i}$ and $p_{2i}$ be the proportions of the corresponding genotypes at locus $i$. Let $p_{ij}$ be the proportion of the haplotype $\left(\frac{g_i}{g_j}\right)$ ($i, j = 0, 1, 2$), and write

$$g_f = \left(\frac{g_{f1}}{g_{f2}}\right), \quad g_m = \left(\frac{g_{m1}}{g_{m2}}\right), \quad g_j = \left(\frac{g_{j1}}{g_{j2}}\right).$$

Let $T(g_j \mid g_f, g_m)$ be the transmission probability of the sib's genotype given those of the parents. Note that there are 9 possible composite genotypes at the two loci for each individual. Using the multivariate model and notation of Yuan and Bonney [10], and assuming unknown phase, the likelihood for a given nuclear family can be written as

$$
\begin{aligned}
L(y) = {} & \sum_{g_f} P(g_f) f(y_f \mid g_f) \sum_{g_m} P(g_m) f(y_m \mid y_f, g_f, g_m) K(g_f, g_m) \\
& \times \prod_{j=1}^{n} \sum_{g_j} T(g_j \mid g_f, g_m) f(y_j \mid y_f, g_f, y_m, g_m, g_j)
\end{aligned}
\qquad (1)
$$

where each summation is over all the genotypes of that individual at the two loci, in the general form with unobserved genotypes at both loci, and $T(g_j \mid g_f, g_m)$ is the transmission probability for the case of unknown phase. In model (1) the conditional densities $f(y_f \mid g_f)$, $f(y_m \mid y_f, g_f, g_m)$ and $f(y_j \mid y_f, g_f, y_m, g_m, g_j)$ can be any general densities. Later on, for ease of exposition and convenience of application, we will assume that $f(y_f \mid g_f)$ is the $d$-dimensional normal density with mean

$$\mu_f = \sum_{i=1}^{9} \beta_i \chi(g_f = i) + \beta x_f$$

and variance matrix $\Sigma_f$, where $\chi(g_f = i)$ is the indicator of the event that the father's composite genotype is of type $i$, the $\beta$'s are $d$-dimensional vectors of parameters and $x_f$ is the covariate matrix for the father. In the same manner, $f(y_m \mid y_f, g_f, g_m)$ is the conditional normal density with mean

$$\mu_m + \Omega_p \Sigma_f^{-1} (y_f - \mu_f)$$

where

$$\mu_m = \sum_{i=1}^{9} \beta_i \chi(g_m = i) + \beta x_m$$

and variance matrix $\Sigma_m - \Omega_p \Sigma_f^{-1} \Omega_p$, where $\Sigma_m$ is the variance matrix of the mother alone and $\Omega_p$ is the between-parents correlation matrix. Furthermore, we take $K(g_f, g_m)$ to be the K-function of Yuan and Bonney [10], which is an adjustment factor for the product of the penetrances of the sibs given the parents' genotypes, and $f(y_j \mid y_f, g_f, y_m, g_m, g_j)$ is the conditional normal density function with mean

$$\mu_j + \Omega_{sp} \Sigma_p^{-1} (y_p - \mu_p)$$

where $\Omega_{sp} = (\Omega_{sf}, \Omega_{sm})$ is the sib-parents correlation matrix, which is composed of the sib-father and sib-mother blocks of correlation matrices,

$$\Sigma_p = \begin{pmatrix} \Sigma_f & \Omega_p \\ \Omega_p & \Sigma_m \end{pmatrix}, \quad y_p = \begin{pmatrix} y_f \\ y_m \end{pmatrix}, \quad \mu_p = \begin{pmatrix} \mu_f \\ \mu_m \end{pmatrix},$$

and variance matrix $\Sigma_s - \Omega_{sp} \Sigma_p^{-1} \Omega_{sp}^T$. Note that although we use the same coding for the two loci, $g_{f1} = 0$ and $g_{f2} = 0$ do not refer to the same gene at the two loci. The specification of the joint genotype proportions $p_{ij}$ and the transmission probabilities $T(g_j \mid g_f, g_m)$ is given in expression (10) later, and their values are given in Table II.
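As a minimal sketch of the regressive normal densities described above, the following Python code treats the scalar case $d = 1$, so that $\Sigma_f$, $\Sigma_m$ and $\Omega_p$ reduce to single numbers. The function names, the 0-based genotype index (0 to 8 in place of $i = 1, \ldots, 9$) and all numerical values are illustrative assumptions and are not taken from the paper.

```python
import math

def genotype_mean(beta_g, g, beta_x, x):
    """Mean for one person: effect of composite genotype g (0..8) plus covariate effect."""
    return beta_g[g] + sum(b * v for b, v in zip(beta_x, x))

def conditional_normal(mu_m, var_m, mu_f, var_f, omega_p, y_f):
    """Mean and variance of y_m given y_f in the scalar case:
    mu_m + omega_p / var_f * (y_f - mu_f)  and  var_m - omega_p**2 / var_f."""
    mean = mu_m + omega_p / var_f * (y_f - mu_f)
    var = var_m - omega_p ** 2 / var_f
    return mean, var

def normal_pdf(y, mean, var):
    """Univariate normal density."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

# Hypothetical parameter values, for illustration only.
beta_g = [0.0, 0.5, 1.0, 0.2, 0.7, 1.2, 0.4, 0.9, 1.4]  # one effect per composite genotype
beta_x = [0.1]                                           # covariate coefficient
mu_f = genotype_mean(beta_g, 4, beta_x, [2.0])           # father's mean
mu_m = genotype_mean(beta_g, 1, beta_x, [1.0])           # mother's mean
mean, var = conditional_normal(mu_m, 1.0, mu_f, 1.0, 0.3, y_f=1.5)
density = normal_pdf(1.0, mean, var)                     # plays the role of f(y_m | y_f, g_f, g_m)
```

The sib density $f(y_j \mid y_f, g_f, y_m, g_m, g_j)$ follows the same pattern with $\Omega_{sp}$ and $\Sigma_p$ in place of $\Omega_p$ and $\Sigma_f$; for $d > 1$ the divisions become matrix inverses.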

Note that in model (1) many components of the transmission probability $T(g_j \mid g_f, g_m)$ are typically zero, so it is more efficient to evaluate $T(g_j \mid g_f, g_m)$ first: if it is non-zero, compute the penetrances for the family members; otherwise skip the computation for that combination of genotypes. The $T(g_j \mid g_f, g_m)$'s are functions of the recombination fraction $r$. When the phase is known, (1) should be modified as

$$
\begin{aligned}
L(y) = {} & \sum_{g_f} P(g_f) f(y_f \mid g_f) \sum_{g_m} P(g_m) f(y_m \mid y_f, g_f, g_m) K(g_f, g_m) \\
& \times \prod_{j=1}^{n} \sum_{g_j} T_1(g_j \mid g_f, g_m; h(g_j, g_f, g_m)) f(y_j \mid y_f, g_f, y_m, g_m, g_j)
\end{aligned}
\qquad (2)
$$

where $T_1(g_j \mid g_f, g_m; h(g_j, g_f, g_m))$ is the transmission probability for the given phase configuration $h(g_j, g_f, g_m)$ of $(g_j \mid g_f, g_m)$. So (1) can be rewritten as

$$
\begin{aligned}
L(y) = {} & \sum_{g_f} P(g_f) f(y_f \mid g_f) \sum_{g_m} P(g_m) f(y_m \mid y_f, g_f, g_m) K(g_f, g_m) \\
& \times \sum_{h(g_j, g_f, g_m)} P(h(g_j, g_f, g_m)) \prod_{j=1}^{n} \sum_{g_j} T_1(g_j \mid g_f, g_m; h(g_j, g_f, g_m)) f(y_j \mid y_f, g_f, y_m, g_m, g_j)
\end{aligned}
$$

where $\sum_{h(g_j, g_f, g_m)}$ denotes summation over all the different phase configurations $h(g_j, g_f, g_m)$ of $(g_j, g_f, g_m)$, and $P(h(g_j, g_f, g_m))$ is the probability of configuration $h(g_j, g_f, g_m)$. The number of different phase configurations of $(g_j, g_f, g_m)$ depends on the number of heterozygotes in it. Note that here we have two loci with two alleles each, so the triple contains six single-locus genotypes, and the genotypes of the parents are assumed independent, as is common in the literature. If there are $k$ ($0 \le k \le 6$) heterozygotes in $(g_j, g_f, g_m)$, then there are $2^k$ different phase configurations, each with probability $P(h(g_j, g_f, g_m)) = 1/2^k$. This method requires listing all the different phase configurations; since different triples $(g_j, g_f, g_m)$ may have different numbers of phase configurations, it is not easy to program. A more convenient way is to treat every genotype as a heterozygote and sum over all $2^6 = 64$ phase configurations, each with probability $1/64$. Although this involves some redundant computation, it is a general procedure that does not require listing the phase configurations for each triple $(g_j, g_f, g_m)$, and so is easy to program. The values of $T(g_j \mid g_f, g_m)$ are given in Table II in the Appendix, for all possible composite genotypes of $(g_j, g_f, g_m)$.
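The idea of summing over a fixed set of equally weighted phase configurations can be sketched in Python as follows. This is an illustration under stated assumptions, not a reproduction of Table II: it uses the standard Mendelian gamete probabilities for a phased two-locus genotype (each parental haplotype with probability $(1-r)/2$, each recombinant haplotype with probability $r/2$), gives every configuration equal weight as in the text, and enumerates only the four parental orientation bits (16 configurations), because in this sketch the sib's own phase does not change the transmission probability. All function names are hypothetical.

```python
from itertools import product

def phased(genotype, flips):
    """Two haplotypes of a person with codes (c1, c2), given one orientation bit per locus.
    Code 0 = b|b, 1 = a|b, 2 = a|a; allele a is stored as 1 and allele b as 0."""
    hap1, hap2 = [], []
    for code, flip in zip(genotype, flips):
        pair = {0: (0, 0), 1: (1, 0), 2: (1, 1)}[code]
        if flip:
            pair = pair[::-1]
        hap1.append(pair[0])
        hap2.append(pair[1])
    return tuple(hap1), tuple(hap2)

def gametes(h1, h2, r):
    """Gamete distribution of a phased two-locus genotype with recombination fraction r."""
    dist = {}
    for hap, p in ((h1, (1 - r) / 2), (h2, (1 - r) / 2),
                   ((h1[0], h2[1]), r / 2), ((h2[0], h1[1]), r / 2)):
        dist[hap] = dist.get(hap, 0.0) + p
    return dist

def t_known_phase(gj, father_haps, mother_haps, r):
    """Phase-known transmission probability of the sib's composite genotype gj."""
    total = 0.0
    for (x, px), (y, py) in product(gametes(*father_haps, r).items(),
                                    gametes(*mother_haps, r).items()):
        if (x[0] + y[0], x[1] + y[1]) == tuple(gj):
            total += px * py
    return total

def sibship_transmission(sib_genotypes, gf, gm, r):
    """Average over all 16 parental orientation configurations, each with weight 1/16,
    of the product over sibs of the phase-known transmission probabilities."""
    total = 0.0
    for bits in product((0, 1), repeat=4):
        father = phased(gf, bits[:2])
        mother = phased(gm, bits[2:])
        term = 1.0
        for gj in sib_genotypes:
            term *= t_known_phase(gj, father, mother, r)
        total += term
    return total / 16.0
```

For example, `sibship_transmission([(1, 1), (1, 1)], (1, 1), (0, 0), r)` averages the coupling and repulsion phases of a doubly heterozygous father and changes with `r`. Homozygous loci contribute redundant terms that leave the average unchanged, which is the trade-off described in the text.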

Linkage between trait loci

For simplicity, we only consider the case of two phenotypes controlled by their own loci with unobserved genotypes at both loci.
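In this setting every sum in (1) runs over all nine composite genotypes. The nested sums, together with the shortcut of skipping the penetrance whenever the transmission probability is zero, can be sketched in Python as below. The callables `prior`, `f_father`, `f_mother`, `K`, `T` and `f_sib` are hypothetical placeholders for $P(\cdot)$, the three conditional densities evaluated at the observed data, the K-function and the transmission probability; their signatures are assumptions made for this sketch.

```python
from itertools import product

# The nine composite genotypes (code at locus 1, code at locus 2), codes 0, 1, 2.
GENOTYPES = list(product(range(3), repeat=2))

def family_likelihood(n_sibs, prior, f_father, f_mother, K, T, f_sib):
    """Likelihood (1) for one nuclear family with unobserved genotypes at both loci."""
    total = 0.0
    for gf, gm in product(GENOTYPES, repeat=2):
        parents = prior(gf) * f_father(gf) * prior(gm) * f_mother(gf, gm) * K(gf, gm)
        if parents == 0.0:
            continue
        sib_product = 1.0
        for j in range(n_sibs):
            sib_sum = 0.0
            for gj in GENOTYPES:
                t = T(gj, gf, gm)
                if t == 0.0:
                    continue  # transmission impossible: skip the penetrance
                sib_sum += t * f_sib(j, gj, gf, gm)
            sib_product *= sib_sum
            if sib_product == 0.0:
                break  # remaining sibs cannot make this term non-zero
        total += parents * sib_product
    return total
```

Evaluating `T` before `f_sib` means the comparatively expensive density is computed only for genotype combinations that can actually occur.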

Linkage between trait and marker loci

Suppose the data $y$ are controlled by one locus with unobserved genotype, and we observe the genotype $g_2$ at the marker locus. A common assumption is that $g_2$ has no epistatic interaction with $y$, i.e. $g_2$ and $y$ have no direct connection, but $g_2$ is related to the unobserved trait genotype, and the phase is unknown. In this case (1) becomes

$$
\begin{aligned}
L(y) = {} & \sum_{g_{f1}} P(g_f) f(y_f \mid g_f) \sum_{g_{m1}} P(g_m) f(y_m \mid y_f, g_f, g_m) K(g_{f1}, g_{m1}) \\
& \times \prod_{j=1}^{n} \sum_{g_{j1}} T(g_j \mid g_f, g_m) f(y_j \mid y_f, g_f, y_m, g_m, g_j)
\end{aligned}
\qquad (3)
$$

Here each summation is over the genotypes at the trait locus only.
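A minimal Python sketch of (3) shows how the observed marker genotypes shrink the sums: each person's marker code is held fixed and only the three trait-locus codes are enumerated. As before, `prior`, `f_father`, `f_mother`, `K`, `T` and `f_sib` are hypothetical placeholder callables whose signatures are assumptions made for this sketch.

```python
def trait_marker_likelihood(marker_f, marker_m, marker_sibs,
                            prior, f_father, f_mother, K, T, f_sib):
    """Likelihood (3): sum over trait-locus codes 0, 1, 2 with marker codes observed."""
    total = 0.0
    for gf1 in range(3):
        gf = (gf1, marker_f)  # composite genotype: (trait locus, observed marker)
        for gm1 in range(3):
            gm = (gm1, marker_m)
            parents = (prior(gf) * f_father(gf) * prior(gm)
                       * f_mother(gf, gm) * K(gf1, gm1))
            sib_product = 1.0
            for j, marker_j in enumerate(marker_sibs):
                sib_product *= sum(
                    T((gj1, marker_j), gf, gm) * f_sib(j, (gj1, marker_j), gf, gm)
                    for gj1 in range(3)
                )
            total += parents * sib_product
    return total
```

With $n$ sibs this visits $3 \times 3$ parental combinations and $3$ trait genotypes per sib, instead of the $9 \times 9$ and $9$ needed when both loci are unobserved.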

3-point analysis

One way to perform multipoint linkage analysis is to carry out 3-point analyses step by step across the segment spanned by the loci. Here we use our model to address the 3-point analysis. In this problem we have two markers and an unknown disease locus, which may lie between the two markers or outside the interval between them. We assume that the phase is unknown; the model is similar when the phase is known. Again, we only need to specify the likelihood for one family. The composite genotypes are $g_f = (g_{f1}, g_{f2}, g_{f3})$ for the father, $g_m = (g_{m1}, g_{m2}, g_{m3})$ for the mother, and $g_j = (g_{j1}, g_{j2}, g_{j3})$ for the $j$-th sib. We assume that the first and second components of each individual's composite genotype are the observed genotypes at markers 1 and 2, that the third component ($g_{j3}$ for the $j$-th sib) is the unobserved disease genotype, and that marker 1 is located to the left of marker 2 on the chromosome. Since we have three loci, there are three pairwise recombination fractions. Denote by $r_1$ the recombination fraction between marker 1 and the disease locus, by $r_2$ that between marker 2 and the disease locus, by $r_3$ that between the two markers, and by $T(g_{j1}, g_{j2}, g_{j3} \mid (g_{f1}, g_{f2}, g_{f3}), (g_{m1}, g_{m2}, g_{m3}))$ the 3-point transmission probability, which is a function of $(r_1, r_2, r_3)$. In this case, (3) is rewritten as

Appendix A: