Multivariate Granger Causality Analysis of Obesity Related Variables

Special Article - Biostatistics Theory and Methods

Austin Biom and Biostat. 2015;2(2): 1019.

Nitai D Mukhopadhyay*, David Wheeler, Roy Sabo and Shumei S Sun

Department of Biostatistics, Virginia Commonwealth University, USA

*Corresponding author: Nitai D Mukhopadhyay, Department of Biostatistics, Virginia Commonwealth University, Virginia, USA.

Received: June 01, 2015; Accepted: June 11, 2015; Published: June 19, 2015


Abstract

Obesity is a complex health outcome that reflects a combination of multiple health indicators. Here we explore the dependence network among multiple aspects of obesity, using two longitudinal cohort studies that span multiple decades. Causality is defined in a manner similar to Granger causality among multiple time series, but modified to accommodate multivariate time series as the nodes of the network. Our analysis reveals a relatively central position for physical measurements and blood chemistry measures in the overall network across both genders. Some patterns are specific to only the male or only the female population. The geometry of the causality network is expected to inform strategies to control the increasing trend in the obesity rate.

Keywords: Obesity; Granger causality; Network; Canonical correlation


Introduction

In the US, overweight or obesity affects two of three adults and one in three of their children. The authors of the recently published IOM Report on Obesity Prevention (2012) lamented that the epidemic is a “startling setback to major improvements in child health attained in the past century.”

Obesity impairs the metabolic and cardiovascular health of both adults and children and threatens to shorten the life span of the current generation of children [1]. The secular increases we have witnessed in the prevalence of childhood obesity presage an increase in the prevalence of type 2 diabetes mellitus (T2DM) as early as the second decade of life [2-4]. The origins of obesity include individual genetic, neurohumoral, and physiological factors as well as familial, social, economic, environmental and policy decisions that influence children’s diet and physical activity.

Obesity is a complex phenotype that is captured through multiple surrogate measurements. Although a focus on individual physical phenotypes (such as BMI) can oversimplify this complexity, the body function measures that appear to be related to obesity are highly correlated among themselves. In this manuscript we estimate the dependence pattern among many of the phenotypes and phenotypic groups in the context of obesity.

Aside from the presence or absence of any association, it is also important to understand the interplay between the various measures of body dysfunction and their downstream implications. Thus, among observed associations, we also focus on determining the direction of any causal relationships among the measures. Because methods to infer causality are often considered controversial, we set aside the methodological arguments and concentrate on the emergence of obesity. We borrow the concept of Granger causality from econometrics [5] and adapt it to our context.
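For reference, the standard bivariate form of Granger causality from the econometrics literature can be written as below. This is the textbook formulation, not the multivariate adaptation developed in this paper; the symbols (lag order p, coefficients a_k and b_k, and T, the number of observations used in the regressions) are notation assumed here for illustration only.

```latex
\begin{align}
  % Restricted model: y is predicted from its own past only
  y_t &= a_0 + \sum_{k=1}^{p} a_k\, y_{t-k} + e_t \\
  % Unrestricted model: lagged values of x are added as predictors
  y_t &= a_0 + \sum_{k=1}^{p} a_k\, y_{t-k} + \sum_{k=1}^{p} b_k\, x_{t-k} + u_t \\
  % x is said to Granger-cause y if H_0 : b_1 = \dots = b_p = 0 is rejected
  F &= \frac{(\mathrm{RSS}_r - \mathrm{RSS}_u)/p}{\mathrm{RSS}_u/(T - 2p - 1)}
\end{align}
```

Here RSS_r and RSS_u are the residual sums of squares of the restricted and unrestricted regressions. A large F indicates that the past of x improves the prediction of y beyond what the past of y already provides.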

In the context of gene interaction, causality inference can be applied to decipher interactions within a network of hundreds of genes [6]. The causality network for childhood obesity is smaller than typical gene networks, but remains large enough to support causal inference. Because the origins of childhood obesity have been widely studied, we have a reasonable understanding of its pathophysiology, which should lend biological plausibility to the causality network we develop.

Studies of gene interactions use the correlation coefficient and its variations as a measure of interaction [7]. Partial correlations, empirical Bayes methodology, and bootstrap methods have been used to derive gene networks [8]. Correlation has also been used as a primary tool for constructing networks and pathways among genes and for analyzing gene clusters. Correlation is an effective tool for computing direction-free linear dependence when a sample of independent data is available. In this paper, we analyze longitudinal data rather than cross-sectional data. The time-dependent, autocorrelated measurements that characterize longitudinal data can be studied through time-lagged associations for the purpose of establishing causality. Graphical interaction models based on such analyses have been developed [9] and applied to biological time series [10-12]. A detailed comparative study of techniques for directed interactions in multivariate time series has also been reported. Based on these studies, we developed a longitudinal Granger causality network to establish causal relationships among genes [6]. The present paper uses the longitudinal Granger causality network method to analyze multivariate longitudinal data and to study causal relations among factors associated with childhood obesity. In this context, an inferred causality network that includes relevant biological variables could be used to identify the variables most amenable to interventions that prevent or delay the onset of childhood obesity.
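The idea of using time-lagged associations to infer direction can be illustrated with a minimal sketch of a bivariate Granger-style F-test on synthetic data. This is the standard two-series test, not the multivariate longitudinal network method used in this paper; the function names, the lag order, and the simulated series are all hypothetical and chosen only for illustration.

```python
import numpy as np


def lagged_matrix(series, p):
    """Return a matrix whose columns are series[t-1], ..., series[t-p] for t = p..n-1."""
    n = len(series)
    return np.column_stack([series[p - k: n - k] for k in range(1, p + 1)])


def granger_f_stat(x, y, p=1):
    """F statistic for H0: lags of x do not help predict y beyond lags of y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    target = y[p:]
    intercept = np.ones((len(target), 1))
    # Restricted design: intercept plus own lags of y.
    restricted = np.hstack([intercept, lagged_matrix(y, p)])
    # Unrestricted design: additionally includes lags of x.
    unrestricted = np.hstack([restricted, lagged_matrix(x, p)])

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    dof = len(target) - unrestricted.shape[1]
    return ((rss_r - rss_u) / p) / (rss_u / dof)


# Synthetic example (assumed, not study data): x drives y with a one-step lag.
rng = np.random.default_rng(0)
n = 300
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal(scale=0.5)

f_xy = granger_f_stat(x, y, p=1)  # tests whether x helps predict y
f_yx = granger_f_stat(y, x, p=1)  # tests whether y helps predict x
print(f"F(x -> y) = {f_xy:.1f}, F(y -> x) = {f_yx:.1f}")
```

In this simulation the statistic for x predicting y should be far larger than the statistic in the reverse direction, which is the asymmetry that a directed edge in a causality network is meant to capture.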

Materials & Methods

Our primary data for obesity inference come from the National Heart, Lung and Blood Institute (NHLBI) Growth and Health Study (NGHS), which includes detailed growth profiles of 2,379 girls (1,213 African-American and 1,166 Caucasian) between the ages of 9 and 21 years. Visits were scheduled annually during the 10-year enrollment, and at each visit measurements of Body Mass Index (BMI), waist circumference, skinfold thickness, blood pressure, blood chemistry, eating habits, and socioeconomic data were taken on each subject. The study population was 49% Caucasian and 51% African-American. As the NGHS cohort is exclusively female, we also used data from the Fels Longitudinal Study (FLS), grouped by gender, as our parallel study data for boys. The FLS started annual enrollment of 20-30 infants in 1929, and continues to enroll and follow participants up to the present time. Like the NGHS, the FLS collected measurements on body composition, blood pressure, blood chemistry, sexual maturity, cardiovascular health, etc. over the life span. Visits are scheduled five times during the first year after birth, twice a year after that until age 18, and once every two years in adulthood. Because we focus our analysis only on post-pubertal visits, our primary dataset excluded the visits in Tanner stages I and II of sexual maturity. Table 1 shows the mean and standard deviation of the baseline measurements of the demographic variables and the blood chemistry measures, where baseline is the first visit after the pubertal period.
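The visit selection described above (drop visits in Tanner stages I and II, then take each subject's first remaining visit as baseline) can be sketched as follows. This is only an illustration on a toy table: the column names (`subject_id`, `age_years`, `tanner_stage`, `bmi`) and all values are hypothetical and do not reflect the actual NGHS or FLS data layout.

```python
import pandas as pd

# Toy long-format visit table; column names and values are hypothetical.
visits = pd.DataFrame({
    "subject_id":   [1, 1, 1, 2, 2, 2, 3, 3],
    "age_years":    [10.0, 12.0, 13.0, 9.0, 11.0, 12.0, 11.0, 12.0],
    "tanner_stage": [1, 3, 4, 2, 2, 3, 3, 4],
    "bmi":          [17.0, 19.0, 20.0, 16.0, 18.0, 21.0, 22.0, 23.0],
})

# Exclude visits in Tanner stages I and II, keeping stage III and later.
post_pubertal = visits[visits["tanner_stage"] >= 3]

# Baseline = earliest remaining visit for each subject.
baseline = (
    post_pubertal.sort_values(["subject_id", "age_years"])
    .drop_duplicates("subject_id", keep="first")
    .reset_index(drop=True)
)

# Mean and sample standard deviation of baseline measurements.
summary = baseline[["age_years", "bmi"]].agg(["mean", "std"])
print(summary)
```

Sorting before dropping duplicates ensures that the retained row per subject is the earliest qualifying visit, regardless of the order in which visits were recorded.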