Sunesara I; Lirette ST; Griswold ME

Review Article

Austin Biom and Biostat. 2015; 2(4): 1028.

Survey Tables Binary: A SAS Macro for Publication Quality Tables of Complex Survey Data

Sunesara I¹*, Lirette ST¹ and Griswold ME¹

¹Center of Biostatistics and Bioinformatics, University of Mississippi Medical Center, USA

*Corresponding author: Sunesara I, Center of Biostatistics and Bioinformatics, University of Mississippi Medical Center, 2500 N State St, Jackson, MS, 39216, USA

Received: September 16, 2015; Accepted: December 08, 2015; Published: December 14, 2015

Abstract

Producing publication-quality tables can be time consuming and tedious. Repetitive copying and pasting, or often inaccurate typing by hand, are less-than-optimal solutions to a very common problem. The survey procedures in SAS are powerful tools for analyzing complex multistage probability sampling designs, but digesting their output can be overwhelming. We present a SAS macro that gives the user concise, publication-quality tables for complex survey data that use design variables such as stratification, clustering, and sampling weights.

Keywords: Complex survey; Multi-stage sampling; Design variables; Population; SAS; Tables

Introduction

The SAS survey procedures are available to handle complex multi-stage probability sampling designs, each producing a plethora of analytic output. Unlike other SAS procedures, which assume simple random sampling, the survey procedures provide appropriate parameter estimates from a known probability sample by incorporating the necessary design weights. The output is generally extremely valuable to the researcher but is not produced in a concise, publishable format. Even when the ODS export functions are used to send tables to output destinations such as HTML, PDF, or RTF, the output often requires post-transfer processing. Producing publication-quality tables by copying and pasting into formatted shells is tedious, prone to typing errors, and usually requires further processing. In this paper we present a SAS macro that automates the production of publication-ready tables for complex survey data directly from SAS using the ODS capabilities. We illustrate the macro using a sample from the National Health and Nutrition Examination Survey (NHANES) [1]. This study uses multi-stage sampling procedures, which introduce design variables for stratification and clustering, similar to the Medical Monitoring Project [2], together with sampling weights that allow inference back to the population of interest from which the sampling frame was derived. In this work, we are most interested in estimates of population prevalence and therefore limit the macro mainly to producing proportions and their associated measures of variance and confidence.

Description of Example Datasets

For our example, a combined dataset (N=5871) of NHANES data from 2001 to 2006 is used to showcase the macro. The dataset includes the subset of NHANES variables shown in Table 1. Using this example dataset, we wish to create tables of the demographic characteristics of our sample (Tables 2 & 3) to illustrate the macro.

Table 1: Description of example dataset.




  
Variable Type	Variable Name	Variable Description	Variable Attribute
Popln* Characteristic	RIAGENDR	Gender (Boys/Girls)	Categorical
Popln Characteristic	RIDAGEYR	Age at screening	Continuous
Popln Characteristic	BMIGROUP	Body Mass Index	Categorical
Popln Characteristic	RACE	Race	Categorical
Popln Characteristic	VSTATUS	Vitamin levels	Categorical
Subgroup	METSYN	Metabolic Syndrome	Categorical
Popln Characteristic	INDFMPIR	Family poverty index ratio	Continuous
Popln Characteristic	BMXBMI	Body Mass Index	Continuous
Design	SDMVSTRA	Sampling Stratum	Design
Design	SDMVPSU	Sampling Cluster	Design
Design	MEC6YR	Sampling Weight	Design

Footnote: *Popln: Population

Table 2: Table shell for overall participant characteristics.

Characteristics	Levels	N (%) / Mean (SD)	95% CI
Body Mass Index
Gender	Boys
	Girls
	Total

Table 3: Table shell for binary (yes/no) subgroup (metabolic syndrome) with association statistics.




  
Characteristics	Levels	Total	Total (95% CI)	No	No (95% CI)	Yes	Yes (95% CI)	p-value
Body Mass Index
Gender	Boys
	Girls
	Total

Features and options

Variance: For the variance computation needed to provide confidence intervals and standard errors, only Taylor series estimation [3] is currently available in the macro. The survey procedures in SAS also include resampling methods for variance estimation, such as Balanced Repeated Replication (BRR) and the jackknife (JK); these methods are intended for inclusion in future releases and should be a straightforward addition.
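As a rough illustration of the kind of call the macro wraps, the following sketch requests Taylor series variance estimation directly from PROC SURVEYFREQ with the design variables of the example dataset. It assumes the dataset is available in the work library, and it is not taken from the macro's source code.

```sas
/* Sketch only: Taylor series (linearization) variance for a one-way
   table of weighted percentages. Variable names follow the example
   dataset; the work library is an assumption. */
proc surveyfreq data=work.Nhanes_01_06_metsys varmethod=taylor;
   strata  SDMVSTRA;        /* sampling stratum */
   cluster SDMVPSU;         /* sampling cluster */
   weight  MEC6YR;          /* sampling weight  */
   tables  RIAGENDR / cl;   /* percentages with 95% confidence limits */
run;
```

Taylor series estimation is the default variance method of the survey procedures, so the varmethod= option is spelled out here only for clarity.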

Figure 1: Screenshot of Table 1 output for example dataset.

    
    
Figure 2: Screenshot of Table 2 output for example dataset.

    
    
Missingness: When binary subgroup analysis is requested, the default missingness assumption of the SAS survey procedures is Missing Completely at Random (MCAR) [4]. Therefore, the macro call assumes MCAR. The Not Missing Completely at Random (NOMCAR) option can be requested by specifying it within the source code of the macro. The NOMCAR option is useful when one cannot assume that data values are missing completely at random; the variance is then calculated accordingly. This option applies only to Taylor series variance estimation [4]. As noted, it applies only to the binary subgroup analysis (the macro's Table 2 output). For the estimated means and percentages of overall participant characteristics (the macro's Table 1 output), an MCAR missingness structure is assumed.
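For readers unfamiliar with the option, a minimal sketch of a NOMCAR request in PROC SURVEYFREQ is given below. How and where the macro itself sets this option is not shown in this paper, so the statement layout and the work library are assumptions.

```sas
/* Sketch only: two-way table by the binary subgroup with the NOMCAR
   option, which treats missing values as not missing completely at
   random in the Taylor series variance computation. */
proc surveyfreq data=work.Nhanes_01_06_metsys nomcar;
   strata  SDMVSTRA;
   cluster SDMVPSU;
   weight  MEC6YR;
   tables  RIAGENDR*metsyn / column cl;   /* column percentages with CIs */
run;
```

Removing nomcar from the PROC statement returns the procedure to its default MCAR treatment of missing values.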

Relative standard error: The Standard Error (STDErr) is primarily a measure of the sampling variability that occurs by chance when only a sample, rather than the entire universe, is surveyed [5,6]. Proper estimation of STDErr is important for providing appropriate estimates, p-values, and confidence intervals based on design weights. The Relative Standard Error (RSE) is one criterion for checking the reliability of estimates (mean or percent) [7]. The RSE is obtained by dividing the standard error by the estimate itself (RSE = STDErr / Estimate) [8]. Because the RSE depends on which proportion is being estimated, the macro can output either row or column proportions as needed. If the row option is specified, row proportions and their STDErr are calculated. Column proportions (the default) and their STDErr can likewise be requested explicitly in the call for clarity. The resulting RSE is expressed as a percent; 20% and 30% are commonly chosen cut points for deeming an estimate reliable. The end user should specify 0.30 for a cut point of 30% RSE. By default, the macro uses a cut point of 20%. Estimates that are unreliable [7] by the RSE criterion alone are marked with a double dagger (‡) in the output at the specified cut point.
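The RSE rule can be sketched in a short DATA step. The two input rows are invented for illustration and are not NHANES results; the macro variable rse_cut is hypothetical, and the sketch assumes an estimate is flagged when its RSE is strictly greater than the cut point.

```sas
/* Sketch only: compute RSE = STDErr / Estimate and flag unreliable rows. */
%let rse_cut = 0.30;   /* hypothetical macro variable holding the cut point */

data work.rse_check;
   input estimate std_err;
   rse = std_err / estimate;
   unreliable = (rse > &rse_cut);   /* 1 = exceeds cut point, 0 = otherwise */
   format rse percent8.1;
   datalines;
25.0 2.5
4.0 1.6
;
run;

proc print data=work.rse_check noobs;
run;
```

The first row has an RSE of 10% and would be treated as reliable at a 30% cut point, whereas the second has an RSE of 40% and would be flagged.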

Output: The macro creates a folder named “result” under the active directory to hold the relevant output. If a folder with that name already exists, all output is saved within it. Output file names are the concatenation of the table label (Table 1 or Table 2), the name of the data file, and a suffix with the current date and time.

Implementing the macro

Macro parameters: The macro call allows for several options as well as required fields as noted in Table 4.

Table 4: Macro parameters.




  
Parameter	Explanation	Mandatory/Optional
data	Dataset name only	Mandatory
groupvar	Binary outcome or subgroup of interest; should be coded as 0=No and 1=Yes (defines the columns to split)	Mandatory for Table 2
categorical_vars	All categorical variables (e.g. Gender…) (row variables)	Mandatory
continous_vars	All continuous variables (e.g. Age…) (row variables)	Mandatory
strata	Stratification variable	Mandatory
percent_kind	Row or column percent (default = column)	Mandatory
cluster	Cluster variable	Mandatory
weights	Sampling weights	Mandatory
rse	Relative standard error cut point (default = 0.20); input range 0.00 to 1.00; recommended 0.20 or 0.30	Mandatory
title1	Title for the table of overall characteristics	Optional
title2	Title for the table of characteristics split by a binary variable	Optional

To download the macro, please use the link (https://sites.google.com/site/imransunesara/macros-programs/sas-software).

Recommended steps for using the macro with the example dataset:

Step 1) Prepare the dataset: Apply formats to all categorical variables of interest (see appendix for details). Apply dummy coding (0=No, 1=Yes) to the grouping variable; this is necessary only for Table 2.
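A minimal sketch of Step 1 is shown below. The format names genderf and yesnof are hypothetical, the Boys/Girls labels follow Table 1 under the assumption that RIAGENDR keeps the NHANES coding of 1 and 2, and metsyn is assumed to be already coded 0/1.

```sas
/* Sketch only: define value labels and attach them to the variables. */
proc format;
   value genderf 1 = 'Boys'
                 2 = 'Girls';
   value yesnof  0 = 'No'
                 1 = 'Yes';
run;

data work.Nhanes_01_06_metsys;
   set work.Nhanes_01_06_metsys;
   format RIAGENDR genderf. metsyn yesnof.;   /* labels only; values unchanged */
run;
```

Formats change only how the values are displayed, so the underlying 0/1 coding of the grouping variable is preserved.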

Step 2) Read in the Macro using %include statement.

Step 3) Plug in variables of interest.

%surveytablesbinary(
    strata = SDMVSTRA,
    cluster = SDMVPSU,
    weights = MEC6YR,
    data = Nhanes_01_06_metsys,
    categorical_vars = bmigroup RACE RIDEXMON RIAGENDR vstatus,
    continous_vars = BMXBMI RIDAGEYR INDFMPIR,
    percent_kind = col,
    groupvar = metsyn,
    rse = 0.30,
    table1title = Characteristics of participants,
    table2title = Characteristics of participants by Metabolic Syndrome);

Generated output: The macro uses ODS RTF and ODS markup (the ExcelXP tagset) [4]. Several output styles have been programmed into it, with and without grid lines. Figures 1 & 2 are screenshots of the tables produced for the example dataset.

Errors and limitations

Common errors and warning messages displayed in the log typically result from categorical variables (such as race) having a zero count in one of the cells, in which case association statistics are not calculated. The final table will still contain estimates, but the p-value will be omitted. Another possible error message is “Lock is not available”. The solution is to rerun the program. If the error message persists, change the active directory to your project directory.
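One way to spot such empty cells before calling the macro is an unweighted cross-tabulation of each categorical variable against the grouping variable. The sketch below is a suggested check rather than part of the macro, and it assumes the example dataset sits in the work library.

```sas
/* Sketch only: raw cell counts of RACE by metsyn, including missing
   values, to reveal cells with a count of zero. */
proc freq data=work.Nhanes_01_06_metsys;
   tables RACE*metsyn / norow nocol nopercent missing;
run;
```

Any category that shows a zero in either subgroup column is a candidate for the missing p-value described above.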

Conclusion

This macro helps increase productivity and reproducibility and helps prepare error-free tables for summarizing data, reporting, and research publications.

Acknowledgement

The authors thank Warren May, Ph.D., for reviewing this manuscript. We would also like to thank the very supportive and informative SAS user community.

References

Citation: Sunesara I, Lirette ST and Griswold ME. Survey Tables Binary: A SAS Macro for Publication Quality Tables of Complex Survey Data. Austin Biom and Biostat. 2015; 2(4): 1028. ISSN: 2378-9840
