Review Article

Austin Biom and Biostat. 2015; 2(4): 1028.

# Survey Tables Binary: A SAS Macro for Publication Quality Tables of Complex Survey Data

Sunesara I¹*, Lirette ST¹ and Griswold ME¹

Department of Center of Biostatistics and Bioinformatics, University of Mississippi Medical Center, USA

²Ontario Cancer Institute, Princess Margaret Hospital,

***Corresponding author: **Sunesara I, Center of
Biostatistics and Bioinformatics, University of Mississippi
Medical Center, 2500 N State St, Jackson, MS, 39216,
USA

**Received: **September 16, 2015; **Accepted: **December 08, 2015; **Published: **December 14, 2015

## Abstract

Production of publication-quality tables can be time consuming and tedious. The repetitive copy/paste or the often inaccurate typing by hand is less than optimal solutions for a very common problem. Proc survey in SAS is a very powerful tool for complex multistage probability sampling designs, but digesting the output can be overwhelming. We present a SAS macro that gives the user concise publication quality tables for complex survey data which uses design variables such as stratification, clustering and sampling weights.

**Keywords:** Complex survey; Multi-stage sampling; Design variables;
Population; SAS; Tables

## Introduction

SAS proc survey procedures are available to handle complex Multi-Stage Probability Sampling Designs (MDPS), each producing a plethora of analytic output. Unlike other procedures in SAS and competing statistical packages, the survey procedures provide appropriate parameter estimates from a known probability sample by incorporating the necessary design weights. Generally the output produced is extremely valuable to the researcher but is not output in a concise, publishable format. Even when using ODs export functions of tables into output destinations such as html, pdf or rtf formats, the output often requires post transfer processing. Producing publicationquality tables by copying and pasting into formatted shells can be tedious, laborious, and prone to typing errors as well as needing further processing. In this paper we present a SAS macro which automates the production of publication ready tables for complex sampling survey data directly from SAS using the ODs capabilities. We illustrate the macro using a sample from the National Health and Nutritional Education Survey (NHANES) [1]. This study uses multi-stage sampling procedures, which introduces design variables for stratification and clustering, similar to the Medical Monitoring Project [2], and related sampling weights for analysis in order to infer back upon the population of interest from which the sampling frame was derived. In this work, we are most interested in estimates of population prevalence and, therefore, limit the macro mainly to producing proportions and their associated measures of variance and confidence.

## Description of Example Datasets

For our example, a combined dataset (N=5871) of NHANES from years 2001 - 2006 is used for show-casing the macro. The dataset includes the subset of variables from NHANES shown in Table 1. Using this example data set; we wish to create (Tables 2 & 3) for demographic characteristics of our sample to illustrate the macro.

**Table 1:**Description of example dataset.

Variable Type

Variable Name

Variable Description

Variable Attribute

Popln* CharacteristicRIAGENDR

Gender, (Boys/Girls)

Categorical

Popln CharacteristicRIDAGEYR

Age at screening

Continuous

Popln CharacteristicBMIGROUP

Body Mass Index

Categorical

Popln CharacteristicRACE

Race

Categorical

Popln CharacteristicVSTATUS

Vitamin levels

Categorical

SubgroupMETSYN

Metabolic Syndrome

Categorical

Popln CharacteristicINDFMPIR

Family poverty index ratio

Continuous

Popln CharacteristicBMXBMI

Body Mass Index

Continuous

DesignSDMVSTRA

Sampling Stratum

Design

DesignSDMVPSU

Sampling Cluster

Design

DesignMEC6YR

Sampling Weight

Design

Footnote: Poplin*: Population

Table 1:Description of example dataset.

**Table 2:**Table shell for overall participant’s characteristics.

Characteristics

Levels

N (%) / MN (sd?)

95%CI

Body Mass Index

GenderBoys

Girls

Total

Table 2:Table shell for overall participant’s characteristics.

**Table 3:**Table shell for binary (yes/no) subgroup (metabolic syndrome) with association statistics.

Characteristics

Levels

Total

Total(95%CI)

No

No(95%CI)

Yes

Yes(95%CI)

p-value

Body Mass Index

GenderBoys

Girls

Total

Table 3:Table shell for binary (yes/no) subgroup (metabolic syndrome) with association statistics.

## Features and options

**Variance:** For variance computation necessary to provide
confidence intervals and errors, only Taylor series estimation [3]
is currently available in the macro. The survey procedures in SAS do include resampling methods for variance estimation, such as,
Balanced Repeated Replication (BRR) and Jackknife (JK); these
additional methods are intended to be included in future releases and
should be a straightforward addition.

**Figure 1:**Screenshot of Table 1 output for example dataset.

Figure 1:Screenshot of Table 1 output for example dataset.

**Figure 2:**Screenshot of Table 2 output for example dataset.

Figure 2:Screenshot of Table 2 output for example dataset.

**Missingness:** When requesting binary subgroup analysis, the
default missingness structure for SAS survey procedures is Missing
Completely at Random (MCAR) [4]. Therefore, the macro call
assumes MCAR. The Not Missing Completely at Random (NOMCAR)
option can be requested and is specified within the source code of
the macro. The nomcar option is useful when one cannot assume
data values are missing completely at random, and, thus, calculates
the variance appropriately. This option applies only to Taylor series
variance estimation [4]. However, as noted, this only applies to binary
subgroup analysis (Table 2). For estimated means and percentages of overall participant characteristics (Table 1), a MCAR missingness
structure is assumed.

**Relative standard error:** The Standard Error (STDErr) is
primarily a measure of the sampling variability that occurs by chance
when only a sample, rather than an entire universe, is surveyed [5,6].
Proper estimation of STDerr is important in providing appropriate
estimates, p-values, and confidence intervals based on design
weights. Relative Standard Error (RSE) is one of the criteria to check
for reliability of estimates (mean or percent) [7]. RSE is obtaining
by dividing the standard error by the estimate itself (RSE= STDErr
/ Estimate) [8]. The macro relies on understanding the order of
computation, either row or column proportions as needed can be
output. If the row option is specified in the macro, row proportions
and STDErr will be calculated appropriately. Likewise, column
proportions (the default) and STDErr can be calculated with the call
option for clarity. The resulting RSE is then expressed as a percent,
where 20% or 30% are commonly chosen as reliable estimates. For this macro, the end user should specify 0.30 if they desire a cut
point of 30% RSE. By default, the macro will calculate RSE at 20%.
Unreliable estimates [7] based on RSE criteria only are marked by
double dagger sign (‡) in the output generated by this macro at the
specified RSE cut point.

**Output:** The macro creates a folder named “result” under the
active directory that contains relevant output. If the folder similarly
named is available all the output will be saved within it. Output file
names consist of concatenation of (Tables 1 & 2), name of the data
file, and suffix of current date and time.

## Implementing the macro

**Macro parameters:** The macro call allows for several options as
well as required fields as noted in Table 4.

**Table 4:**Macro parameters.

Parameter

Explanation

Mandatory/Optional

dataDataset name only

Mandatory

groupvarBinary Outcome or subgroup of interest

(Should be coded as 0=No and 1=Yes) (Defines Columns to split)

Mandatory for Table 2

categorical_varsEnter all categorical variables (e.g. Gender…) (Row Variables)

Mandatory

continous_varsEnter all continuous variables (e.g. Age…) (Row Variables)

Mandatory

strataStratification variable

Mandatory

percent_kindRow or Column percent (Default=column)

Mandatory

clusterCluster variable

Mandatory

weightssampling weights

Mandatory

rseRelative Standard Error (Default = 0.20)

Input range 0.00 to 1.00

Recommended 0.20 or 0.30

Mandatory

title1Title for the Table of Overall Characteristics

optional

title2Title for the Table of Characteristics split by a binary variable

optional

Table 4:Macro parameters.

To download the macro please uses the link (https://sites.google. com/site/imransunesara/macros-programs/sas-software).

Recommended steps to use the macro using example dataset.

Step 1) prepare the dataset: Apply formats to all categorical variables of interest. See appendix for details. Apply dummy coding (0=No, 1=Yes). Only necessary for (Table 2).

Step 2) Read in the Macro using %include statement.

Step 3) Plug in variables of interest.

% survey tables binary (strata = SDMVSTRA, cluster = SDMVPSU, weights = MEC6YR, data = Nhanes_01_06_metsys, categorical_vars = bmigroup RACE RIDEXMON RIAGENDR vstatus, continous_vars = BMXBMI RIDAGEYR INDFMPIR, percent_kind = col, groupvar = metsyn, rse = 0.30, table1title = Characteristics of participants, table2 title = Characteristics of participants by Metabolic Syndrome);

**Generated output:** This macro uses ODs rtf and ODs markup
(Excel xp tag set) [4]. Various outputs have been programmed into it,
with and without grid lines (Figures 1 & 2) are screenshots of tables
in the example data set.

## Errors and limitations

Common errors and/or warning messages generated and displayed in the log file typically result from categorical variables (like race) having “zero” in one of the cells, due to which association statistics are not calculated. The final table produced will contain estimates, but the p-value will be excluded. Another possible error message could be “Lock is not available”. The solution to this problem is to rerun the program. If error message persists, change the active directory to your project directory.

## Conclusion

This macro helps in increasing productivity and reproducibility and also helps in preparing error free tables for summarizing data, reporting, and research publications.

## Acknowledgement

Authors thank Dr. Warren May, Ph.D. for reviewing this manuscript. We would also like to thank the very supportive and informative SAS user community.

## References

- Zipf G, Chiappa M, Porter KS, Ostchega Y, Lewis BG, Dostal J. National health and nutrition examination survey: plan and operations, 1999-2010. Vital Health Stat 1. 2013; 1-37.
- McNaghten AD, Wolfe MI, Onorato I, Nakashima AK, Valdiserri RO, Mokotoff E, et al. Improving the representativeness of behavioral and clinical surveillance for persons with HIV in the United States: the rationale for developing a population-based approach. PLoS One. 2007; 2: e550.
- Rust K. Variance Estimation for Complex Estimators in Sample Surveys. Journal of Official Statistics. 1985; 1: 381-397.
- SAS Institute Inc. SAS/STAT Software, Version 9.2. Cary, NC.
- Schappert S, Burt C. Ambulatory Care Visits to Physician Offices, Hospital Outpatient Departments, and Emergency Departments: United States, 2001- 2002. National Center for Health Statistics. Vital Health Stat. 2006; 1-66.
- CDC. National Hospital Discharge Survey. 2014; 1979-1996.
- Klein RJ, Proctor SE, Boudreault MA, Turczyn KM. Healthy People 2010 criteria for data suppression. Healthy People 2010 Stat Notes. 2002; 1-12.
- Hing E, Cherry D, Woodell D. National Ambulatory Medical Care Survey: 2004 Summary. National Center for Health Statistics. Vital Health Stat. 2006; 1.