Austin Biom and Biostat. 2015; 2(4): 1028.
Sunesara I¹*, Lirette ST¹ and Griswold ME¹
¹Center of Biostatistics and Bioinformatics, University of Mississippi Medical Center, USA
*Corresponding author: Sunesara I, Center of Biostatistics and Bioinformatics, University of Mississippi Medical Center, 2500 N State St, Jackson, MS, 39216, USA
Received: September 16, 2015; Accepted: December 08, 2015; Published: December 14, 2015
Producing publication-quality tables can be time consuming and tedious. Repetitive copying and pasting, or error-prone typing by hand, are less-than-optimal solutions to a very common problem. The survey procedures in SAS are powerful tools for complex multistage probability sampling designs, but digesting their output can be overwhelming. We present a SAS macro that gives the user concise, publication-quality tables for complex survey data, using design variables for stratification, clustering, and sampling weights.
Keywords: Complex survey; Multi-stage sampling; Design variables; Population; SAS; Tables
The SAS survey procedures are available to handle complex multi-stage probability sampling designs, each producing a plethora of analytic output. Unlike other procedures in SAS and competing statistical packages, the survey procedures provide appropriate parameter estimates from a known probability sample by incorporating the necessary design weights. The output is generally extremely valuable to the researcher but is not produced in a concise, publishable format. Even when the ODS export functions are used to send tables to output destinations such as HTML, PDF, or RTF, the output often requires post-transfer processing. Producing publication-quality tables by copying and pasting into formatted shells is tedious, prone to typing errors, and still requires further processing. In this paper we present a SAS macro that automates the production of publication-ready tables for complex survey data directly from SAS using its ODS capabilities. We illustrate the macro using a sample from the National Health and Nutrition Examination Survey (NHANES) [1]. This study uses multi-stage sampling procedures, similar to the Medical Monitoring Project [2], which introduce design variables for stratification and clustering, along with related sampling weights for analysis, in order to make inferences about the population of interest from which the sampling frame was derived. In this work, we are most interested in estimates of population prevalence and therefore limit the macro mainly to producing proportions and their associated measures of variance and confidence.
For our example, a combined dataset (N=5871) from the 2001-2006 NHANES cycles is used to showcase the macro. The dataset includes the subset of NHANES variables shown in Table 1. Using this example dataset, we wish to create Tables 2 and 3, which describe the demographic characteristics of our sample, to illustrate the macro.
Age at screening
Body Mass Index
Family poverty index ratio
Body Mass Index
Footnote: Poplin*: Population
Table 1: Description of example dataset.
N (%) / MN (sd?)
Body Mass Index
Table 2: Table shell for overall participants’ characteristics.
Body Mass Index
Table 3: Table shell for binary (yes/no) subgroup (metabolic syndrome) with association statistics.
Variance: For the variance computation needed to provide confidence intervals and standard errors, only Taylor series estimation [3] is currently available in the macro. The survey procedures in SAS also include resampling methods for variance estimation, such as Balanced Repeated Replication (BRR) and the Jackknife (JK); these methods are intended for inclusion in future releases and should be a straightforward addition.
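As a rough illustration of Taylor series variance estimation with design variables, the following sketch calls PROC SURVEYFREQ directly on a small made-up dataset. The dataset toy and its variables (stratum, psu, wt, sex, metsyn) are assumptions for illustration only; they are not NHANES data, and the macro's internal code may differ.

```sas
/* Toy data (hypothetical): 2 strata x 2 PSUs, each row carries a sampling weight */
data toy;
  input stratum psu wt sex metsyn;
  datalines;
1 1 1.5 1 0
1 1 2.0 2 1
1 2 1.0 1 1
1 2 2.5 2 0
2 1 3.0 1 1
2 1 1.0 2 0
2 2 2.0 1 0
2 2 1.5 2 1
;
run;

/* Taylor series linearization is the default; it is named here for clarity */
proc surveyfreq data=toy varmethod=taylor;
  strata  stratum;                 /* stratification variable          */
  cluster psu;                     /* primary sampling unit (cluster)  */
  weight  wt;                      /* sampling weight                  */
  tables  sex*metsyn / column cl;  /* column percents; CL requests confidence limits */
run;
```

In the NHANES example used in this paper, the corresponding design variables would be SDMVSTRA, SDMVPSU, and MEC6YR.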
Figure 1: Screenshot of Table 1 output for example dataset.
Figure 2: Screenshot of Table 2 output for example dataset.
Missingness: When binary subgroup analysis is requested, the default missingness structure for the SAS survey procedures is Missing Completely at Random (MCAR) [4], so the macro call assumes MCAR. The Not Missing Completely at Random (NOMCAR) option can be requested instead and is specified within the source code of the macro. The NOMCAR option is useful when one cannot assume that data values are missing completely at random, and it adjusts the variance calculation accordingly. This option applies only to Taylor series variance estimation. As noted, it is available only for the binary subgroup analysis (Table 2). For estimated means and percentages of overall participant characteristics (Table 1), an MCAR missingness structure is assumed.
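To show where the NOMCAR option sits in a survey procedure call, here is a minimal sketch using PROC SURVEYMEANS with a domain (subgroup) variable. The dataset toy and its variables are hypothetical, and this is not taken from the macro's source code.

```sas
/* Toy data (hypothetical) with two missing BMI values */
data toy;
  input stratum psu wt metsyn bmi;
  datalines;
1 1 1.5 0 24.1
1 1 2.0 1 31.0
1 2 1.0 1 .
1 2 2.5 0 27.3
2 1 3.0 1 29.8
2 1 1.0 0 22.5
2 2 2.0 0 .
2 2 1.5 1 33.2
;
run;

/* NOMCAR: treat missing values as not missing completely at random
   in the Taylor series variance computation */
proc surveymeans data=toy nomcar mean stderr clm;
  strata  stratum;
  cluster psu;
  weight  wt;
  domain  metsyn;   /* binary subgroup of interest */
  var     bmi;
run;
```

Dropping the nomcar keyword from the PROC statement returns to the default MCAR assumption.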
Relative standard error: The Standard Error (STDErr) is primarily a measure of the sampling variability that occurs by chance when only a sample, rather than an entire universe, is surveyed [5,6]. Proper estimation of STDErr is important for providing appropriate estimates, p-values, and confidence intervals based on design weights. The Relative Standard Error (RSE) is one criterion for checking the reliability of an estimate (mean or percent). RSE is obtained by dividing the standard error by the estimate itself (RSE = STDErr / Estimate). Because the RSE depends on which proportion is computed, the macro can output either row or column proportions as needed. If the row option is specified in the macro call, row proportions and their STDErr are calculated. Column proportions and their STDErr are the default, although the column option can also be stated explicitly in the call for clarity. The resulting RSE is expressed as a percent, with 20% or 30% commonly chosen as the cut point above which an estimate is considered unreliable. The cut point is entered as a proportion: the end user should specify 0.30 for a cut point of 30% RSE. By default, the macro uses a cut point of 20%. Estimates that are unreliable based on the RSE criterion alone are marked with a double dagger (‡) in the output generated by this macro at the specified RSE cut point.
Output: The macro creates a folder named “result” under the active directory to hold the relevant output. If a folder with that name already exists, all output is saved in it. Output file names are a concatenation of the table identifier (Table 1 or Table 2), the name of the data file, and a suffix giving the current date and time.
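The RSE rule can be sketched in a short DATA step. The dataset estimates, its columns, and the macro variable rse_cut are hypothetical names chosen for illustration; the macro itself may implement the check differently.

```sas
%let rse_cut = 0.30;   /* hypothetical macro variable holding the cut point */

/* Hypothetical estimates (percent) with their standard errors */
data estimates;
  input label :$12. estimate se;
  datalines;
GroupA 25.0 3.0
GroupB 4.0 1.5
;
run;

data flagged;
  set estimates;
  /* RSE = STDErr / Estimate; left missing when the estimate is zero */
  if estimate ne 0 then rse = se / estimate;
  /* 1 = unreliable at the chosen cut point, 0 = otherwise */
  unreliable = (rse > &rse_cut);
run;

proc print data=flagged noobs;
run;
```

Here GroupA has RSE = 3.0 / 25.0 = 0.12 and is not flagged, while GroupB has RSE = 1.5 / 4.0 = 0.375 and would be flagged at the 30% cut point.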
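One possible way to build such a folder and file name in SAS is sketched below. It assumes that "active directory" means the current working directory of the SAS session; the macro variable names and the exact naming pattern are illustrative assumptions, not the macro's actual code.

```sas
/* Resolve the current working directory through a temporary fileref */
filename here '.';
%let cwd = %sysfunc(pathname(here));
filename here clear;

/* DCREATE returns the new path, or blank if the folder could not be
   created (for example, because it already exists) */
%let newdir = %sysfunc(dcreate(result, &cwd));

/* Date-time stamp in ISO 8601 basic notation, yyyymmddThhmmss */
%let stamp = %sysfunc(datetime(), B8601DT15.);

/* Hypothetical pattern: table label, dataset name, date-time suffix */
%let dsname  = Nhanes_01_06_metsys;
%let outfile = &cwd/result/Table1_&dsname._&stamp..rtf;
%put NOTE: output would be written to &outfile;
```

Because the stamp changes on every run, earlier output files are not overwritten.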
Macro parameters: The macro call allows for several options as well as required fields as noted in Table 4.
data: Dataset name only
groupvar: Binary outcome or subgroup of interest (should be coded as 0=No and 1=Yes); defines the columns to split; mandatory for Table 2
categorical_vars: Enter all categorical variables (e.g. Gender…); row variables
continous_vars: Enter all continuous variables (e.g. Age…); row variables
percent_kind: Row or column percent (default = column)
rse: Relative standard error (default = 0.20); input range 0.00 to 1.00; recommended 0.20 or 0.30
table1title: Title for the table of overall characteristics
table2title: Title for the table of characteristics split by a binary variable
Table 4: Macro parameters.
To download the macro, please use the link https://sites.google.com/site/imransunesara/macros-programs/sas-software.
Recommended steps for using the macro with the example dataset:
Step 1) Prepare the dataset: Apply formats to all categorical variables of interest (see the appendix for details). Apply dummy coding (0=No, 1=Yes) to the grouping variable; this is necessary only for Table 2.
Step 2) Read in the macro using a %include statement.
Step 3) Plug in the variables of interest.
% survey tables binary (
  strata = SDMVSTRA,
  cluster = SDMVPSU,
  weights = MEC6YR,
  data = Nhanes_01_06_metsys,
  categorical_vars = bmigroup RACE RIDEXMON RIAGENDR vstatus,
  continous_vars = BMXBMI RIDAGEYR INDFMPIR,
  percent_kind = col,
  groupvar = metsyn,
  rse = 0.30,
  table1title = Characteristics of participants,
  table2title = Characteristics of participants by Metabolic Syndrome);
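Steps 1 and 2 might look like the following in practice. This is a sketch under stated assumptions: the dataset raw, the variable metsyn_raw with its 1=Yes/2=No coding, the format names, and the %include path are all hypothetical placeholders, and the appendix remains the authoritative description of the required formats.

```sas
/* Hypothetical raw data: sex coded 1/2, yes/no variable coded 1=Yes, 2=No */
data raw;
  input RIAGENDR metsyn_raw;
  datalines;
1 1
2 2
2 1
1 2
;
run;

/* Step 1a: define formats for the categorical variables */
proc format;
  value sexfmt 1 = 'Male' 2 = 'Female';
  value ynfmt  0 = 'No'   1 = 'Yes';
run;

/* Step 1b: dummy-code the grouping variable as 0=No, 1=Yes and attach formats */
data prepared;
  set raw;
  if metsyn_raw = 1 then metsyn = 1;
  else if metsyn_raw = 2 then metsyn = 0;
  format RIAGENDR sexfmt. metsyn ynfmt.;
  drop metsyn_raw;
run;

/* Step 2 (placeholder path; replace with the location of the downloaded file):
   %include "/path/to/macro_file.sas";                                         */
```

Any value of metsyn_raw other than 1 or 2 leaves metsyn missing, so it is worth checking the recoded variable with a frequency table before calling the macro.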
Generated output: This macro uses ODS RTF and ODS markup (the ExcelXP tagset). Various outputs have been programmed into it, with and without grid lines. Figures 1 and 2 are screenshots of tables produced from the example dataset.
Common errors and warning messages displayed in the log file typically result from categorical variables (such as race) having a zero count in one of the cells, in which case association statistics are not calculated. The final table will still contain estimates, but the p-value will be omitted. Another possible error message is “Lock is not available”. The solution is to rerun the program. If the error message persists, change the active directory to your project directory.
This macro helps increase productivity and reproducibility and helps in preparing error-free tables for summarizing data, reporting, and research publications.
The authors thank Warren May, Ph.D., for reviewing this manuscript. We would also like to thank the very supportive and informative SAS user community.
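For readers unfamiliar with these destinations, the sketch below routes one small table to both ODS RTF and the ExcelXP tagset. The file names, the journal style, and the use of sashelp.class are assumptions for illustration and do not reproduce the macro's own output code.

```sas
/* Open both destinations; files are written to the current directory */
ods rtf file="demo_table.rtf" style=journal;
ods tagsets.excelxp file="demo_table.xml" options(sheet_name="Table 1");

title "Demo table";
proc print data=sashelp.class(obs=5) noobs;
run;

/* Close the destinations so the files are finalized, then clear the title */
ods tagsets.excelxp close;
ods rtf close;
title;
```

The ExcelXP tagset writes an XML workbook that Excel can open, which is why the second file carries an .xml extension here.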
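A quick pre-check for empty cells can be sketched with an unweighted PROC FREQ crosstab before the macro is run. The dataset toy and its values are made up for illustration; in the example data the variables of interest would be RACE and metsyn.

```sas
/* Toy data (hypothetical): race level 3 has no observations with metsyn = 1 */
data toy;
  input race metsyn;
  datalines;
1 0
1 1
2 0
2 1
3 0
3 0
;
run;

/* SPARSE writes every combination of levels to the OUT= dataset,
   including combinations with a count of zero */
proc freq data=toy noprint;
  tables race*metsyn / sparse out=cells;
run;

/* Keep only the empty cells */
data zero_cells;
  set cells;
  where count = 0;
run;

proc print data=zero_cells noobs;
run;
```

Any row listed in zero_cells identifies a category that is likely to cause the missing p-value described above; collapsing sparse categories is one possible remedy.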
1. Zipf G, Chiappa M, Porter KS, Ostchega Y, Lewis BG, Dostal J. National health and nutrition examination survey: plan and operations, 1999-2010. Vital Health Stat 1. 2013; 1-37.
2. McNaghten AD, Wolfe MI, Onorato I, Nakashima AK, Valdiserri RO, Mokotoff E, et al. Improving the representativeness of behavioral and clinical surveillance for persons with HIV in the United States: the rationale for developing a population-based approach. PLoS One. 2007; 2: e550.
3. Rust K. Variance Estimation for Complex Estimators in Sample Surveys. Journal of Official Statistics. 1985; 1: 381-397.
4. SAS Institute Inc. SAS/STAT Software, Version 9.2. Cary, NC.
5. Schappert S, Burt C. Ambulatory Care Visits to Physician Offices, Hospital Outpatient Departments, and Emergency Departments: United States, 2001-2002. National Center for Health Statistics. Vital Health Stat. 2006; 1-66.
6. CDC. National Hospital Discharge Survey. 2014; 1979-1996.
7. Klein RJ, Proctor SE, Boudreault MA, Turczyn KM. Healthy People 2010 criteria for data suppression. Healthy People 2010 Stat Notes. 2002; 1-12.
8. Hing E, Cherry D, Woodell D. National Ambulatory Medical Care Survey: 2004 Summary. National Center for Health Statistics. Vital Health Stat. 2006; 1.
Citation: Sunesara I, Lirette ST and Griswold ME. Survey Tables Binary: A SAS Macro for Publication Quality Tables of Complex Survey Data. Austin Biom and Biostat. 2015; 2(4): 1028. ISSN: 2378-9840