The Application of Last Observation Carried Forward in the Persistent Binary Case

Special Article - Biostatistics Theory and Methods

Austin Biom and Biostat. 2015;2(2): 1018.

Jun He and Donna McClish*

Department of Biostatistics, Virginia Commonwealth University, USA

*Corresponding author: McClish, Department of Biostatistics, Virginia Commonwealth University, Virginia.

Received: June 01, 2015; Accepted: June 11, 2015; Published: June 19, 2015


The main purpose of this research is to evaluate the use of Last Observation Carried Forward (LOCF) as an imputation method when persistent binary outcomes are missing in a Randomized Controlled Trial. A simulation study was performed to evaluate the effect of equal event rates and equal or unequal dropout rates on Type I error. Properties of the estimated event rates, treatment effect, and bias were also assessed. LOCF was compared to two versions of complete case analysis: Complete1 (excluding all observations with missing data) and Complete2 (carrying forward observations only if the event has been observed to occur). The results showed that 1) if the dropout rates were equal, all three analysis methods had appropriate Type I error; 2) if the dropout rates were unequal, the Type I error was much greater than 0.05 in both the LOCF and Complete2 analyses; 3) regardless of dropout rates, the mean event rate was underestimated in the LOCF analysis and overestimated in the Complete2 analysis, while the Complete1 analysis gave the estimated mean event rate closest to the true rate; 4) compared to the scenario in which no event could occur at the first time point, the mean event rate was underestimated less in the LOCF analysis and overestimated more in the Complete2 analysis when an event could occur at the first time point. LOCF analysis was applied to a mammogram dataset, where the LOCF method underestimated the final event rate.

Keywords: Last observation carry forward; Persistent binary data; Missing data; Estimated mean event rate; Type I error; Bias


RCT: Randomized Clinical Trial; ITT: Intent to Treat; LOCF: Last Observation Carried Forward; WISER: Women Improving Screening through Education and Risk Assessment; HBM: Health Belief Model


In a Randomized Clinical Trial (RCT), patients often drop out before the study is completed because of side effects, recovery, lack of improvement, unpleasant health problems, and other unknown factors, which results in missing data [1]. Intent to treat (ITT) analysis, commonly used for clinical trial data, is based on the initial treatment assignment and aims to analyze data from all randomized patients, even those who drop out of the study. When data are missing, following the ITT principle requires some form of imputation. Although many imputation methods exist, such as Last Observation Carried Forward (LOCF) [2], replacement with the mean [3], regression imputation [4], multiple imputation [5], and maximum likelihood [6], no single method is appropriate for all problems.

The focus of this paper is the LOCF imputation method applied to persistent binary outcomes. A persistent phenomenon is an event that, once it has occurred at a time point, is recorded as having occurred at all following time points. One example of a persistent binary outcome arose in the Women Improving Screening through Education and Risk Assessment (WISER) study [7]. To assess a simple tailored health promotion intervention, participants were asked whether they had had a mammogram since the start of the study. Once a participant has had a mammogram, the event persists at every later assessment.

Almost all clinical trials face the problem of missing data. For example, a dropout rate of nearly 40% was observed in the WISER study. The question then becomes how to analyze a dataset with missing data. LOCF replaces each missing observation after the point of dropout with the last observed outcome. For continuous outcomes, this method is not recommended because it introduces bias and alters the mean and variance [8,9]. For binary data, LOCF imputation not only yields estimators with poor frequency properties when missing values are due to dropout, but also inflates Type I error rates [10]. The method may also perform poorly for binary outcomes when the event is a persistent phenomenon.

The purpose of this paper is to evaluate the LOCF imputation method for persistent binary outcomes. A simulation study is performed to examine the effect of dropout rates and type of dropout (random or associated with treatment arm) on Type I error for the LOCF method of analysis. The results from LOCF are also compared to two versions of complete case analysis: Complete1 (excluding all observations with missing data), and Complete2 (excluding observations with missing data when the event has not been observed to occur, but carrying forward the observations when the event has been observed to occur).
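As a minimal sketch of how the three analysis rules treat a single subject, assume each subject's outcomes are stored as a list of 0/1 values over the time points, with None marking a missed visit. The function name final_outcome and this data layout are illustrative assumptions, not the authors' code.

```python
from typing import Optional, Sequence


def final_outcome(obs: Sequence[Optional[int]], method: str) -> Optional[int]:
    """Return the final-time-point value used in the analysis.

    obs holds 0/1 outcomes at successive time points; None marks a missing
    visit. A return value of None means the subject is excluded.
    """
    if obs[-1] is not None:
        return obs[-1]  # completer: every method uses the observed value

    observed = [x for x in obs if x is not None]
    if not observed:
        return None  # nothing to carry forward (not expected if T1 is always observed)
    last = observed[-1]

    if method == "LOCF":
        return last  # carry the last observed value forward, whether 0 or 1
    if method == "Complete1":
        return None  # drop every subject with any missing data
    if method == "Complete2":
        # a persistent event already seen is known to hold at the end;
        # a last observed 0 says nothing about later visits, so exclude
        return 1 if last == 1 else None
    raise ValueError(f"unknown method: {method}")
```

For a subject who drops out after T2 without the event, `final_outcome([0, 0, None], "LOCF")` returns 0, while both complete case versions exclude the subject. This is the mechanism by which LOCF can count a subject as event-free even though the event might have occurred after dropout.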

Section 2 describes the simulation. Section 3 presents the results of the simulation, allowing a comparison of the three analysis methods. In Section 4, these methods are applied to a real-life example using the WISER study. Finally, we summarize the study, discuss its limitations, and mention future work.

Simulation and Methods

Assumptions and parameters in the simulation

Simulations are performed assuming an RCT with two treatment arms (Control and Treatment), equal sample sizes, and three time points (T1, T2, and T3). Two study scenarios are considered. In the first, the study event cannot have occurred at T1. This would be typical of a clinical trial in which T1 is baseline (prior to treatment) and having the event would be an exclusion criterion for enrollment. In the second scenario, the first measurement (T1) is taken after treatment, and the event may or may not have occurred by that time. In both scenarios, we assume no missing data at T1 and an equal likelihood that missing data first occur at T2 or at T3. The event is assumed to be persistent: once a subject has the event, the subject continues to have it at all future time points. Monotone missingness is also assumed: once a subject has missing data at a time point, data at all future time points are also missing.
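The data-generating assumptions above can be sketched for one subject as follows. This is an illustration only. The text does not state how event onset is distributed over time points or whether dropout depends on the outcome, so the uniform choice of onset time and the independence of dropout from the outcome are assumptions made here, as is the function name simulate_subject.

```python
import random


def simulate_subject(event_rate, dropout_rate, event_at_t1, rng):
    """Simulate one subject over T1..T3; returns (truth, observed).

    truth is the complete persistent 0/1 sequence; observed has None at
    time points lost to monotone dropout.
    """
    # ASSUMPTION: the event rate is the probability of having the event by
    # T3, and onset is uniform over the time points where it is allowed.
    onset_choices = [0, 1, 2] if event_at_t1 else [1, 2]
    if rng.random() < event_rate:
        onset = rng.choice(onset_choices)
        truth = [1 if t >= onset else 0 for t in range(3)]  # persistence
    else:
        truth = [0, 0, 0]

    # ASSUMPTION: dropout is independent of the outcome. As in the text,
    # T1 is never missing and the first missing visit is T2 or T3 with
    # equal probability; every later visit is then missing too.
    if rng.random() < dropout_rate:
        first_missing = rng.choice([1, 2])
        observed = [truth[t] if t < first_missing else None for t in range(3)]
    else:
        observed = list(truth)
    return truth, observed
```

Calling `simulate_subject(0.5, 0.25, False, random.Random(1))` once per subject in each arm would give one simulated trial under the first scenario, where no event can occur at T1.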

For the event rates, we assume no treatment effect (equal event rates in the two arms), which allows the Type I error rate to be assessed. Both equal and unequal dropout rates are considered (Table 1). For equal dropout rates, nine scenarios are considered, formed by crossing low, moderate, and high event rates (0.2, 0.5, and 0.8) with low, moderate, and high dropout rates (10%, 25%, and 40%). For unequal dropout rates, 18 scenarios are investigated. Since the effect of unequal dropout could depend on how different the dropout rates are, two pairs of dropout rates are considered for each average dropout rate, giving 3 event rates × 3 average dropout rates × 2 pairs = 18 scenarios. For example, when the average dropout rate is 10%, dropout rates of 12.5% vs. 7.5% and 15% vs. 5% are used in the control group and the treatment group, respectively. For each set of parameters, 2000 replications are used for estimation and testing.
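The scenario counts can be checked by enumerating the design grid, as in the sketch below. Only the 10% example (12.5% vs. 7.5% and 15% vs. 5%) is given in the text. The pairs used for the 25% and 40% averages are listed in Table 1 and are not reproduced here, so the relative offsets in REL_OFFSETS are an assumption extrapolated from the 10% example, and all variable names are illustrative.

```python
from itertools import product

EVENT_RATES = [0.2, 0.5, 0.8]     # low, moderate, high (same in both arms)
AVG_DROPOUT = [0.10, 0.25, 0.40]  # low, moderate, high
REPLICATIONS = 2000               # replications per parameter set

# ASSUMPTION: each unequal pair is avg * (1 + r) vs. avg * (1 - r).
# r = 0.25 and r = 0.50 reproduce the 10% example in the text; the
# actual pairs for 25% and 40% are in Table 1 and may differ.
REL_OFFSETS = [0.25, 0.50]

# Each scenario is (event rate, control dropout, treatment dropout).
equal = [(p, d, d) for p, d in product(EVENT_RATES, AVG_DROPOUT)]
unequal = [
    (p, round(d * (1 + r), 4), round(d * (1 - r), 4))
    for p, d, r in product(EVENT_RATES, AVG_DROPOUT, REL_OFFSETS)
]

print(len(equal), len(unequal))  # 9 18
```

Keeping the grid as plain tuples makes it straightforward to loop over every scenario and run the stated number of replications for each.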