Believe the Extreme (BE) Strategy at the Optimal Point:What Strategy will it become? Special Article - Biostatistics Theory and Methods

Austin Biom and Biostat. 2015;2(3): 1022.

# Believe the Extreme (BE) Strategy at the Optimal Point: What Strategy will it become?

Ahmed AE¹, McClish DK²*, Schubert CM³ and ALJahdali HH⁴

¹Department of Epidemiology and Biostatistics, King Saud bin Abdulaziz University for Health Sciences, Saudi Arabia

²Department of Biostatistics, Virginia Commonwealth University, USA

³Department of Mathematics and Statistics, Air Force Institute of Technology, USA

⁴Department of Medicine, Pulmonary Division-ICU, King Saud bin Abdulaziz University for Health Sciences, Saudi Arabia

*Corresponding author: McClish DK, Department of Biostatistics, School of Medicine, Virginia Commonwealth University, Virginia, USA.

Received: June 01, 2015; Accepted: June 11, 2015; Published: June 29, 2015

## Abstract

The choice of which tests to sequence is essential in making a clinical decision. A variety of sequential techniques have been proposed to combine tests to increase the overall accuracy, including Believe the Positive (BP), Believe the Negative (BN), and the relatively new Believe the Extreme (BE). For a two-test sequence, the BP strategy administers Test 2 only if the results on Test 1 are not positive. Similarly, the BN strategy administers Test 2 only if the results on Test 1 are not negative. For both of these strategies (BP and BN), two thresholds are required. In the BE strategy, only those subjects who tested neither positive nor negative for disease with Test 1 are administered Test 2. Thus there are 3 thresholds for a two-test BE strategy: 2 for the initial test, and 1 for the second test. The BE strategy can at times approximate the BP or BN strategy if the lower threshold on the first test is estimated to be very low or the upper threshold very high. This paper explores the BE strategy while varying parameters associated with the features of each test to determine when the BE strategy behaves as a BP or BN strategy, as opposed to requiring all three thresholds. Two practical examples are presented: sleep apnea data and pancreatic cancer. The sleep apnea study shows that the BE strategy might actually function as a BN strategy. The cancer study shows how BE can display better accuracy and lower cost than either the BN or BP strategies.

Keywords: Sequential testing; Believe the extreme; Believe the positive; Believe the negative; Obstructive sleep apnea; Pancreatic cancer

## Abbreviations

BP: Believe the Positive; BN: Believe the Negative; BE: Believe the Extreme; PSA: Prostate Specific Antigen; MROC curve: Maximum Receiver Operating Characteristic; FPR: False Positive Rate; U: Upper; L: Lower; N: Non-diseased; D: Diseased; CDFs: Cumulative Distribution Functions; P(D): Prevalence; GYI: Generalized Youden Index; OOP: Optimal Operating Point; b: Standard Deviation Ratio; ρ: Correlations; AUC: Area Under the Curve; OSA: Obstructive sleep apnea; AHI: Apnea-Hypopnea Index; ESS: Epworth Sleepiness Scale; BMI: Body Mass Index; CA125: Cancer Antigen 125; CA19-9: Carbohydrate Antigen 19-9

## Introduction

A number of techniques have been used to improve the overall accuracy of the diagnostic process. Sequential testing techniques combine multiple tests sequentially in order to classify subjects into one of two groups (diseased or non-diseased). The use of combinations of logic rules is a popular technique for combining a sequence of tests [1-3]. Two such logic rules, Believe the Negative (BN) and Believe the Positive (BP), are often explored in the literature and currently are the most popular techniques for combining sequential tests [4-7]. For a two-test sequence, the BP strategy administers Test 2 to subjects only if the results on Test 1 are not positive. Similarly, the BN strategy for a two-test sequence administers Test 2 to subjects only if the results on Test 1 are not negative. For both of these strategies (BP and BN), two thresholds are required: one for Test 1 and one for Test 2. A relatively new sequential method, which we call Believe the Extreme (BE), is rarely mentioned in the literature [2,3,5]. In the BE strategy with two tests, there are 2 thresholds on the first test to classify patients as positive, negative, or uncertain for disease. Patients who test neither positive nor negative for disease in Test 1 are administered Test 2, where a single threshold determines positivity of the test. Etzioni et al. appear to be the first to formalize the statistical evaluation of the BE strategy in the context of prostate cancer (PSA and percent-free PSA), although the strategy was not named by the researchers. Previous studies found that the BE strategy was the most consistently accurate and least costly choice (the cost of testing defined as the number of subjects who need more than one test to diagnose disease) when compared to the BP and BN strategies [5,8]. The BE strategy has also been shown to resolve to a BP or BN strategy as a special case.

This paper examines the BE strategy to determine when it reduces to the BN or BP strategy, for which only a single threshold for the initial test is needed, and when two thresholds are required for the initial test. That is, are there scenarios or contexts for which the BE strategy stands on its own, or should researchers only consider the BP or BN strategy? In this investigation, the BE strategy is assessed at the optimal point, considering test characteristics such as ratios of the standard deviations of diseased and non-diseased populations, area under the curve, correlation between the two tests of diseased and non-diseased populations, and prevalence of disease.

## Method

The use of the BE strategy is associated with three thresholds (two thresholds for the first test and one threshold for the second test). The first test (Test 1) is measured on all subjects. When the result of Test 1 is in a grey zone (where subjects cannot be classified as negative or positive), Test 2 is administered to determine their diagnostic status. Specifically, the BE strategy will classify a subject as having disease if the result of Test 1 exceeds an Upper threshold (U), or if the result for Test 1 is neither positive nor negative (grey zone) and the second test is positive for disease. The BE strategy will classify a subject as non-diseased if the result of Test 1 is less than a Lower threshold (L), or if the result for Test 1 is neither positive nor negative (grey zone) and the second test is negative for disease. This procedure may produce more than one sensitivity value corresponding to a fixed specificity, a result of choosing different thresholds for the BE strategy. The Maximum Receiver Operating Characteristic (MROC) curve has been used to summarize the accuracy results for the BE testing strategy as it depicts the best (maximum) sensitivity for a fixed False Positive Rate (FPR=1-specificity) [5,8].

## Computing sensitivity and specificity for the BE strategy

Let X1D and X2D represent the continuous test results of the diseased (D) population for Tests 1 and 2 respectively. Let X1N and X2N represent the test results of the non-diseased (N) population for Tests 1 and 2 respectively. Let θ1U and θ1L represent the two thresholds associated with Test 1, where θ1U > θ1L, and let θ2 be the threshold associated with Test 2. Let F1D, F2D, F1N and F2N represent the Cumulative Distribution Functions (CDFs) of test results for those with (D) and without (N) disease for the first (1) and second (2) test. Finally, let F1D,2D and F1N,2N represent the joint CDFs of test results on Tests 1 and 2 for those with (D) and without (N) disease. The BE strategy uses combinations of “AND” and “OR” statements to define overall disease positive or negative test results. The key rules of testing for this strategy are the following:

Positive result if X1 > θ1U, or if θ1L < X1 < θ1U and X2 > θ2

Negative result if X1 < θ1L, or if θ1L < X1 < θ1U and X2 < θ2
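The two rules can be read as a three-way decision on Test 1 followed, only in the grey zone, by a single cut on Test 2. The Python sketch below is illustrative only: the function name `be_classify`, the string labels, and the handling of exact ties (kept in the grey zone on Test 1 and called negative on Test 2) are assumptions, since the rules are stated with strict inequalities for continuous results.

```python
def be_classify(x1, x2, theta_1l, theta_1u, theta_2):
    """Classify one subject under the two-test BE rule.

    Returns (label, used_test_2). All names here are illustrative.
    """
    if not theta_1l < theta_1u:
        raise ValueError("theta_1l must be strictly below theta_1u")
    if x1 > theta_1u:          # extreme high on Test 1: believe it
        return "positive", False
    if x1 < theta_1l:          # extreme low on Test 1: believe it
        return "negative", False
    # Grey zone: Test 2 decides (ties go to negative, an assumption).
    return ("positive" if x2 > theta_2 else "negative"), True


print(be_classify(3.1, 0.0, theta_1l=-1.0, theta_1u=2.0, theta_2=0.5))
print(be_classify(0.4, 0.9, theta_1l=-1.0, theta_1u=2.0, theta_2=0.5))
```

The second element of the returned pair records whether Test 2 was needed, which is the quantity the cost of conducting the sequence counts.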

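The FPR and Sensitivity (Se) of the BE strategy follow from these rules and the CDFs defined above:

$$\mathrm{FPR}(\theta_{1L},\theta_{1U},\theta_2) = 1 - F_{1N}(\theta_{1L}) - F_{1N,2N}(\theta_{1U},\theta_2) + F_{1N,2N}(\theta_{1L},\theta_2)$$

$$\mathrm{Se}(\theta_{1L},\theta_{1U},\theta_2) = 1 - F_{1D}(\theta_{1L}) - F_{1D,2D}(\theta_{1U},\theta_2) + F_{1D,2D}(\theta_{1L},\theta_2)$$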
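Under the positive rule, the probability of a positive BE result is P(X1 > θ1U) + P(θ1L < X1 < θ1U, X2 > θ2), evaluated with the non-diseased distribution for FPR and the diseased distribution for Se. The sketch below computes this for a bivariate normal pair using only the Python standard library. It is an assumption-laden illustration, not the authors' code: the function names, the midpoint-rule integration, the step count, and the example parameter values are all hypothetical, and it requires finite thresholds and |ρ| < 1.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def grey_zone_positive(mu1, sd1, mu2, sd2, rho, lo, hi, theta_2, steps=2000):
    """P(lo < X1 < hi and X2 > theta_2) for a bivariate normal pair.

    Integrates the Test 1 density times the conditional tail probability of
    Test 2 with the midpoint rule. Requires finite lo < hi and |rho| < 1.
    """
    test1 = NormalDist(mu1, sd1)
    cond_sd = sd2 * sqrt(1.0 - rho * rho)
    width = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * width
        cond_mean = mu2 + rho * sd2 * (x - mu1) / sd1
        tail = 1.0 - STD_NORMAL.cdf((theta_2 - cond_mean) / cond_sd)
        total += test1.pdf(x) * tail
    return total * width


def be_positive_rate(mu1, sd1, mu2, sd2, rho, theta_1l, theta_1u, theta_2):
    """P(X1 > theta_1u) + P(theta_1l < X1 < theta_1u and X2 > theta_2)."""
    direct = 1.0 - NormalDist(mu1, sd1).cdf(theta_1u)
    return direct + grey_zone_positive(
        mu1, sd1, mu2, sd2, rho, theta_1l, theta_1u, theta_2
    )


# Hypothetical inputs: non-diseased N(0, 1) on both tests, diseased N(1.2, 1).
fpr = be_positive_rate(0.0, 1.0, 0.0, 1.0, 0.3, -0.5, 1.5, 0.8)
se = be_positive_rate(1.2, 1.0, 1.2, 1.0, 0.3, -0.5, 1.5, 0.8)
print(f"FPR = {fpr:.3f}, Se = {se:.3f}")
```

Conditioning Test 2 on Test 1 reduces the joint probability to a one-dimensional integral, which avoids the need for a bivariate normal CDF routine.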

## Computation of MROC, cost and optimal operating point

When considering two continuous tests, the use of the BE strategy is associated with three thresholds and, as such, produces a collection of FPR-Se pairs from which the Maximum Receiver Operating Characteristic (MROC) curve may be derived. These FPR-Se pairs may, and often do, contain values for which multiple Se values are observed at a given FPR=t. Clearly, thresholds that produce a higher Se for a fixed FPR are preferred over those that produce lower Se values. Thus, the MROC curve is comprised of the FPR-Se pairs for which Se is maximized at a fixed FPR=t. The formula used to calculate the MROC curve in general is:

$$\mathrm{MROC}(t) = \max_{\theta_{1L},\,\theta_{1U},\,\theta_2} \left\{ \mathrm{Se}(\theta_{1L},\theta_{1U},\theta_2) : \mathrm{FPR}(\theta_{1L},\theta_{1U},\theta_2) = t \right\}$$

For each point on the MROC curve for the BE strategy, there is a corresponding set of thresholds (θ1L and θ1U for Test 1 and θ2 for Test 2) that produces maximum sensitivity for an associated fixed FPR.
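A brute-force way to approximate points on the MROC curve is to scan a grid of threshold triples and keep the largest Se whose FPR does not exceed each target value. The sketch below makes several simplifying assumptions that are not from the paper: the two tests are taken as independent (ρ = 0) so that the joint probability factorizes, the condition FPR = t is relaxed to FPR ≤ t because a finite grid rarely hits t exactly, and the function names, grid, and distribution parameters are hypothetical.

```python
from itertools import product
from statistics import NormalDist


def positive_rate(test1, test2, t1l, t1u, t2):
    """BE positive rate when the two tests are independent (rho = 0)."""
    grey = test1.cdf(t1u) - test1.cdf(t1l)
    return (1.0 - test1.cdf(t1u)) + grey * (1.0 - test2.cdf(t2))


def mroc_on_grid(non1, non2, dis1, dis2, grid, fpr_targets):
    """Best Se with FPR <= t for each target t, plus the thresholds achieving it."""
    best = {t: (0.0, None) for t in fpr_targets}
    for t1l, t1u, t2 in product(grid, repeat=3):
        if t1l >= t1u:
            continue  # the lower threshold must sit below the upper one
        fpr = positive_rate(non1, non2, t1l, t1u, t2)
        se = positive_rate(dis1, dis2, t1l, t1u, t2)
        for t in fpr_targets:
            if fpr <= t and se > best[t][0]:
                best[t] = (se, (t1l, t1u, t2))
    return best


# Hypothetical setting: non-diseased N(0, 1), diseased N(1.5, 1) on both tests.
non = NormalDist(0.0, 1.0)
dis = NormalDist(1.5, 1.0)
grid = [-3.0 + 0.5 * k for k in range(17)]   # -3.0, -2.5, ..., 5.0
curve = mroc_on_grid(non, non, dis, dis, grid, [0.05, 0.10, 0.20])
for t, (se, thresholds) in curve.items():
    print(f"FPR <= {t:.2f}: Se = {se:.3f} at {thresholds}")
```

Each reported Se comes with the threshold triple that attains it, mirroring the correspondence between MROC points and threshold sets described above.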

Also associated with the set of thresholds that define the points on the MROC curve is a cost of testing. This cost is a measure of the number of subjects that must be evaluated by both tests, and therefore is a function of the probability of the second test being used (i.e., the thresholds on the first test that force the subject to proceed to the second test). The more subjects being classified by the second test, the higher the cost associated with conducting the sequence. Thus, the thresholds associated with the points on the MROC curve make the BE strategy less or more expensive depending on the number of patients who receive Test 2.

The formula used to calculate this cost, the cost of conducting the sequence, is

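$$C(\theta_{1L},\theta_{1U}) = \left[F_{1D}(\theta_{1U}) - F_{1D}(\theta_{1L})\right] P(D) + \left[F_{1N}(\theta_{1U}) - F_{1N}(\theta_{1L})\right] \left(1 - P(D)\right)$$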
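The cost is the prevalence-weighted probability that a subject's Test 1 result lands between the two Test 1 thresholds. A minimal Python sketch under a normal model for Test 1 follows; the function name `sequence_cost` and the example distributions, thresholds, and prevalence are hypothetical.

```python
from statistics import NormalDist


def sequence_cost(dis1, non1, theta_1l, theta_1u, prevalence):
    """Probability that a subject falls in the Test 1 grey zone and needs Test 2."""
    grey_d = dis1.cdf(theta_1u) - dis1.cdf(theta_1l)   # diseased subjects
    grey_n = non1.cdf(theta_1u) - non1.cdf(theta_1l)   # non-diseased subjects
    return grey_d * prevalence + grey_n * (1.0 - prevalence)


# Hypothetical Test 1 distributions and thresholds.
non1 = NormalDist(0.0, 1.0)
dis1 = NormalDist(1.0, 1.0)
print(f"{sequence_cost(dis1, non1, -0.5, 1.5, prevalence=0.1):.3f}")
```

Widening the grey zone can only raise this value, and it approaches 1 when the thresholds are pushed far into both tails.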

The MROC curve describes the best performance (highest Se value) at every FPR and, notably, across all threshold combinations. It may be prudent, though, to define and work with the point at which classification accuracy is optimized. That is, rather than working with the entire MROC curve generated by the testing sequence, we describe the performance of the testing sequence at its optimal point. We consider the Optimal Operating Point (OOP), which maximizes the Generalized Youden Index (GYI), where the GYI is given by the following formula:

$$\mathrm{GYI} = \mathrm{Se} - m \times \mathrm{FPR}$$

where m = [(1-P(D))/P(D)] × [(CFP-CTN)/(CFN-CTP)] and the terms CFP, CTN, CFN and CTP refer to the costs associated with a False Positive (FP), True Negative (TN), False Negative (FN) and True Positive (TP) result. The term m is a weighting factor which represents the slope of the MROC curve at the OOP [8-10]. The misclassification costs in m reflect financial or health costs that result from the decisions of the sequence [9-11], and are not to be confused with the cost of conducting the sequence, C(θ1L, θ1U).
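The weighting factor m combines the odds against disease with a ratio of cost differences. The sketch below computes m and then a generalized Youden index, assuming the index takes the form Se − m × FPR (a point maximizing this quantity on the MROC curve is one where the curve has slope m). The function names and the example costs are hypothetical.

```python
def slope_m(prevalence, c_fp, c_tn, c_fn, c_tp):
    """Weighting factor m = [(1 - P(D)) / P(D)] * [(C_FP - C_TN) / (C_FN - C_TP)]."""
    odds_against_disease = (1.0 - prevalence) / prevalence
    return odds_against_disease * (c_fp - c_tn) / (c_fn - c_tp)


def generalized_youden(se, fpr, m):
    """Assumed form of the generalized Youden index: Se - m * FPR."""
    return se - m * fpr


# Hypothetical costs: a missed case is nine times as costly as a false alarm.
m = slope_m(prevalence=0.1, c_fp=1.0, c_tn=0.0, c_fn=9.0, c_tp=0.0)
print(m, generalized_youden(se=0.8, fpr=0.1, m=m))
```

With these example inputs the prevalence odds and the cost ratio cancel so that m is 1 (up to floating-point rounding), and the index reduces to the ordinary Youden index Se + Sp − 1.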

A maximum GYI may also be computed among the sets of thresholds that restrict cost to a particular range of values, C(θ1L, θ1U) < C0. Here C0 is a cost restriction: threshold values for which the proportion of patients receiving both Test 1 and Test 2 exceeds C0 are excluded from consideration (e.g. C0 = 80% means that no more than 80% of patients would undergo Test 2). A cost constraint of 100% means that all of the patients could receive both Test 1 and Test 2.
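A cost-constrained optimal point can be approximated by scanning a threshold grid, discarding triples whose cost exceeds C0, and keeping the triple with the largest index. The sketch below is a simplified illustration, not the authors' procedure: it assumes independent tests (ρ = 0), takes the index as Se − m × FPR, applies the constraint as cost ≤ C0 (matching "no more than 80%"), and uses hypothetical names, grid, and parameters.

```python
from itertools import product
from statistics import NormalDist


def positive_rate(test1, test2, t1l, t1u, t2):
    """BE positive rate assuming independent tests (rho = 0)."""
    grey = test1.cdf(t1u) - test1.cdf(t1l)
    return (1.0 - test1.cdf(t1u)) + grey * (1.0 - test2.cdf(t2))


def optimal_point(non1, non2, dis1, dis2, grid, prevalence, m, max_cost=1.0):
    """Grid search for thresholds maximizing Se - m * FPR with cost <= max_cost."""
    best = None
    for t1l, t1u, t2 in product(grid, repeat=3):
        if t1l >= t1u:
            continue
        # Cost of conducting the sequence: share of subjects needing Test 2.
        cost = ((dis1.cdf(t1u) - dis1.cdf(t1l)) * prevalence
                + (non1.cdf(t1u) - non1.cdf(t1l)) * (1.0 - prevalence))
        if cost > max_cost:
            continue
        se = positive_rate(dis1, dis2, t1l, t1u, t2)
        fpr = positive_rate(non1, non2, t1l, t1u, t2)
        score = se - m * fpr
        if best is None or score > best["gyi"]:
            best = {"gyi": score, "thresholds": (t1l, t1u, t2), "cost": cost}
    return best


# Hypothetical setting: non-diseased N(0, 1), diseased N(1.5, 1) on both tests.
non = NormalDist(0.0, 1.0)
dis = NormalDist(1.5, 1.0)
grid = [-3.0 + 0.5 * k for k in range(17)]
unconstrained = optimal_point(non, non, dis, dis, grid, prevalence=0.1, m=1.0)
constrained = optimal_point(non, non, dis, dis, grid, prevalence=0.1, m=1.0,
                            max_cost=0.8)
print(unconstrained)
print(constrained)
```

Because the constrained search looks at a subset of the triples, its best index can never exceed the unconstrained one.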

## Simulation methods

To study the behavior of the BE strategy, the effects of four different parameters associated with the accuracy and cost were examined. These parameters were the ratio of the standard deviations for the diseased, σD, and non-diseased, σN, populations; the prevalence of disease, P(D); the correlation between tests in the sequence for the non-diseased, ρN, and diseased, ρD, populations; and the area under the curve (AUC) for each of the two tests when used alone. In this investigation, the values of the AUC considered for Test 1 and Test 2 respectively were (0.7, 0.7), (0.7, 0.9) and (0.9, 0.9). These pairs of values assume that the second test was at least as accurate as the first test. In order to see clearly the effect of the ratio of standard deviations, b = σN/σD, on the BE strategy, we examined three possible values of the ratio: b=0.5, 1 or 2. When b=2, the standard deviation of test results for non-diseased subjects is twice that of the diseased subjects. When b=1, the standard deviations of test results for diseased and non-diseased subjects are equal. When b=0.5, the standard deviation of test results for diseased subjects is twice that of the non-diseased subjects. Four combinations of correlation between the tests for both the diseased and non-diseased populations were considered:

(ρD=0, ρN=0), (ρD=0.3, ρN=0.7), (ρD=0.7, ρN=0.3) and (ρD=0.7, ρN=0.7).

Finally, these parameter settings were examined when imposing a cost constraint of 80%, that is, the cost of conducting the sequence was restricted to no more than 80% of subjects being diagnosed by the second test. For the cost constraint comparisons, a prevalence of 0.1 was used.

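For this simulation, we assumed that the values of the test results for subjects with and without disease (X1D, X2D, X1N, X2N) followed bivariate normal distributions expressed as follows:

$$\begin{pmatrix} X_{1D} \\ X_{2D} \end{pmatrix} \sim N\!\left( \begin{pmatrix} \mu_{1D} \\ \mu_{2D} \end{pmatrix}, \begin{pmatrix} \sigma_{1D}^2 & \rho_D \sigma_{1D} \sigma_{2D} \\ \rho_D \sigma_{1D} \sigma_{2D} & \sigma_{2D}^2 \end{pmatrix} \right), \qquad \begin{pmatrix} X_{1N} \\ X_{2N} \end{pmatrix} \sim N\!\left( \begin{pmatrix} \mu_{1N} \\ \mu_{2N} \end{pmatrix}, \begin{pmatrix} \sigma_{1N}^2 & \rho_N \sigma_{1N} \sigma_{2N} \\ \rho_N \sigma_{1N} \sigma_{2N} & \sigma_{2N}^2 \end{pmatrix} \right)$$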

Thus, the estimation of the accuracy measures (sensitivity and FPR) was obtained by using normal distribution CDFs for those with and without disease. Together, these are referred to as the binormal model. Without loss of generality, we set the means of the bivariate normal model for the subjects without disease to 0 and the standard deviations to 1, that is, μ1N = μ2N = 0 and σ1N = σ2N = 1. Then σ1D = 1/b1 and σ2D = 1/b2. Values of μ1D and μ2D can be obtained for assumed values of AUC by the formula

$$\mu_{iD} = \Phi^{-1}(\mathrm{AUC}_i)\,\sqrt{1 + 1/b_i^2}, \quad i = 1, 2,$$

where Φ⁻¹ denotes the standard normal quantile function.
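For a binormal model the AUC equals Φ((μD − μN)/√(σD² + σN²)), so with μN = 0, σN = 1 and σD = 1/b the diseased mean can be recovered by inverting this relation. The following sketch tabulates the diseased mean for the AUC and b values used in the simulation; the function name `diseased_mean` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def diseased_mean(auc, b):
    """Mean of the diseased distribution giving the requested binormal AUC.

    Assumes the non-diseased distribution is N(0, 1) and sigma_D = 1 / b.
    """
    sigma_d = 1.0 / b
    return NormalDist().inv_cdf(auc) * sqrt(1.0 + sigma_d ** 2)


for auc in (0.7, 0.9):
    for b in (0.5, 1.0, 2.0):
        print(f"AUC = {auc}, b = {b}: mu_D = {diseased_mean(auc, b):.3f}")
```

Smaller b (a more variable diseased population) requires a larger separation in means to reach the same AUC.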

Correlation values were fixed as one of the parameter settings we varied. Since the binormal model assumption was made, we chose to evaluate thresholds over a grid that ranged between

$$\left[\min(\mu_N - 3\sigma_N,\ \mu_D - 3\sigma_D),\ \max(\mu_N + 3\sigma_N,\ \mu_D + 3\sigma_D)\right].$$
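A grid over this interval can be built as an evenly spaced sequence between the two endpoints. In the sketch below the function name `threshold_grid`, the even spacing, and the default of 61 points are assumptions, since the grid resolution is not stated here.

```python
def threshold_grid(mu_n, sigma_n, mu_d, sigma_d, points=61):
    """Evenly spaced thresholds spanning both distributions' 3-sigma limits."""
    if points < 2:
        raise ValueError("points must be at least 2")
    low = min(mu_n - 3.0 * sigma_n, mu_d - 3.0 * sigma_d)
    high = max(mu_n + 3.0 * sigma_n, mu_d + 3.0 * sigma_d)
    step = (high - low) / (points - 1)
    return [low + k * step for k in range(points)]


# Hypothetical example: non-diseased N(0, 1), diseased N(1, 2).
grid = threshold_grid(0.0, 1.0, 1.0, 2.0)
print(grid[0], grid[-1], len(grid))
```

Taking the minimum and maximum over both populations ensures that candidate thresholds cover the tails of whichever distribution is wider.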

Because test thresholds had to include values from the distributions of both those with and without disease, this range of test thresholds used the minimum of the lower limits of the diseased and non-diseased distributions, and the maximum of the upper limits of the diseased and non-diseased distributions. For extreme threshold values, e.g. when θ1L < μ1N - 2.56σN or θ1U > μ1D + 2.56σD, less than about 0.5% of observations fell beyond the threshold. When θ1L < μ1N - 2.56σN occurs, the threshold θ1L is so low that the strategy behaves as a BP strategy. Similarly, when θ1U > μ1D + 2.56σD occurs, the threshold θ1U is so large that the strategy behaves as a BN strategy.
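The two extreme-threshold conditions give a simple rule for labelling the strategy that results at a given pair of Test 1 thresholds. The sketch below encodes that rule with the 2.56 multiplier from the text. The function name and the "Test 2 alone" label for the case where both thresholds are extreme are assumptions; that case is not discussed above.

```python
def resultant_strategy(theta_1l, theta_1u, mu_1n, sigma_1n, mu_1d, sigma_1d,
                       z=2.56):
    """Label the strategy implied by the Test 1 thresholds."""
    lower_inactive = theta_1l < mu_1n - z * sigma_1n   # almost no one called negative by Test 1
    upper_inactive = theta_1u > mu_1d + z * sigma_1d   # almost no one called positive by Test 1
    if lower_inactive and upper_inactive:
        return "Test 2 alone"   # assumption: label not used in the source
    if lower_inactive:
        return "BP"
    if upper_inactive:
        return "BN"
    return "BE"


# Hypothetical Test 1 parameters: non-diseased N(0, 1), diseased N(1, 1).
print(resultant_strategy(-2.9, 1.2, 0.0, 1.0, 1.0, 1.0))
print(resultant_strategy(-0.5, 3.9, 0.0, 1.0, 1.0, 1.0))
print(resultant_strategy(-0.5, 1.2, 0.0, 1.0, 1.0, 1.0))
```

Applied to the thresholds found at the optimal operating point, a rule of this kind distinguishes cases where both Test 1 thresholds are finite from those that collapse to BP or BN.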

## Simulation results

The following tables show how the actual strategy at the optimal point can vary according to the AUCs, standard deviations, correlations, or restrictions on the cost of conducting the sequence. Table 1 shows the resultant strategy at the optimal point for varied values of AUC, correlation and ratios of the standard deviations between tests. The ratio of standard deviations has the most important effect. When b1=b2=0.5, the strategy always resolves to a BP strategy; when b1=b2=2.0, it resolves to a BN strategy. This is true regardless of the values of m or the correlation. When b1=b2=1, the situation is different. Usually the BE strategy holds, with two finite thresholds for the first test. This is true when the correlation is (0, 0) or (0.7, 0.7). However, when the correlations are (0.3, 0.7) or (0.7, 0.3) and AUC1=0.7, either the BP or BN strategy may apply.