Research Article
Austin Biom and Biostat. 2014;1(2): 6.
Soft ROC Curves
Xin Huang1, Narayanaswamy Balakrishnan2,3 and Yixin Fang4*
1Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, USA
2Department of Mathematics and Statistics, McMaster University, Canada
3King Saud University, Saudi Arabia
4Division of Biostatistics, Department of Population Health, New York University, USA
*Corresponding author: Yixin Fang, Division of Biostatistics, Department of Population Health, New York University, New York, NY 10016, USA.
Received: August 25, 2014; Accepted: October 13, 2014; Published: November 18, 2014
Abstract
Receiver operating characteristic (ROC) curves are a popular tool for evaluating continuous diagnostic tests. However, the traditional definition of ROC curves incorporates implicitly the idea of "hard" thresholding, which cannot encompass the situation when some intermediate classes are introduced between test result positive and negative, and also results in the empirical curves being step functions. For this reason, we introduce here the definition of soft ROC curves, which incorporates the idea of "soft" thresholding. The softness of a soft ROC curve is controlled by a regularization parameter that can be selected suitably by a cross-validation procedure. A byproduct of the soft ROC curves is that the corresponding empirical curves are smooth. The methods developed here are then examined through some simulation studies as well as a real illustrative example.
Keywords: Cross-validation; Diagnostic test; Intermediate class; Regularization parameter; Thresholding
Introduction
Receiver Operating Characteristic (ROC) curves are a popular tool for evaluating continuous diagnostic tests; see, for example, Pepe [1]. However, the traditional definition of ROC curves implicitly incorporates the idea of "hard" thresholding. To be specific, let T be the outcome of a continuous diagnostic test and D be the disease status. Given a threshold c, the hard-thresholding scheme classifies a subject as diseased if the test result T = t reaches or exceeds c, and as non-diseased otherwise. It thus results in a binary classifier,
$$I(t - c) = \begin{cases} 1, & t \ge c, \\ 0, & t < c. \end{cases} \qquad (H)$$
The ROC curve is then a graphical plot of true positives, E{I(T≥c)|D=1}, versus false positives, E{I(T≥c)|D=0}, for -∞<c<∞. It can be expressed as
$$ROC(p) = 1 - G(F^{-1}(1 - p)), \quad 0 \le p \le 1,$$
where F(.) and G(.) are the distributions of T, given D = 0 and D = 1, respectively.
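As a concrete illustration of hard thresholding, the following minimal Python sketch evaluates the standard representation ROC(p) = 1 - G(F^{-1}(1 - p)) when F and G are both normal, and computes one point of the empirical ROC curve at a threshold c. The normality assumption, the default parameter values, and the function names are chosen here for illustration only.

```python
from statistics import NormalDist


def binormal_roc(p, mu_x=0.0, mu_y=1.0, sd_x=1.0, sd_y=1.0):
    """Population ROC(p) = 1 - G(F^{-1}(1 - p)) for normal F and G, 0 < p < 1."""
    F = NormalDist(mu_x, sd_x)  # non-diseased population, D = 0
    G = NormalDist(mu_y, sd_y)  # diseased population, D = 1
    c = F.inv_cdf(1.0 - p)      # threshold whose false positive probability is p
    return 1.0 - G.cdf(c)


def empirical_roc_point(x, y, c):
    """(false positive fraction, true positive fraction) under hard thresholding at c."""
    fpf = sum(1 for xi in x if xi >= c) / len(x)
    tpf = sum(1 for yj in y if yj >= c) / len(y)
    return fpf, tpf


if __name__ == "__main__":
    x = [0.1, 0.4, 0.9, 1.3]   # toy non-diseased outcomes
    y = [0.8, 1.2, 1.9, 2.5]   # toy diseased outcomes
    for c in (0.5, 1.0, 1.5):
        print(c, empirical_roc_point(x, y, c))
    print(binormal_roc(0.2))
```

Because each indicator jumps from 0 to 1, the pair returned by `empirical_roc_point` changes only when c crosses an observed value, which is why the empirical curve is a step function.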
However, from the medical practitioner's point of view, if the test result is close to the given threshold c, then one may be indecisive about the disease status. This is a common situation for tests with ambiguous thresholds (e.g., prostate-specific antigen, which has been shown not to be a dichotomous marker [2]). Thus, practitioners tend to implement an intermediate class between the negative and the positive [3], within which patients are diagnosed as diseased or non-diseased according to some probability model. Hozo and Djulbegovic [4] provide a definition of acceptable regret threshold to explain such a phenomenon. They demonstrate that different practitioners might adopt different acceptable regret thresholds for withholding treatment even when the diagnostic test results exceed the pre-defined threshold. Unfortunately, the existing hard-thresholding scheme does not incorporate such intermediate classes. Furthermore, there are other disadvantages in the hard-thresholding scheme. In particular, the discontinuity of the binary classifier results in the corresponding estimated ROC curve being a step function, while the underlying ROC curve is likely to be smooth. Consequently, due to the discontinuity in the step function, the variability of the estimated ROC curve becomes large.
To overcome these disadvantages, we consider a soft-thresholding scheme,
where the value taken on the intermediate class lies between 0 and 1 and will be discussed in the next section, and δ is a regularization parameter controlling the softness. In particular, when δ=0, the soft thresholding reduces to the hard thresholding. When the decision rule Iδ is applied with threshold c, the sensitivity (a.k.a. the true positive probability) equals E{Iδ(T-c)|D=1} and the specificity (a.k.a. the true negative probability) equals E{1-Iδ(T-c)|D=0}.
The rationale of this soft-thresholding scheme is that if the test result is close to the given threshold c, then one may be indecisive about the status of the disease. Hence, we refer to Iδ(.) as the indecisive function. The probability model within the intermediate class is expressed through the value that the indecisive function takes there. We will show that different indecisive functions result in different soft ROC curves. The idea used here is similar in principle to the one used in designing randomization tests to achieve a given significance level in hypothesis testing [5]. The indecisive function has been considered in the literature of ROC analysis. Many authors have used smooth functions to approximate the indicator function, which can also be considered as indecisive functions. For example, Liu et al. [6] and Liu and Tan [7] used an S-type function to approximate the indicator function for the empirical False Positive Rate (FPR) and True Positive Rate (TPR). Huang et al. [8], Wang et al. [9], and Ma and Huang [10,11] used the sigmoid function to approximate the indicator function in the empirical estimate of the Area Under the ROC Curve (AUC).
Instead of looking for an approximation, in this work we examine the definition of ROC curves directly and introduce soft ROC curves based on soft thresholding. More importantly, we build a bridge between the approximation of an ROC curve and the approximation of its AUC. Moreover, continuity of the proposed soft ROC curves is a promising byproduct, although it is not our primary goal. We should point out that, in the ROC literature, many authors have discussed methods to smooth ROC curves. For example, Zou et al. [12] proposed a non-parametric estimator from kernel estimates of the distribution functions F and G. Peng and Zhou [13] proposed a local linear regression for the ROC curve, while Ren et al. [14] proposed a Penalized Spline Linear Mixed-Effects (PSLME) model. In this paper, we demonstrate that the proposed soft ROC method not only performs similarly to the local linear regression and PSLME methods in terms of smoothing, but also offers a clearer interpretation of the smoothing parameter and is much easier to implement.
The remainder of this paper is organized as follows. In Section 2, we define the soft ROC curve, and derive some of its properties. In Section 3, we propose methods to choose the regularization parameter δ. In Section 4, the proposed methods are examined through some simulation studies and a real data example. Finally, some discussion is made in Section 5, and all technical details are relegated to the Appendix.
Soft ROC Curves
When an indecisive function Iδ is applied with threshold c, we can define a soft ROC curve as follows.
Definition 1: A plot of true positives, E{Iδ(T-c)|D=1}, versus false positives, E{Iδ(T-c)|D=0}, for all possible values of c, is called the soft ROC curve with respect to the indecisive function Iδ.
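Assume that a test is performed on m non-diseased subjects, yielding testing outcomes Xi, and on n diseased subjects, yielding outcomes Yj. Then, an empirical estimate of the soft ROC curve w.r.t. Iδ is the plot of
$$\widehat{TP}_\delta(c) \ \text{versus} \ \widehat{FP}_\delta(c), \quad -\infty < c < \infty, \qquad (1)$$
where $\widehat{TP}_\delta(c) = n^{-1} \sum_{j=1}^{n} I_\delta(Y_j - c)$ and $\widehat{FP}_\delta(c) = m^{-1} \sum_{i=1}^{m} I_\delta(X_i - c)$. The area under the soft ROC curve w.r.t. Iδ, denoted by AUCδ, is derived in the following theorem, and its proof is presented in Appendix A.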
Theorem 1: For the soft ROC curve w.r.t. the indecisive function Iδ(.), we have
AUCδ=E{Kδ(Y-X)},
where X~F(.), Y~G(.), $K_\delta(u) = \int_{-\infty}^{\infty} I_\delta(u + s)\, \dot{I}_\delta(s)\, ds$, and $\dot{I}_\delta$ is the derivative of Iδ.
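The form of the kernel can be motivated by a short heuristic calculation. The sketch below is not the proof in Appendix A; it assumes that Iδ is nondecreasing from 0 to 1, that X and Y are independent, and that expectation and integration can be interchanged. The dummy variables u and s are notation introduced here.

```latex
\begin{align*}
\mathrm{AUC}_\delta
  &= \int_{-\infty}^{\infty} E\{I_\delta(Y-c)\}\, E\{\dot{I}_\delta(X-c)\}\, dc
     && \text{(area under true positives against false positives, traced over } c\text{)} \\
  &= E\left\{ \int_{-\infty}^{\infty} I_\delta(Y-c)\, \dot{I}_\delta(X-c)\, dc \right\}
     && \text{(independence of } X \text{ and } Y\text{)} \\
  &= E\left\{ \int_{-\infty}^{\infty} I_\delta(Y-X+s)\, \dot{I}_\delta(s)\, ds \right\}
     && (s = X - c).
\end{align*}
```

Reading off the inner integral as a function of u = Y - X gives $K_\delta(u) = \int I_\delta(u+s)\, \dot{I}_\delta(s)\, ds$. As a check, for δ = 0 the derivative is a Dirac delta at 0, so the kernel reduces to the hard-thresholding indicator and the AUC reduces to the familiar P(Y ≥ X).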
We remark that for piecewise constant functions, the derivative is defined by using the Dirac delta function. From Theorem 1, we see that an unbiased estimate of AUCδ is given by
$$\widehat{AUC}_\delta = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} K_\delta(Y_j - X_i).$$
It is worth emphasizing that if the hard-thresholding decision rule (H) is applied, then we use the classical ROC curve to evaluate its performance, whereas if the soft-thresholding decision rule (S) is applied, then we use the newly proposed soft ROC curve to evaluate its performance. In other words, which type of ROC curve is used for evaluation depends on the underlying decision rule that is applied. Actually, it is not necessary to define a new ROC curve for every new decision rule. However, we define soft ROC curves for at least three reasons. First, the soft-thresholding decision rule is simple and appropriate. Second, the resulting empirical soft ROC curve is continuous. Third, the relationship between Kδ and Iδ is mathematically beautiful.
Two-sided soft ROC curves
We can categorize indecisive functions and soft ROC curves into one-sided and two-sided according to the following definition.
Definition 2: If Iδ(t-c)=0 for t<c, Iδ and the corresponding soft ROC curve are said to be one-sided. Otherwise, they are said to be two-sided.
We now present some examples of indecisive functions Iδ and their corresponding Kδ, which are all displayed in Figure 1. The corresponding detailed calculations are presented in Appendix B.
Figure 1: Two-sided Iδ and their corresponding Kδ.
Example 1: Order 0 two-sided indecisive function is given by
where 1{.} is an indicator function. This implies that the disease status is totally indecisive (the chance of being diagnosed as diseased is 50%) when t is within δ of threshold c. The corresponding Kδ is
Example 2: Order 1 two-sided indecisive function is given in Appendix C.
This indecisive function is continuous, and it implies that the probability of being diagnosed as diseased is linearly increasing in t - c when t is within δ of threshold c. The corresponding Kδ is
where sign(.) is the sign function.
Example 3: Order ∞ two-sided (sigmoid) indecisive function is given in Appendix D.
An appealing property of the sigmoid function is that it is infinitely differentiable. The corresponding Kδ is
This Kδ is also infinitely differentiable.
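To make the three two-sided examples concrete, here is a minimal Python sketch. The boundary conventions at t - c = ±δ, the logistic scaling 1/(1 + exp(-(t - c)/δ)) for the sigmoid, the function names, and the numerical rule are all assumptions made for illustration; the expressions used in the paper are those of the examples above and the appendices. The helper `kernel_K` approximates Kδ(u) as the Riemann-Stieltjes integral of Iδ(u + s) against dIδ(s), which is one way of handling the Dirac delta derivative of a piecewise constant Iδ numerically.

```python
import math


def order0_two_sided(s, delta):
    """Probability 1/2 of a positive call when |s| < delta, where s = t - c."""
    if s >= delta:
        return 1.0
    if s > -delta:
        return 0.5
    return 0.0


def order1_two_sided(s, delta):
    """Probability rising linearly from 0 at s = -delta to 1 at s = delta."""
    if s >= delta:
        return 1.0
    if s <= -delta:
        return 0.0
    return (s + delta) / (2.0 * delta)


def sigmoid(s, delta):
    """Logistic curve with scale delta > 0 (the scaling is an assumption)."""
    z = s / delta
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for very negative z
    return e / (1.0 + e)


def kernel_K(indecisive, u, delta, half_width, n=4000):
    """Riemann-Stieltjes sum for K(u) = integral of I(u + s) dI(s).

    The sum runs over n cells of [-half_width, half_width]; each cell
    contributes I(u + midpoint) times the increment of I over the cell,
    so jumps of a piecewise constant I are picked up as increments.
    """
    lo = -half_width
    h = 2.0 * half_width / n
    total = 0.0
    for k in range(n):
        a = lo + k * h
        b = lo + (k + 1) * h
        increment = indecisive(b, delta) - indecisive(a, delta)
        if increment != 0.0:
            total += indecisive(u + 0.5 * (a + b), delta) * increment
    return total


if __name__ == "__main__":
    for name, fn, width in (("order 0", order0_two_sided, 3.0),
                            ("order 1", order1_two_sided, 3.0),
                            ("sigmoid", sigmoid, 40.0)):
        print(name, [round(kernel_K(fn, u, 1.0, width), 4) for u in (-1.5, -0.5, 0.5, 1.5)])
```

With δ = 1, `kernel_K(order0_two_sided, 0.5, 1.0, 3.0)` evaluates to 0.75: half of the mass of dIδ sits at s = -1, where Iδ(u + s) = 1/2, and half at s = 1, where Iδ(u + s) = 1.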
One-sided indecisive functions
In this subsection, we present two examples of one-sided indecisive functions Iδ and their corresponding Kδ, which are displayed in Figure 2. The indecisive functions are similar to the ones in Examples 1 and 2, but the order 1 one-sided Kδ takes on a reasonable form, unlike its two-sided counterpart. The corresponding detailed calculations are presented in Appendix E.
Figure 2: One-sided Iδ and their corresponding Kδ.
Example 4: Order 0 one-sided indecisive function is given by
This implies that the disease status is totally indecisive when t is in the interval [c,c+δ). The corresponding Kδ is
Example 5: Order 1 one-sided indecisive function is given in Appendix F.
This implies that the probability of being diagnosed as diseased is linearly increasing in t - c when t is in the interval [c, c + δ). The corresponding Kδ is
Surprisingly, the minor change in this indecisive function from its two-sided counterpart results in a big change in the corresponding Kδ, and Kδ has a continuous derivative. In what follows, we will focus on this indecisive function. Of course, the procedures developed here for this indecisive function can also be applied to other indecisive functions.
Selection of Regularization Parameter
Method based on softness
The regularization parameter δ controls the softness of a soft ROC curve. The bigger the δ is, the softer the ROC curve is. When δ is taken as zero, it becomes the traditional ROC curve as mentioned earlier. Hence, it is important to select an appropriate regularization parameter δ. First, we define the softness of a soft ROC curve as follows.
Definition 3: For a soft ROC curve with a regularization parameter δ, the softness is defined as
where X ~ F(.) and Y ~ G(.). The hardness is then naturally defined as 1 - α.
The softness α indirectly controls the form of the empirical soft ROC curve estimated from (1). For example, if the order 1 one-sided indecisive function is used, the softness ranges from 0 (when δ = 0, in which case the soft ROC curve becomes a step function) to 1 (when δ = ∞, in which case the soft ROC curve becomes a diagonal line). As mentioned before, the idea of soft thresholding is similar to the one used in designing randomization tests in hypothesis testing. In this regard, the softness defined above is analogous to the significance level in the setting of randomization tests.
Figure 3 shows the plots of δ versus the differences of means of diseased and non-diseased populations for some choices of α. Here, we denote μ = E{Y} - E{X} and assume that the two populations are normal with unit standard deviation.
Figure 3: Plots of δ versus mean difference μ for some given α.
Evidently, a non-parametric estimate of softness is given by
For a pre-specified α, we can choose a regularization parameter δ. But the determination of α is quite subjective. Recall that the same issue is present in hypothesis testing, wherein the significance level is usually taken to be 5%. From the limited simulation studies we have carried out, we would suggest considering softness between 0.1 and 0.3. In the next subsection, we propose a cross-validation procedure for selecting an appropriate δ without fixing α in advance.
Method based on cross-validation
In this subsection, we propose a Cross-Validation (CV) procedure for selecting δ by minimizing the Average Mean Squared Error (AMSE) [14],
where pk, k = 1, ..., K, form a fine grid of (0,1).
For this purpose, we randomly split the sample into two parts, or we randomly split the diseased and non-diseased samples into two parts each. For each random split, we treat one part as a training sample and the other as a validation sample. Based on the training sample, we construct the soft ROC curve and obtain its estimate at each pk, and based on the validation sample, we construct the regular ROC curve and obtain its estimate at each pk. By repeating this random split many times, we obtain the following cross-validation estimate of the AMSE:
where H is the number of random splits. Then, δ is chosen as the one that minimizes CVδ in (5). The split ratio (training/validation) can be chosen to be either 1:1 or 2:1. From our limited simulation studies, we observe that the results are not sensitive to the split ratio. Such a cross-validation idea has been considered by many authors, including Bickel and Levina [15]. On the theoretical side, Shao [16] examined the consistency of cross-validation procedures with different split ratios in linear regression.
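The procedure can be sketched in Python as follows. This is a minimal illustration, not the authors' R code. It assumes the order 1 one-sided indecisive function, reads the soft ROC curve at a given false positive fraction p by bisection on the threshold c, treats the regular ROC curve of the validation sample as the δ = 0 case, splits the diseased and non-diseased samples separately, and reuses the same random splits for every candidate δ. All function names, the grid of pk, and the default settings are hypothetical.

```python
import random


def indecisive(s, delta):
    """Order 1 one-sided indecisive function; delta = 0 gives hard thresholding."""
    if s < 0:
        return 0.0
    if delta <= 0 or s >= delta:
        return 1.0
    return s / delta


def soft_fraction(values, c, delta):
    """Average of I_delta(value - c), i.e. the soft positive fraction at c."""
    return sum(indecisive(v - c, delta) for v in values) / len(values)


def soft_roc_at(x, y, p, delta, iters=60):
    """Soft ROC ordinate at false positive fraction p, via bisection on c."""
    lo = min(x) - delta - 1.0   # soft false positive fraction equals 1 here
    hi = max(x) + 1.0           # and equals 0 here
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if soft_fraction(x, mid, delta) > p:
            lo = mid
        else:
            hi = mid
    return soft_fraction(y, hi, delta)


def split(values, rng, train_frac):
    """Random split of a list into (training, validation) parts."""
    shuffled = values[:]
    rng.shuffle(shuffled)
    k = max(1, min(len(shuffled) - 1, round(train_frac * len(shuffled))))
    return shuffled[:k], shuffled[k:]


def cv_select(x, y, deltas, n_splits=20, train_frac=2 / 3, n_grid=19, seed=0):
    """Return the delta minimizing the cross-validated squared error, and all scores."""
    rng = random.Random(seed)
    grid = [(k + 1) / (n_grid + 1) for k in range(n_grid)]   # p_k in (0, 1)
    scores = {d: 0.0 for d in deltas}
    for _ in range(n_splits):
        x_tr, x_va = split(x, rng, train_frac)
        y_tr, y_va = split(y, rng, train_frac)
        hard = [soft_roc_at(x_va, y_va, p, 0.0) for p in grid]   # validation ROC
        for d in deltas:
            soft = [soft_roc_at(x_tr, y_tr, p, d) for p in grid]  # training soft ROC
            scores[d] += sum((s - h) ** 2 for s, h in zip(soft, hard)) / len(grid)
    scores = {d: total / n_splits for d, total in scores.items()}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    rng = random.Random(1)
    x = [rng.gauss(0.0, 1.0) for _ in range(50)]   # simulated non-diseased outcomes
    y = [rng.gauss(1.5, 1.0) for _ in range(50)]   # simulated diseased outcomes
    best, scores = cv_select(x, y, [0.0, 0.1, 0.25, 0.5, 1.0])
    print(best, scores)
```

Reusing the same splits for every candidate δ is a design choice made here so that differences in the scores reflect δ rather than the randomness of the splits.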
Numerical Results
In this section, the proposed methods are examined through Monte Carlo simulation studies and a real data example. The R code is available from the authors upon request.
Simulation Study
We investigated the performance of the CV procedure through two simulation studies. Let Y be the diseased population and X be the non-diseased population. We considered two types of distributions: (1) both X and Y follow normal distributions; and (2) both X and Y follow double exponential distributions. We considered four settings of means: (i) (μy, μx) = (1, 0); (ii) (μy, μx) = (1.5, 0); (iii) (μy, μx) = (2, 0); and (iv) (μy, μx) = (2.5, 0). We considered two settings of variances: (A) Var(Y) = 1 and Var(X) = 1; and (B) Var(Y) = 2 and Var(X) = 1. Therefore, there were 2 × 4 × 2 = 16 data generating settings. The sample sizes were taken as m = n = 50 or m = n = 100, and the split ratio was set as 2:1. For each data generating setting, 300 replications were performed to calculate the efficiency measure
and the efficacy measure,
where the δ used is the one chosen by the CV procedure. The simulation results so obtained are summarized in Table 1. These results show that all efficiencies are less than 1, while efficacies are all close to 1, which indicates the optimality of the CV-selected δ. From this table, we also observe that the CV-selected δ decreases as the difference μy - μx increases. In fact, when Y and X are well-distinguished, the indecisive interval vanishes.
We also compared the smoothed empirical ROC curves, a promising byproduct of the proposed soft ROC method, with those obtained from the local linear regression method [13] and the PSLME model [14]. In this study, four datasets were generated from two normal distributions with unit standard deviation and means (μy, μx) = (1.5, 0). The sample sizes were taken as m = n = 20, m = n = 50, m = n = 100, and m = n = 500. The results are presented in Figure 4, which shows that if the goal is to smooth the empirical ROC curve, all the methods perform similarly.
Figure 4: Comparisons of smoothed empirical ROC curves. Note: Empirical: empirical ROC curve; Soft: soft ROC curve; PS: PSLME model; Local linear: local linear regression method
Pancreatic cancer serum biomarkers example
The dataset comes from a case-control study at Mayo Clinic which included 90 patients with pancreatic cancer and 51 subjects with pancreatitis. These data were originally analyzed by Wieand et al. [17]. Two continuous positive scale serum biomarkers were available to diagnose a patient with pancreatic cancer: CA-125, a cancer antigen, and CA-19-9, a carbohydrate antigen. We applied the CV method to select regularization parameters for CA-125 and CA-19-9, which turn out to be 0.04 and 0.115, respectively. The corresponding empirical ROC, soft ROC curves, and smoothed ROC curves by local linear regression and PSLME model are displayed in Figure 5 and Figure 6. Again, for the overall performance, we observe that the smoothed ROC curves estimated from the soft ROC method are similar to the smoothed curves from existing smoothing methods.
Figure 5: ROC curves for Pancreatic Cancer Serum Biomarkers Example: CA-125. Note: Empirical: empirical ROC curve; Soft: soft ROC curve; PS: PSLME model; Local linear: local linear regression method
Figure 6: ROC curves for Pancreatic Cancer Serum Biomarkers Example: CA-19-9. Note: Empirical: empirical ROC curve; Soft: soft ROC curve; PS: PSLME model; Local linear: local linear regression method
Discussion
Many authors have considered using the sigmoid function to approximate the indicator function when calculating the AUC, but without clear reasoning. In this paper, by introducing soft ROC curves, we have provided a connection between the approximation to the ROC curve and the approximation to the corresponding AUC. This explains, to some extent, why we can use some function to approximate the indicator function while calculating the AUC.
The selection of the regularization parameter in a soft ROC curve is a critical issue. The application of the proposed cross-validation procedure is straightforward. Since cross-validation is one of the most popular methods for model selection, we have examined it in the present context by means of Monte Carlo simulation studies and a real example, and have shown that it performs well. However, the consistency of the proposed cross-validation procedure remains an open problem.
An asymptotic estimate of the variability of the estimated soft ROC curve can be easily developed by following arguments similar to those of Pepe [1]. However, if the asymptotic variance is not derivable, one can also use the bootstrap.
The choice of the indecisive function depends on the practical scenario. For example, if one wants to model the decision behavior of medical practitioners, one can use the order 0 two-sided indecisive function, where the chance of being diagnosed as diseased is 50% when the test result is within the soft threshold, or the order 1 one-sided indecisive function, where the probability of being diagnosed as diseased increases linearly when the test result is within the soft threshold. Moreover, if the focus is on the computational aspects of ROC curves, the order ∞ two-sided (sigmoid) function can be used [8,11].
In future work, we will also generalize the indecisive function by allowing the regularization parameter to vary across different test results. Under appropriate generalizations, the indecisive function becomes a risk-type function; the corresponding soft ROC curve and AUC thus have direct applications in clinical trials.
Acknowledgment
We are grateful to Dr. Hua Liang from the University of Rochester for his ROC smoothing programs for the PSLME model.
References
- Pepe MS. The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford University Press: Oxford. 2004.
- Thompson IM. PSA: a biomarker for disease. A biomarker for clinical trials. How useful is it? J Nutr. 2006; 136: 2704S.
- Body R, Foex B. On the philosophy of diagnosis: is doing more good than harm better than "primum non nocere"? Emerg Med J. 2009; 26: 238-240.
- Hozo I, Djulbegovic B. When is diagnostic testing inappropriate or irrational? Acceptable regret approach. Med Decis Making. 2008; 28: 540-553.
- Lehmann EL. Testing Statistical Hypotheses, 2nd edn. Springer: New York. 1997.
- Liu Z, Tan M, Jiang F. Regularized F-measure maximization for feature selection and classification. J Biomed Biotechnol. 2009; 617946.
- Liu Z, Tan M. ROC-based utility function maximization for feature selection and classification with applications to high-dimensional protease data. Biometrics. 2008; 64: 1155-1161.
- Huang X, Qin G, Fang Y. Optimal Combinations of Diagnostic Tests Based on AUC. Biometrics. 2011; 67: 568-576.
- Wang Z, Chang YI, Ying Z, Zhu L, Yang Y. A parsimonious thresholding-independent protein feature selection method through the area under receiver operating characteristic curve. Bioinformatics. 2007; 23: 2788-2794.
- Ma S, Huang J. Regularized ROC method for disease classification and biomarker selection with microarray data. Bioinformatics. 2005; 21: 4356-4362.
- Ma S, Huang J. Combining multiple markers for classification using ROC. Biometrics. 2007; 63: 751-757.
- Zou KH, Hall WJ, Shapiro DE. Smooth non-parametric receiver operating characteristic (ROC) curves for continuous diagnostic tests. Stat Med. 1997; 16: 2143-2156.
- Peng L, Zhou XH. Local linear smoothing of receiver operating characteristic (ROC) curves. Journal of Statistical Planning and Inference. 2004; 118: 129-143.
- Ren H, Zhou XH, Liang H. A flexible method for estimating the ROC curve. Journal of Applied Statistics. 2004; 31: 773-784.
- Bickel PJ, Levina E. Regularized estimation of large covariance matrices. Annals of Statistics. 2008; 36: 199-227.
- Shao J. Linear model selection by cross-validation. Journal of the American Statistical Association. 1993; 88: 486-495.
- Wieand S, Gail MH, James BR, James KL. A family of nonparametric statistics for comparing diagnostic markers with paired or unpaired data. Biometrika. 1989; 76: 585-592.